Install your HPC Cluster with Warewulf
Preface
In High Performance Computing (HPC), computing tasks are usually distributed among many compute threads, which are spread across multiple cores, sockets and machines (nodes). These threads are tightly coupled. Therefore, compute clusters consist of a number of largely identical machines that need to be managed so that they keep a well-defined, identical setup across all nodes. As clusters scale up, many scalability challenges need to be overcome. Warewulf addresses this ‘administrative scaling’.
Warewulf is an operating system-agnostic installation and management system for HPC clusters.
It is quick and easy to learn and use, as many settings are preconfigured with sensible defaults, while still providing the flexibility to fine-tune the configuration to local needs. It is released under the BSD license; its source code is available at https://github.com/warewulf/warewulf, which is also where development takes place.
This article gives an overview of how to set up Warewulf on SUSE Linux Enterprise High Performance Computing (SLE HPC) 15 SP5 or later.
Installing Warewulf
Compute clusters consist of at least one management (or head) node, which is usually multi-homed (connected both to an external network and to a cluster private network), and multiple compute nodes, which reside solely on the private network. Additional private networks dedicated to high-speed tasks such as RDMA and storage access may exist as well. Warewulf is installed on one of the management nodes of the cluster and handles the installation and management of the compute nodes.
To install Warewulf on a management node running SLE HPC 15 SP5 or later, simply run:
zypper install warewulf
This package integrates seamlessly into a SUSE system and should therefore be preferred over packages provided on GitHub.
During the installation, the current network configuration is written to /etc/warewulf/warewulf.conf. These settings should be verified, as a sensible preconfiguration is not always possible for multi-homed hosts.
Setting up the network configuration
Check /etc/warewulf/warewulf.conf for the following values:
ipaddr: 172.16.16.250
netmask: 255.255.255.0
network: 172.16.16.0
where ipaddr should be the IP address of this management host. Also check the values of netmask and network: these should match the cluster private network.
Additionally, you may want to configure the IP address range for dynamic/unknown hosts. This range has to lie within the network configured above:
dhcp:
  range start: 172.16.16.21
  range end: 172.16.16.50
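Before you start any services, it can help to check that the values in warewulf.conf fit together. Below is a minimal Bash sketch, not a Warewulf tool. The helper names (ip_to_int, in_network, check_warewulf_net) are hypothetical. It verifies that the host address and both ends of the DHCP range lie in the configured network, and that the host address is not handed out by DHCP.

```shell
# Bash sketch: sanity-check the addresses from /etc/warewulf/warewulf.conf.
# All addresses passed to check_warewulf_net are placeholders for your values.

# Convert a dotted-quad IPv4 address into a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo "$(( (a << 24) | (b << 16) | (c << 8) | d ))"
}

# Succeed if address $1 lies in network $2 with netmask $3.
in_network() {
  local ip net mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "$2")
  mask=$(ip_to_int "$3")
  (( (ip & mask) == (net & mask) ))
}

# Usage: check_warewulf_net IPADDR NETMASK NETWORK RANGE_START RANGE_END
check_warewulf_net() {
  local ipaddr=$1 netmask=$2 network=$3 start=$4 end=$5 addr
  for addr in "$ipaddr" "$start" "$end"; do
    if ! in_network "$addr" "$network" "$netmask"; then
      echo "ERROR: $addr is not in $network/$netmask" >&2
      return 1
    fi
  done
  local ip s e
  ip=$(ip_to_int "$ipaddr")
  s=$(ip_to_int "$start")
  e=$(ip_to_int "$end")
  if (( s > e )); then
    echo "ERROR: range start $start is above range end $end" >&2
    return 1
  fi
  # The management host's own address must not be handed out by DHCP.
  if (( ip >= s && ip <= e )); then
    echo "ERROR: $ipaddr lies inside the DHCP range" >&2
    return 1
  fi
  echo OK
}
```

For example, check_warewulf_net 172.16.16.250 255.255.255.0 172.16.16.0 172.16.16.21 172.16.16.50 prints OK. A range taken from a different subnet is reported on stderr, and the function returns a non-zero status.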
If the ISC DHCP server (dhcpd) is used (the default on SUSE), make sure that DHCPD_INTERFACE in the file /etc/sysconfig/dhcpd is set to the interface connected to the cluster private network.
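You can edit the file by hand. If you prefer to script the change, the minimal Bash sketch below uses a hypothetical helper, set_dhcpd_interface. It replaces an existing DHCPD_INTERFACE assignment, or appends one if the variable is missing, and leaves the rest of the file untouched. The interface name eth1 in the usage line is only an assumption; use the name of your private-network interface.

```shell
# Set DHCPD_INTERFACE in a sysconfig-style file (default: /etc/sysconfig/dhcpd).
# Usage: set_dhcpd_interface IFACE [FILE]
set_dhcpd_interface() {
  local iface=$1 file=${2:-/etc/sysconfig/dhcpd}
  if grep -q '^DHCPD_INTERFACE=' "$file" 2>/dev/null; then
    # Replace the existing assignment in place (GNU sed).
    sed -i "s/^DHCPD_INTERFACE=.*/DHCPD_INTERFACE=\"$iface\"/" "$file"
  else
    # Append the variable if the file does not define it yet.
    printf 'DHCPD_INTERFACE="%s"\n' "$iface" >> "$file"
  fi
}
```

Run it as root, for example set_dhcpd_interface eth1, and then check the result with grep DHCPD_INTERFACE /etc/sysconfig/dhcpd.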
Starting the Warewulf service
You are now ready to start the warewulfd service itself, which delivers the images to the nodes:
systemctl enable --now warewulfd.service
Now, wwctl can be used to configure the remaining services needed by Warewulf. Run:
wwctl configure --all
This configures all Warewulf-related services.
To log into compute nodes conveniently, log out of the Warewulf host and back in. This creates an SSH key on the Warewulf host that allows password-less login to the compute nodes. Note, however, that this key is not yet passphrase-protected. If you need to protect your private key with a passphrase, it is a good idea to do so now:
ssh-keygen -p -f $HOME/.ssh/cluster
Adding nodes and profiles to Warewulf
Warewulf uses the concept of profiles, which hold the generalized settings shared by the individual nodes. It comes with a predefined profile named default, to which all new nodes are assigned unless set otherwise. You can display the values of the default profile with:
wwctl profile list default
Now, add a node and assign it an IP address with the command:
wwctl node add node01 -I 172.16.16.101
If the MAC address of the node is known, you can specify it as well:
wwctl node add node01 -I 172.16.16.101 -H cc:aa:ff:ff:ee:01
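If the MAC addresses are already available, for example from a hardware inventory, you can generate the node add commands instead of typing them. The Bash sketch below is a hypothetical helper. It assumes an inventory with one node per line in the format NAME IPADDR MAC, and it only prints wwctl node add commands using the -I and -H options shown above, so you can review them before you execute anything.

```shell
# Print one "wwctl node add" command per inventory line on stdin.
# Assumed (hypothetical) inventory format: NAME IPADDR [MAC]
# Blank lines and lines starting with '#' are skipped.
gen_node_add() {
  local name ip mac
  while read -r name ip mac; do
    [[ -z $name || $name == \#* ]] && continue
    if [[ -n $mac ]]; then
      printf 'wwctl node add %s -I %s -H %s\n' "$name" "$ip" "$mac"
    else
      # Without a MAC address the node must be discovered later.
      printf 'wwctl node add %s -I %s\n' "$name" "$ip"
    fi
  done
}
```

For example, gen_node_add < nodes.txt prints the commands (nodes.txt is a placeholder file name), and gen_node_add < nodes.txt | sh runs them once they look right.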
To add several nodes at once, you can also use a node range, e.g.:
wwctl node add node[01-10] -I 172.16.16.101
This adds the nodes node01 to node10. Their IP addresses start at the specified address and are incremented by Warewulf for each further node.
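The following Bash sketch helps you plan such a range. It assumes that Warewulf assigns consecutive addresses starting at the given one, as described above, and prints the name/IP pairs you should then expect. The helper expand_node_range is hypothetical and does not talk to Warewulf. Zero padding of the node number follows the width of the first number you pass in.

```shell
# Print "NAME IP" pairs for a node range, assuming consecutive addresses.
# Usage: expand_node_range PREFIX FIRST LAST START_IP
expand_node_range() {
  local prefix=$1 first=$2 last=$3 start_ip=$4
  local width=${#first} a b c d base i n
  IFS=. read -r a b c d <<< "$start_ip"
  base=$(( (a << 24) | (b << 16) | (c << 8) | d ))
  # 10# forces base 10, so "08" and "09" are not read as octal.
  for (( i = 10#$first; i <= 10#$last; i++ )); do
    n=$(( base + i - 10#$first ))
    printf '%s%0*d %d.%d.%d.%d\n' "$prefix" "$width" "$i" \
      "$(( (n >> 24) & 255 ))" "$(( (n >> 16) & 255 ))" \
      "$(( (n >> 8) & 255 ))" "$(( n & 255 ))"
  done
}
```

For example, expand_node_range node 01 10 172.16.16.101 lists node01 172.16.16.101 through node10 172.16.16.110. You can compare this against the output of wwctl node list -n after adding the nodes.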
Importing a container
Warewulf uses a special container[1] as the base for building the OS images of the compute nodes. It is self-contained and independent of the operating system installed on the Warewulf host. For SLE HPC customers, SUSE provides a fully supported SLE HPC node image[2].
To import a SLE HPC 15 SP6 node container, set your SCC credentials in environment variables and run:
export WAREWULF_OCI_USERNAME=myemail@example.com
export WAREWULF_OCI_PASSWORD=MY_SCC_PASSCODE
wwctl container import docker://registry.suse.com/suse/hpc/warewulf4-x86_64/sle-hpc-node:15.6 \
sle15.6-node --setdefault
(Replace myemail@example.com and MY_SCC_PASSCODE with the credentials used for your subscription.)
This imports the specified container and, because of --setdefault, sets it as the container of the default profile.
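Typing the password on an export line stores it in your shell history. A minimal Bash alternative is to read the credentials interactively, without echoing the password, and export the same two variables used above. The helper name read_oci_credentials is hypothetical.

```shell
# Read the registry credentials into the variables used by wwctl container import.
# The password is not echoed and does not end up in the shell history.
read_oci_credentials() {
  read -r -p "SCC e-mail: " WAREWULF_OCI_USERNAME
  read -r -s -p "SCC password: " WAREWULF_OCI_PASSWORD
  echo >&2   # move to a new line after the hidden input
  export WAREWULF_OCI_USERNAME WAREWULF_OCI_PASSWORD
}
```

Call read_oci_credentials, run the wwctl container import command shown above, and then remove the secret from the environment with unset WAREWULF_OCI_PASSWORD.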
Furthermore, you can import an image from a local installation in a directory (chroot directory) by passing the path to this directory as the argument to wwctl container import.
Booting nodes
As a final preparation, rebuild the container image by running:
wwctl container build sle15.6-node
as well as all configuration overlays with the command:
wwctl overlay build
This makes sure the images are up to date in case an earlier build failed. If you didn't assign a hardware address to a node before, set the node into the discoverable state before powering it on. This is done with:
wwctl node set node01 --discoverable
Also, run:
wwctl configure hostlist
to add the new nodes to the file /etc/hosts. Now, make sure that the node(s) will boot via PXE from the network interface connected to the cluster private network, and power on the node(s) to boot into the assigned image.
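If you repeat these preparation steps often, you can wrap them in a small script. The Bash sketch below uses hypothetical helpers (run, prepare_boot). It runs the three commands in the order given above and stops at the first one that fails. With DRY_RUN=1 set, it only prints the commands. Setting nodes to discoverable is left out on purpose, because wwctl node set may ask for confirmation.

```shell
# Run a command, or only print it when DRY_RUN is set.
run() {
  if [[ -n ${DRY_RUN:-} ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: rebuild image and overlays, then update /etc/hosts.
# Usage: prepare_boot CONTAINER
prepare_boot() {
  local container=$1
  run wwctl container build "$container" || return
  run wwctl overlay build || return
  run wwctl configure hostlist
}
```

For example, DRY_RUN=1 prepare_boot sle15.6-node shows what would be executed, and prepare_boot sle15.6-node performs the steps.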
Additional configuration
The configuration files for the nodes are managed as Golang text templates. The rendered files are overlaid onto the node images. Overlays are transported to the compute nodes in two ways:
- the system overlay, which is ‘baked’ into the image during boot as part of the wwinit process;
- the runtime overlay, which is updated on the nodes at a regular interval (every minute by default) via the wwclient service.
In the default configuration, the overlay called wwinit is used as the system overlay. You can list the files in this overlay with the command:
wwctl overlay list wwinit -a
which shows a list of all files in the overlay. Files ending with the suffix .ww are interpreted as templates by Warewulf; the suffix is removed in the rendered overlay. To inspect the content of an overlay file, use the command:
wwctl overlay show wwinit /etc/issue.ww
To render the template using the values for node01 use:
wwctl overlay show wwinit /etc/issue.ww -r node01
The overlay template itself may be edited using the command:
wwctl overlay edit wwinit /etc/issue.ww
Note that after you edit templates, the overlays aren't rebuilt automatically, so trigger a rebuild with the command:
wwctl overlay build
The variables available in a template can be listed with:
wwctl overlay show debug /warewulf/template-variables.md.ww
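The .ww naming rule described above can be illustrated with a small Bash sketch. The helper classify_overlay_files is hypothetical. It reads overlay-relative paths on stdin and shows which ones Warewulf treats as templates, along with the path each one is rendered to. Every other file is listed as static.

```shell
# Classify overlay paths: *.ww files are templates rendered without the suffix,
# all other files are copied as they are. Reads one path per line on stdin.
classify_overlay_files() {
  local path
  while IFS= read -r path; do
    [[ -z $path ]] && continue
    if [[ $path == *.ww ]]; then
      printf 'template %s -> %s\n' "$path" "${path%.ww}"
    else
      printf 'static   %s\n' "$path"
    fi
  done
}
```

You can feed it paths by hand or taken from wwctl overlay list wwinit -a. The exact output format of that command may differ between Warewulf versions, so some filtering may be needed.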
Modifying the container
The node container is a self-contained operating system image. You can open a shell in the image with the command:
wwctl container shell sle15.6-node
After you have opened a shell, you can install additional software using zypper.
The shell command provides the option --bind, which mounts arbitrary host directories into the container during the shell session.
Note that if the shell session exits with a non-zero status, the image won't be rebuilt automatically. It is therefore advisable to rebuild the container after any change with:
wwctl container build sle15.6-node
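For unattended changes, you can avoid the interactive shell. The Bash sketch below uses hypothetical helpers (run, container_install) to install packages and then rebuild the image explicitly. It assumes that wwctl container exec runs the command given after -- inside the named container; check wwctl container exec --help on your system before relying on this. With DRY_RUN=1 set, it only prints the commands.

```shell
# Run a command, or only print it when DRY_RUN is set.
run() {
  if [[ -n ${DRY_RUN:-} ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: install packages into a node container, then rebuild it.
# Usage: container_install CONTAINER PACKAGE...
container_install() {
  local container=$1
  shift
  # Assumption: "wwctl container exec NAME -- CMD..." runs CMD in the container.
  run wwctl container exec "$container" -- zypper -n install "$@" || return
  run wwctl container build "$container"
}
```

For example, DRY_RUN=1 container_install sle15.6-node ignition gptfdisk prints the two commands without running them.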
Network configuration
Warewulf allows configuring multiple network interfaces for the compute nodes. For example, you can add another network interface for InfiniBand using the command:
wwctl node set node01 --netname infininet -I 172.16.17.101 --netdev ib0 --mtu 9000 --type infiniband
This adds the InfiniBand interface ib0 to the node node01. You can now list the network interfaces of the node:
wwctl node list -n
Because changed settings are not propagated to the configuration files automatically, rebuild the node overlays after this change by running the command:
wwctl overlay build
After a reboot, these changes take effect on the nodes; in the example above, the InfiniBand interface will be active on the node.
A more elegant way to get the same result is to create a profile that holds all values that are identical for all nodes. In this case, these are netdev, mtu and type. Create a new profile for an InfiniBand network using the command:
wwctl profile add infiniband-nodes --netname infininet --netdev ib0 --mtu 9000 --type infiniband
You can now add this profile to a node and remove the node-specific settings, which are now part of the common profile, by executing:
wwctl node set node01 --netname infininet --netdev UNDEF --mtu UNDEF --type UNDEF \
--profiles default,infiniband-nodes
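With the common values in the profile, only the per-node address is left to set. The Bash sketch below is a hypothetical helper. It prints a wwctl node set command for each node that attaches the infiniband-nodes profile and sets the node's infininet address. It assumes a numbering scheme in which node NN gets 172.16.17.(100+NN), matching node01 and 172.16.17.101 from the example. This scheme only works for node numbers up to 155. It also assumes the nodes have no node-level IB settings that would need to be reset to UNDEF first.

```shell
# Print commands that attach the InfiniBand profile and set per-node addresses.
# Hypothetical scheme: nodeNN -> 172.16.17.(100+NN); valid for NN <= 155.
gen_ib_commands() {
  local node num
  for node in "$@"; do
    num=${node//[!0-9]/}   # keep only the digits of the node name
    if [[ -z $num ]]; then
      echo "cannot derive a number from node name $node" >&2
      return 1
    fi
    printf 'wwctl node set %s --netname infininet -I 172.16.17.%d --profiles default,infiniband-nodes\n' \
      "$node" "$(( 100 + 10#$num ))"
  done
}
```

For example, gen_ib_commands node01 node02 prints the commands for two nodes; pipe the output to sh once you have reviewed it.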
To list the data in a profile use the command:
wwctl profile list -A infiniband-nodes
Secure Boot
Switch to GRUB boot
By default, Warewulf boots nodes via iPXE, which isn't signed by SUSE and can't be used when Secure Boot is enabled. To switch to GRUB as the boot method, add or change the following value in /etc/warewulf/warewulf.conf:
warewulf:
  grubboot: true
After this change, reconfigure dhcpd and tftp by executing:
wwctl configure dhcp
wwctl configure tftp
and rebuild the overlays with the command:
wwctl overlay build
Also make sure that the packages shim and grub2-x86_64-efi (for x86-64) or grub2-arm64-efi (for aarch64) are installed in the container. shim is required by Secure Boot.
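Because the GRUB package name depends on the architecture, a small Bash sketch can pick the right one. It follows the package names listed above and uses uname -m naming for the architectures. The helper name grub_efi_package is hypothetical.

```shell
# Print the GRUB EFI package for an architecture (default: this machine's).
# Usage: grub_efi_package [ARCH]
grub_efi_package() {
  local arch=${1:-$(uname -m)}
  case $arch in
    x86_64)  echo grub2-x86_64-efi ;;
    aarch64) echo grub2-arm64-efi ;;
    *)
      echo "unsupported architecture: $arch" >&2
      return 1
      ;;
  esac
}
```

For example, grub_efi_package aarch64 prints grub2-arm64-efi. Install that package together with shim in the container, e.g. with zypper from a wwctl container shell session.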
Add disk to configuration
Warewulf boots ephemeral systems, so there is no need for local disk storage. Still, local disk storage may be useful, for instance as scratch storage for computational tasks. Warewulf is capable of setting up local disk storage. For this, the following entities need to be configured:
- physical storage device(s) to be used
- partition(s) on the disks
- filesystem(s) to be used
Starting with SLES 15 SP6, this is possible with Warewulf. Warewulf doesn't manage the entities listed above itself, but creates configuration and service files for Ignition to perform this task. Therefore, you need to install ignition and gptfdisk in the node container. Open a shell in the container and run:
zypper -n install ignition gptfdisk
The command wwctl node set is used to configure all aspects of the disk configuration:
Disks
The path to the physical storage device, e.g. /dev/sda, is set using the --diskname option. The only other valid configuration option for disks is --diskwipe, which should be self-explanatory.
Partitions
The --partname $PARTNAME option sets the name of the partition, which Ignition uses as the path for the device file, i.e. as the partition label /dev/disk/by-partlabel/$PARTNAME.
Additionally, the size and number of each partition need to be specified using the --partsize and --partnumber options. This applies to all partitions except the last one (the one with the highest number), which is extended to the maximum size possible.
You should also set the boolean option --partcreate so that a partition is created if it doesn't exist.
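The sizing rule above is easy to get wrong when you plan several partitions. The Bash sketch below is a hypothetical helper, not part of Warewulf. It checks the rule before you run wwctl node set. Each argument describes one partition as NUMBER:NAME:SIZE, with SIZE left empty where no --partsize is given. The helper reports any partition other than the highest-numbered one that has no size.

```shell
# Check that every partition except the highest-numbered one has a size.
# Usage: check_partitions NUMBER:NAME:SIZE...   (SIZE may be empty)
check_partitions() {
  local spec num name size max=-1
  # First pass: find the highest partition number.
  for spec in "$@"; do
    IFS=: read -r num name size <<< "$spec"
    (( num > max )) && max=$num
  done
  # Second pass: all other partitions need an explicit size.
  for spec in "$@"; do
    IFS=: read -r num name size <<< "$spec"
    if (( num != max )) && [[ -z $size ]]; then
      echo "partition $name (number $num) needs --partsize" >&2
      return 1
    fi
  done
  echo OK
}
```

For the swap and scratch layout in the examples below, check_partitions 1:swap:1024 2:scratch: prints OK, while leaving out the swap size is reported as an error.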
Filesystems
Filesystems are defined by the partition that contains them, so the name specified with the --fsname $PARTNAME option needs to match the partition name, i.e. the name used for the partition label /dev/disk/by-partlabel/$PARTNAME. A filesystem needs a path if it is to be mounted, but a path is not mandatory.
Examples
Add a scratch partition
wwctl node set node01 \
--diskname /dev/vda --diskwipe \
--partname scratch --partcreate \
--fsname scratch --fsformat btrfs --fspath /scratch --fswipe
This will be the only (and last) partition, therefore it does not require a size. To add another partition as a swap partition, you may run:
wwctl node set node01 \
--diskname /dev/vda \
--partname swap --partsize=1024 --partnumber 1 \
--fsname swap --fsformat swap --fspath swap
This adds partition number 1, which will be placed before the scratch partition.
[1] This container is special only in that it is bootable, i.e. it contains a kernel and an init implementation (i.e. systemd).
[2] Node containers with ‘community support’, which do not require a subscription, are available at registry.opensuse.org/science/warewulf/leap-15.5/containers/kernel:latest.
A separate blog post describes the installation and setup of Warewulf on openSUSE Leap and Tumbleweed.