Install your HPC Cluster with Warewulf


Preface

In High Performance Computing (HPC), computing tasks are usually distributed among many compute threads which are spread across multiple cores, sockets and machines (nodes). These threads are tightly coupled. Compute clusters therefore consist of a number of largely identical machines that must be managed so that all nodes keep a well-defined, identical setup. As clusters grow, keeping this setup consistent becomes increasingly difficult. Warewulf is there to address this ‘administrative scaling’.

Warewulf is an operating system-agnostic installation and management system for HPC clusters.
It is quick and easy to learn and use, as many settings are pre-configured with sensible defaults, yet it remains flexible enough to fine-tune the configuration to local needs. It is released under the BSD license. Its source code is available at https://github.com/warewulf/warewulf, which is also where development happens.

This article gives an overview on how to set up Warewulf on SUSE Linux Enterprise High Performance Computing (SLE HPC) 15 SP5 or later.

Installing Warewulf

Compute clusters consist of at least one management (or head) node which is usually multi-homed: connected both to an external network and a cluster private network, as well as multiple compute nodes which reside solely on the private network. Other private networks dedicated to high speed tasks like RDMA and storage access may exist as well. Warewulf gets installed on one of the management nodes of a cluster to manage and oversee the installation and management of the compute nodes.

To install Warewulf on a cluster which is running SLE HPC 15 SP5 or later, simply run:

zypper install warewulf

This package integrates seamlessly into a SUSE system and should therefore be preferred over packages provided on GitHub.
During the installation, the actual network configuration is written to /etc/warewulf/warewulf.conf. These settings should be verified, as a sensible pre-configuration is not always possible for multi-homed hosts.

Setting up the network configuration

Check /etc/warewulf/warewulf.conf for the following values:

ipaddr: 172.16.16.250
netmask: 255.255.255.0
network: 172.16.16.0

where ipaddr should be the IP address of this management host. Also check the values of netmask and network – these should match this network.
Additionally, you may want to configure the IP address range for dynamic/unknown hosts:

dhcp:
  range start: 172.16.16.21
  range end: 172.16.16.50
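
The DHCP range has to lie inside the network defined by network and netmask. A minimal shell sketch for checking this before restarting any service could look as follows. The function names ip_to_int and in_network are hypothetical helpers, not part of Warewulf, and the sketch assumes plain dotted-quad IPv4 addresses without leading zeros:

```shell
# Convert a dotted-quad IPv4 address into a single integer.
ip_to_int() {
  local old_ifs="$IFS"
  IFS=.
  # Split the address into its four octets ($1..$4).
  set -- $1
  IFS="$old_ifs"
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Return 0 if address $1 lies inside network $2 with netmask $3.
in_network() {
  local ip net mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "$2")
  mask=$(ip_to_int "$3")
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}
```

For example, `in_network 172.16.16.21 172.16.16.0 255.255.255.0` succeeds, while an address from 172.16.26.0 fails the check against the same network.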

If the ISC DHCP server (dhcpd) is used (default on SUSE), make sure the value of DHCPD_INTERFACE in the file /etc/sysconfig/dhcpd has been set to the correct value.

Starting the Warewulf service

You are now ready to start the warewulfd service itself which delivers the images to the nodes:

systemctl enable --now warewulfd.service

Now, wwctl can be used to configure the remaining services needed by Warewulf. Run:

wwctl configure --all

which will configure all Warewulf related services.
To conveniently log into compute nodes, you should now log out of and back into the Warewulf host, as this will create an ssh key on the Warewulf host which allows password-less login to the compute nodes. Note, however, that this key is not yet protected by a passphrase. If you need to protect your private key with a passphrase, it is a good idea to do so now:

ssh-keygen -p -f $HOME/.ssh/cluster

Adding nodes and profiles to Warewulf

Warewulf uses the concept of profiles which hold the generalized settings of the individual nodes. It comes with a predefined profile default, to which all new nodes will be assigned unless set otherwise. You may obtain the values of the default profile with:

wwctl profile list default

Now, a node can be added and assigned an IP address with the command:

wwctl node add node01 -I 172.16.16.101

If the MAC address is known for this node, you can specify it as well:

wwctl node add node01 -I 172.16.16.101 -H cc:aa:ff:ff:ee:01
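
A hardware address consists of six colon-separated hexadecimal octets, and a mistyped address means the node will not be matched when it boots. The following is a minimal sketch of a format check that could be run before calling wwctl node add. The function name is_valid_mac is a hypothetical helper, not a Warewulf command:

```shell
# Return 0 if $1 looks like a MAC address (six hex octets separated by colons).
is_valid_mac() {
  printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}
```

It only checks the format, for instance `is_valid_mac "$mac" || echo "bad MAC: $mac"`, and cannot tell whether the address belongs to the intended network interface.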

For adding several nodes at once you may also use a node range, e.g.

wwctl node add node[01-10] -I 172.16.16.101

This will add the nodes with IP addresses starting at the specified address and incremented by Warewulf.
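
To get a feeling for the resulting assignment before adding a range, the mapping can be previewed with a short shell sketch. It assumes that the address is simply incremented by one per node in the last octet and does not handle an overflow past 255. The function name preview_node_range is hypothetical and the function does not call Warewulf at all:

```shell
# Print "name address" pairs for a node range.
# $1 = name prefix, $2 = first number, $3 = last number, $4 = first IP address
preview_node_range() {
  local prefix="$1" first="$2" last="$3" base_ip="$4"
  local net="${base_ip%.*}" host="${base_ip##*.}"
  local width="${#first}"
  local i end offset fmt
  # expr treats the numbers as decimal, so "08" is not misread as octal.
  i=$(expr "$first" + 0)
  end=$(expr "$last" + 0)
  offset=0
  # Zero-pad the node number to the width used in the range.
  fmt="%s%0${width}d %s.%d\n"
  while [ "$i" -le "$end" ]; do
    printf "$fmt" "$prefix" "$i" "$net" "$(( host + offset ))"
    i=$(( i + 1 ))
    offset=$(( offset + 1 ))
  done
}

preview_node_range node 01 10 172.16.16.101
```

The nodes actually registered can be compared against this preview with wwctl node list.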

Importing a container

Warewulf uses a special container[1] as base to build OS images for the compute nodes. This container is self-contained and independent of the operating system installed on the Warewulf host. For SLE HPC customers, SUSE provides a fully supported SLE HPC node image[2].
To import a SLE HPC 15 SP6 node container, set your SCC credentials in environment variables and run:

export WAREWULF_OCI_USERNAME=myemail@example.com
export WAREWULF_OCI_PASSWORD=MY_SCC_PASSCODE
wwctl container import docker://registry.suse.com/suse/hpc/warewulf4-x86_64/sle-hpc-node:15.6 \
 sle15.6-node --setdefault

(Replace myemail@example.com and MY_SCC_PASSCODE with the credentials used for your subscription.)

This will import the specified container and set it for the default profile.
It is also possible to import an image from a local installation in a directory by passing the path to this directory (chroot directory) as the argument to wwctl container import.

Booting nodes

As a final preparation, you should now rebuild the container image by running:

wwctl container build sle15.6-node

as well as all the configuration overlays with the command:

wwctl overlay build

in case an earlier build of the image failed due to an error. If you didn’t assign a hardware address to a node before, you should set the node into the discoverable state before powering it on. This is done with:

wwctl node set node01 --discoverable

Also you should run:

wwctl configure hostlist

to add the new nodes to the file /etc/hosts. Now, make sure that the node(s) will boot over PXE from the network interface connected to the specified network, and power on the node(s) to boot into the assigned image.
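
The preparation steps above can be collected in a small helper. This is only a sketch: the function name prepare_boot and the WWCTL variable are hypothetical conveniences, and only the wwctl subcommands shown in this section are used. Setting WWCTL=echo prints the commands instead of running them. Only nodes without a known hardware address need to be passed as discoverable nodes, and wwctl may still ask for confirmation interactively.

```shell
# Command used to talk to Warewulf; override with WWCTL=echo for a dry run.
: "${WWCTL:=wwctl}"

# $1 = image name, remaining arguments = nodes to mark as discoverable
prepare_boot() {
  local image="$1" node
  shift
  # Rebuild the image and all overlays in case an earlier build failed.
  "$WWCTL" container build "$image" || return 1
  "$WWCTL" overlay build || return 1
  # Nodes without a known hardware address need the discoverable state.
  for node in "$@"; do
    "$WWCTL" node set "$node" --discoverable || return 1
  done
  # Finally, add the nodes to /etc/hosts.
  "$WWCTL" configure hostlist
}
```

A dry run such as `WWCTL=echo prepare_boot <image-name> node01` shows the four commands in the order they would be executed.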

Additional configuration

The configuration files for the nodes are managed as Golang text templates. The resulting files are overlaid on the node images. There are two ways of transporting the overlays to the compute node:

  • the system overlay, which is ‘baked’ into the image during boot as part of the wwinit process.
  • the runtime overlay, which is updated on the nodes on a regular basis (every minute by default) via the wwclient service.

In the default configuration, the overlay called wwinit is used as system overlay. You may list the files in this overlay with the command:

wwctl overlay list wwinit -a

which will show a list of all the files in the overlay. Files ending with the suffix .ww are interpreted as templates by Warewulf; the suffix is removed in the rendered overlay. To inspect the content of an overlay file, use the command:

wwctl overlay show wwinit /etc/issue.ww

To render the template using the values for node01 use:

wwctl overlay show wwinit /etc/issue.ww -r node01

The overlay template itself may be edited using the command:

wwctl overlay edit wwinit /etc/issue.ww

Please note that after editing templates, the overlays aren’t updated automatically and you should trigger a rebuild with the command:

wwctl overlay build

The variables available in a template can be listed with:

wwctl overlay show debug /warewulf/template-variables.md.ww
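
After editing a template, it can be useful to look at the rendered result for several nodes in one go. The sketch below loops over the nodes and calls the render command shown above for each of them. The function name render_for_nodes and the WWCTL variable are hypothetical; setting WWCTL=echo prints the commands instead of running them.

```shell
# Command used to talk to Warewulf; override with WWCTL=echo for a dry run.
: "${WWCTL:=wwctl}"

# $1 = overlay name, $2 = template path, remaining arguments = node names
render_for_nodes() {
  local overlay="$1" template="$2" node
  shift 2
  for node in "$@"; do
    # The rendered file carries the template name without the .ww suffix.
    echo "### $node: ${template%.ww}"
    "$WWCTL" overlay show "$overlay" "$template" -r "$node" || return 1
  done
}
```

For example, `render_for_nodes wwinit /etc/issue.ww node01 node02` prints the rendered /etc/issue for both nodes, each preceded by a header line.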

Modifying the container

The node container is a self-contained operating system image. You can open a shell in the image with the command:

wwctl container shell sle15.6-node

After you have opened a shell, you may install additional software using zypper.
The shell command provides the option --bind which allows mounting arbitrary host directories into the container during the shell session.
Please note that if a command exits with a non-zero status, the image won’t be rebuilt automatically. Therefore, it is advised to rebuild the container with:

wwctl container build sle15.6-node

after any change.
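
Because a non-zero exit status of the shell session skips the automatic rebuild, the two steps can be wrapped so that the rebuild always happens. The following is a sketch only: the function name modify_container and the WWCTL variable are hypothetical, and the exact value format accepted by --bind should be checked with wwctl container shell --help.

```shell
# Command used to talk to Warewulf; override with WWCTL=echo for a dry run.
: "${WWCTL:=wwctl}"

# $1 = image name, $2 = optional host directory to bind into the session
modify_container() {
  local image="$1" bind_dir="${2:-}"
  if [ -n "$bind_dir" ]; then
    "$WWCTL" container shell --bind "$bind_dir" "$image"
  else
    "$WWCTL" container shell "$image"
  fi
  # Rebuild unconditionally, whatever the exit status of the session was.
  "$WWCTL" container build "$image"
}
```

Calling `modify_container <image-name>` opens the shell and rebuilds the image once the session ends.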

Network configuration

Warewulf allows configuring multiple network interfaces for the compute nodes. For example, you can add another network interface for InfiniBand using the command:

wwctl node set node01 --netname infininet -I 172.16.17.101 --netdev ib0 --mtu 9000 --type infiniband

This will add the InfiniBand interface ib0 to the node node01. You can now list the network interfaces of the node:

wwctl node list -n

As changes in the settings are not propagated to all configuration files, the node overlays should be rebuilt after this change by running the command:

wwctl overlay build

After a reboot, these changes will be present on the nodes; in the above case the InfiniBand interface will be active on the node.
A more elegant way to get the same result is to create a profile to hold all those values which are identical across nodes. In this case, these are netdev, mtu and type. Create a new profile for an InfiniBand network using the command:

wwctl profile add infiniband-nodes --netname infininet --netdev ib0 --mtu 9000 --type infiniband

You may now add this profile to a node and remove the node-specific settings, which are now part of the common profile, by executing:

wwctl node set node01 --netname infininet --netdev UNDEF --mtu UNDEF --type UNDEF \
 --profiles default,infiniband-nodes

To list the data in a profile use the command:

wwctl profile list -A infiniband-nodes
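
With the common values stored in the profile, only the IP address remains specific to each node. The sketch below assigns the profile and consecutive addresses to a list of nodes and rebuilds the overlays afterwards. The function name assign_ib_profile and the WWCTL variable are hypothetical; the network name infininet and the profile name infiniband-nodes are taken from the commands above. Setting WWCTL=echo prints the commands instead of running them.

```shell
# Command used to talk to Warewulf; override with WWCTL=echo for a dry run.
: "${WWCTL:=wwctl}"

# $1 = first three octets of the InfiniBand network, $2 = first host number,
# remaining arguments = node names
assign_ib_profile() {
  local ib_net="$1" host="$2" node
  shift 2
  for node in "$@"; do
    "$WWCTL" node set "$node" --netname infininet -I "$ib_net.$host" \
      --profiles default,infiniband-nodes || return 1
    host=$(( host + 1 ))
  done
  # Node changes are not propagated automatically, so rebuild the overlays.
  "$WWCTL" overlay build
}
```

For example, `assign_ib_profile 172.16.17 101 node01 node02` gives node01 the address 172.16.17.101 and node02 the address 172.16.17.102 on the InfiniBand network.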

Secure Boot

Switch to grub boot

By default, Warewulf boots nodes via iPXE, which isn’t signed by SUSE and can’t be used when secure boot is enabled. In order to switch to grub as the boot method you must add or change the following value in /etc/warewulf/warewulf.conf:

warewulf:
  grubboot: true

After this change, you will have to reconfigure dhcpd and tftp by executing:

wwctl configure dhcp
wwctl configure tftp

and rebuild the overlays with the command:

wwctl overlay build

Also make sure that the packages shim and grub2-x86_64-efi (for x86-64) or grub2-arm64-efi (for aarch64) are installed in the container. shim is required by secure boot.
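
Before reconfiguring the services, it can help to confirm that the setting is really present in the configuration file. The following sketch uses a plain grep for this. The function name grubboot_enabled is hypothetical, and the check is deliberately simple: it looks for an indented grubboot: true line and does not verify that the line sits below the warewulf: key.

```shell
# Return 0 if the given Warewulf configuration enables grub boot.
# $1 = configuration file (defaults to /etc/warewulf/warewulf.conf)
grubboot_enabled() {
  local conf="${1:-/etc/warewulf/warewulf.conf}"
  grep -Eq '^[[:space:]]+grubboot:[[:space:]]*true[[:space:]]*$' "$conf"
}
```

A typical use would be `grubboot_enabled || echo "grubboot is not enabled"`.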

Add disk to configuration

Warewulf boots ephemeral systems, thus there is no need for local disk storage. Still, local disk storage may be useful to have, for instance as scratch storage for computational tasks. Warewulf is capable of setting up local disk storage. For this, it is necessary to configure the involved entities:

    • physical storage device(s) to be used
    • partition(s) on the disks
    • filesystem(s) to be used

    With SLES 15 SP6, this is now possible with Warewulf. Warewulf doesn’t manage the entities listed above itself, but creates configuration and service files for Ignition to perform this task. Therefore, you need to make sure that ignition and gptfdisk are installed in the compute node image. Open a shell in the container and run:

    zypper -n install ignition gptfdisk

    The command wwctl node set is used to configure all aspects of the disk configuration:

    Disks

    The path to the physical storage device e.g. /dev/sda is set using the --diskname option. The only valid configuration option for disks is --diskwipe, which should be self-explanatory.

    Partitions

    The --partname $PARTNAME option sets the name of the partition, which Ignition uses as the path for the device file, i.e. the partition label /dev/disk/by-partlabel/$PARTNAME.

    Additionally, the size and number of the partition need to be specified using the --partsize and --partnumber options for all but the last partition (the one with the highest number), which will be extended to the maximum size possible.

    You should also set the boolean option --partcreate so that a partition is created if it doesn’t exist.

    Filesystems

    Filesystems are defined by the partition which contains them, so the name specified using the --fsname $PARTNAME option needs to match the partition name, i.e. what is used for the partition label /dev/disk/by-partlabel/$PARTNAME. A filesystem needs to have a path if it is to be mounted, but a path is not mandatory.

    Examples

    Add a scratch partition

    wwctl node set node01 \
      --diskname /dev/vda --diskwipe \
      --partname scratch --partcreate \
      --fsname scratch --fsformat btrfs --fspath /scratch --fswipe
    

    This will be the only (and last) partition, therefore it does not require a size. To add another partition as a swap partition, you may run:

    wwctl node set node01 \
      --diskname /dev/vda \
      --partname swap --partsize=1024 --partnumber 1 \
      --fsname swap --fsformat swap --fspath swap
    

    This adds the partition number 1 which will be placed before the scratch partition.
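
Once a node has booted with this configuration, the partition labels should show up below /dev/disk/by-partlabel. A small sketch for checking this on the compute node is shown below. The function name check_partlabels and the PARTLABEL_DIR variable are hypothetical; the variable only exists so that the check can be tried out against another directory.

```shell
# Report for each given partition name whether its by-partlabel entry exists.
# Returns non-zero if at least one entry is missing.
check_partlabels() {
  local base="${PARTLABEL_DIR:-/dev/disk/by-partlabel}"
  local name missing=0
  for name in "$@"; do
    if [ -e "$base/$name" ]; then
      echo "ok: $base/$name"
    else
      echo "missing: $base/$name"
      missing=1
    fi
  done
  return "$missing"
}
```

For the two examples above, this would be run on the node as `check_partlabels scratch swap`.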

    [1] This container is special only in that it is bootable, i.e. it contains a kernel and an init implementation (e.g. systemd).

    [2] You will find node containers with ‘community support’ which do not require a subscription at registry.opensuse.org/science/warewulf/leap-15.5/containers/kernel:latest.
    A separate blog post describes the installation and setup of Warewulf on openSUSE Leap and Tumbleweed.

     
