Accelerate your network with OvS-DPDK
Introduction
First of all, to set expectations: this blog post does not focus on what DPDK or OpenVSwitch (OvS) are, but rather provides the information required to get them running on a SUSE distribution. If you need detailed information about DPDK or OvS design, architecture, internals or API, please refer to either the DPDK website (http://www.dpdk.org) or the OpenVSwitch website (http://www.openvswitch.org) respectively.
DPDK
Prerequisites
DPDK has both hardware and software requirements that must be fulfilled in order for it to function correctly:
- IOMMU
- Hugepages support
To enable IOMMU support, edit the kernel boot parameters (/etc/default/grub) and add the following:
iommu=pt intel_iommu=on
As for hugepages, different systems support different hugepage sizes. For most use cases 2MB hugepages are recommended. To enable them, edit the kernel boot parameters and add the following:
default_hugepagesz=2M hugepagesz=2M hugepages=1024
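On a typical openSUSE installation these parameters end up on the GRUB_CMDLINE_LINUX_DEFAULT line of /etc/default/grub. As a rough sketch (keep whatever parameters are already present on your system and simply append the new ones), the line might look like:
GRUB_CMDLINE_LINUX_DEFAULT="splash=silent quiet iommu=pt intel_iommu=on default_hugepagesz=2M hugepagesz=2M hugepages=1024"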
Once the above modifications have been applied, the following command needs to be run:
$ grub2-mkconfig -o /boot/grub2/grub.cfg
and finally reboot the machine with
$ reboot
An alternative approach to set up hugepages, which works only for 2MB hugepages, is the following:
$ echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
On NUMA machines the instructions have to be issued for each NUMA node:
$ echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
$ echo 1024 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
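To double-check the per-node allocation, the same sysfs entries can simply be read back:
$ cat /sys/devices/system/node/node*/hugepages/hugepages-2048kB/nr_hugepages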
For further information, please visit https://doc.dpdk.org/guides/linux_gsg/sys_reqs.html
Once the machine has rebooted, please ensure that the IOMMU is enabled and working correctly by issuing:
$ dmesg | grep DMAR
The output contains a lot of information; the line confirming the IOMMU setup is
[ 0.000000] DMAR: IOMMU enabled
Similarly, before proceeding any further with installation and setup, let's check that hugepages are available:
$ cat /proc/meminfo | grep Huge
Depending on the number of hugepages reserved, a different output might be displayed. An example is shown below:
AnonHugePages:    509952 kB
HugePages_Total:     1024
HugePages_Free:      1024
HugePages_Rsvd:         0
HugePages_Surp:         0
Hugepagesize:        2048 kB
Once the hugepages have been enabled they need to be made available to DPDK by mounting that memory. That can be done via:
$ mkdir /mnt/huge
$ mount -t hugetlbfs nodev /mnt/huge
In order to have the mount point persist across reboots, edit /etc/fstab and add:
nodev /mnt/huge hugetlbfs defaults 0 0
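To confirm that the hugetlbfs mount is in place (either right away or after a reboot), the mount table can be inspected:
$ mount | grep hugetlbfs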
Installation & Setup
The latest and greatest openSUSE release (Leap 15) comes with DPDK 17.11.2 (http://www.dpdk.org). On the other hand, Tumbleweed (the SUSE rolling release) is offering DPDK 18.02.2.
On both distributions, to have the DPDK libraries and tools installed, you can simply type:
$ zypper install dpdk dpdk-tools
Once the installation completes, the binary dpdk-devbind (part of dpdk-tools) can be used to query the status of network interfaces and bind / unbind them to DPDK.
Different PMDs (Poll Mode Drivers) may require different kernel drivers in order to work properly. Depending on the PMD being used, a corresponding kernel driver should be loaded and bound to the network ports. Details of the different kernel drivers can be found at https://doc.dpdk.org/guides/linux_gsg/linux_drivers.html
For this post we will use the vfio-pci kernel driver, which works in most circumstances:
$ modprobe vfio-pci
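If the module should also be loaded automatically at boot, one common approach is a systemd modules-load.d drop-in (the file name below is just an example):
$ echo "vfio-pci" > /etc/modules-load.d/vfio-pci.conf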
Now, as mentioned earlier we can use the dpdk-devbind binary to query and modify network interface(s) assignment:
$ dpdk-devbind.py --status
Network devices using DPDK-compatible driver
============================================
0000:82:00.0 '82599EB 10-GbE NIC' drv=vfio-pci unused=ixgbe
0000:82:00.1 '82599EB 10-GbE NIC' drv=vfio-pci unused=ixgbe

Network devices using kernel driver
===================================
0000:04:00.0 'I350 1-GbE NIC' if=em0 drv=igb unused=vfio-pci *Active*
0000:04:00.1 'I350 1-GbE NIC' if=eth1 drv=igb unused=vfio-pci
0000:04:00.2 'I350 1-GbE NIC' if=eth2 drv=igb unused=vfio-pci
0000:04:00.3 'I350 1-GbE NIC' if=eth3 drv=igb unused=vfio-pci

Other network devices
=====================
<none>
From the above output we can see that two ports are managed by vfio-pci and can be used by DPDK.
At the same time, the tool tells us which kernel driver is capable of managing each device when it is not managed by DPDK; this is reported in the "unused" field of the command output.
To bind a port to vfio-pci, and hence to DPDK:
$ dpdk-devbind.py --bind=vfio-pci 0000:04:00.3
To return a port to the kernel, unbinding it from DPDK:
$ dpdk-devbind.py --bind=ixgbe 0000:82:00.0
Port bindings are not persisted automatically across reboots, so if you expect given ports to always be assigned to DPDK you need to either create custom scripts managed by systemd or use driverctl. To install driverctl:
$ zypper install driverctl
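As a quick sketch of how driverctl can be used (the PCI address below matches the sample output above and is purely illustrative), an override makes the device bind to vfio-pci on every boot:
$ driverctl set-override 0000:82:00.0 vfio-pci
$ driverctl list-overrides
and the override can be removed again with:
$ driverctl unset-override 0000:82:00.0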
OpenVSwitch-DPDK on openSUSE
The latest and greatest openSUSE release (Leap 15) comes with OpenVSwitch 2.8.2 whilst the SUSE rolling-release (Tumbleweed) ships 2.9.2.
Installation
In order to install OpenVSwitch via zypper the following command can be used:
$ zypper install openvswitch
The openvswitch daemon and services can be started via
$ systemctl start openvswitch
and to keep it enabled across reboots
$ systemctl enable openvswitch
Setup
By default openvswitch does not take advantage of DPDK acceleration; to enable it, ovs-vswitchd needs to be made aware of it. To do that:
$ ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
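Optionally, the amount of hugepage memory that DPDK pre-allocates per NUMA socket can be tuned as well; the values below are just an example, and the service has to be restarted for the settings to take effect:
$ ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,1024"
$ systemctl restart openvswitch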
To confirm that DPDK support has been initialized we can issue:
$ ovs-vsctl get Open_vSwitch . dpdk_initialized
If the setup is fully initialized then the output will show “true”.
Now, in order for openvswitch to use a port accelerated via DPDK, that port first needs to be bound to DPDK for manageability purposes (see above).
To create a userspace bridge named br0 and add two dpdk ports to it:
$ ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
$ ovs-vsctl add-port br0 myportnameone -- set Interface myportnameone \
    type=dpdk options:dpdk-devargs=0000:06:00.0
$ ovs-vsctl add-port br0 myportnametwo -- set Interface myportnametwo \
    type=dpdk options:dpdk-devargs=0000:06:00.1
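To verify that the bridge and the two DPDK ports have been created as expected, the current configuration can be inspected with:
$ ovs-vsctl show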
For other information specific to OpenVSwitch please see http://docs.openvswitch.org/en/latest/intro/install/dpdk/