Bonds and VLANs randomly fail to become active while duplicate IP verification is active
This document (000021492) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server 15 Service Pack 4
SUSE Linux Enterprise Server 15 Service Pack 3
SUSE Linux Enterprise Micro 5.5
SUSE Linux Enterprise Micro 5.4
SUSE Linux Enterprise Micro 5.3
SUSE Linux Enterprise Micro 5.2
SUSE Linux Enterprise Micro 5.1
Situation
Resolution
Please install the following version of Wicked or later:
SUSE Linux Enterprise Server 12 SP5: wicked-0.6.75-3.43.1
SUSE Linux Enterprise Server 15 SP3: wicked-0.6.75-150300.4.32.1
SUSE Linux Enterprise Micro 5.1: wicked-0.6.75-150300.4.32.1
SUSE Linux Enterprise Micro 5.2: wicked-0.6.75-150300.4.32.1
SUSE Linux Enterprise Server 15 SP4: wicked-0.6.75-150400.3.27.1
SUSE Linux Enterprise Micro 5.3: wicked-0.6.75-150400.3.27.1
SUSE Linux Enterprise Micro 5.4: wicked-0.6.75-150400.3.27.1
SUSE Linux Enterprise Server 15 SP5: wicked-0.6.75-150500.3.29.1
SUSE Linux Enterprise Micro 5.5: wicked-0.6.75-150500.3.29.1
To check whether a wicked RPM contains the relevant fix, inspect its changelog:
# rpm -qp --changelog wicked-0.6.75-150500.3.29.1.x86_64.rpm | less
The changelog of a fixed package contains the following entry:
- arp: increase arp-send retry value to avoid address configuration failure due to ENOBUF reported by kernel while duplicate address detection with underlying bonding in 802.3ad mode reporting link "up & running" too early (bsc#1218668, gh#openSUSE/wicked#1020, gh#openSUSE/wicked#1022). [+ 0002-increase-arp-retry-attempts-on-sending-bsc1218668.patch]
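The same check can be scripted rather than read through a pager. The following is a minimal sketch, not an official SUSE tool: the function name changelog_has_fix is hypothetical, and it assumes that the bug reference bsc#1218668 from the changelog entry above is a sufficient marker for the fix.

```shell
# Hypothetical helper: succeeds if the changelog text on stdin
# mentions the bug reference for the ARP retry fix.
changelog_has_fix() {
    grep -qF 'bsc#1218668'
}

# Usage against a package file or the installed package:
#   rpm -qp --changelog wicked-0.6.75-150500.3.29.1.x86_64.rpm | changelog_has_fix && echo "fix present"
#   rpm -q  --changelog wicked | changelog_has_fix && echo "fix present"
```

Reading the changelog from stdin keeps the helper independent of whether a package file (rpm -qp) or the installed package (rpm -q) is queried.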
Cause
Additional Information
As a workaround, increase the number of ARP probes that wicked sends before giving up on the NIC configuration:
Add the following to the '/etc/wicked/local.xml' file and restart the wicked.service:
<config>
  <addrconf>
    <arp>
      <verify>
        <count>2</count>
        <interval>2000</interval>
        <retries>10</retries>
      </verify>
    </arp>
  </addrconf>
</config>
Note that the file local.xml may have to be created if it is not present.
Changing only the retries value is sufficient, because this value controls the ENOBUFS error handling. As an alternative to the above example, the following sets the retries value for dhcp4, auto4 and static IP addresses:
<config>
  <addrconf>
    <auto4>
      <arp>
        <verify>
          <retries>20</retries>
        </verify>
      </arp>
    </auto4>
    <dhcp4>
      <arp>
        <verify>
          <retries>20</retries>
        </verify>
      </arp>
    </dhcp4>
    <arp>
      <verify>
        <retries>20</retries>
      </verify>
    </arp>
  </addrconf>
</config>
With this configuration, wicked tries to send 3 verify packets at an interval between 0.67s and 2s (this is the default). During these 3 attempts it tolerates up to 20 ENOBUFS errors. This means it keeps trying for at least ~13s (about 20 x 0.67s). A message similar to the following should appear in the debug output:
wickedd[8721]: en0: ARP verify failed for 192.168.0.22 - ENOBUFS, probes:0/3 errors:8/20
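To see how close a system gets to the configured retries limit, the error counters can be pulled out of such messages. This is a rough sketch that assumes the message format matches the sample line above exactly; the function name enobufs_error_counts is hypothetical, and reading the messages from the journal of wickedd.service is an assumption about where the debug output ends up.

```shell
# Hypothetical helper: for each ENOBUFS verify failure on stdin,
# print "<errors so far> <configured retries>".
enobufs_error_counts() {
    sed -n 's|.*ARP verify failed for .* - ENOBUFS, probes:[0-9]*/[0-9]*  *errors:\([0-9]*\)/\([0-9]*\).*|\1 \2|p'
}

# Usage (assumes wickedd debug messages are written to the journal):
#   journalctl -u wickedd.service | enobufs_error_counts
```

If the first number regularly approaches the second, the retries value is probably still too low for the time the bond needs to become usable.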
Note: There is an important limitation: the total verify duration cannot exceed 15 seconds, that is, interval * count <= 15000 (interval is given in milliseconds).
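The limit can be checked before values are written to local.xml. The following sketch simply evaluates interval * count <= 15000 in shell arithmetic; the function name verify_duration_ok is hypothetical and it does not read the XML file itself.

```shell
# Hypothetical helper: verify_duration_ok COUNT INTERVAL_MS
# Succeeds if COUNT * INTERVAL_MS stays within the 15000 ms limit.
verify_duration_ok() {
    count=$1
    interval_ms=$2
    [ $((count * interval_ms)) -le 15000 ]
}

# Usage with the values from the first example (count 2, interval 2000):
#   verify_duration_ok 2 2000 && echo "within limit"
```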
Note: The time LACP needs to complete depends also on the switch setup, e.g. VPC (Virtual Port Channel) or MLAG (Multi-chassis Link Aggregation Group).
Note: To test whether the local.xml workaround is active, run tcpdump on another host and check that wickedd sends the ARP requests according to the configured verify parameters.
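A possible capture command for this test is sketched below. The function name watch_arp_probes is hypothetical, and the interface and address in the usage line are placeholders (the address is taken from the sample log message); tcpdump needs root privileges and the observing host must be on the same layer 2 segment.

```shell
# Hypothetical helper: watch_arp_probes IFACE ADDR
# Prints ARP frames whose sender or target address is ADDR.
# -n disables name resolution, -e shows the link-level (MAC) header.
watch_arp_probes() {
    iface=$1
    addr=$2
    tcpdump -n -e -i "$iface" arp and host "$addr"
}

# Usage on the observing host (interface name is a placeholder):
#   watch_arp_probes eth0 192.168.0.22
```

The number of requests seen and the timestamps between them can then be compared with the count and interval values set in local.xml.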
Note: The root cause of the problem is in the bonding driver, not in Wicked, and the bonding driver problem is being investigated. The intention is to eventually arrive at a fix that is acceptable to upstream.
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021492
- Creation Date: 12-Jul-2024
- Modified Date: 16-Jul-2024
- SUSE Linux Enterprise Server
- SUSE Linux Enterprise Micro
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com