How to test an active-backup bond from the console
This document (7021380) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server 12
SUSE Linux Enterprise Server 11
Situation
An active-backup bond is configured and is communicating properly.
You want to test the failover functionality of the bond.
Resolution
Removing a slaved device from the system simulates a failure. If the removed device is the active slave at the time, the bond is forced to react to the change and make another slaved device active.
This document uses bond0, which enslaves eth0 and eth1, to explain how to remove a device and restore it so that you can see how bonding responds to those changes.
Step 1 - Check to see which slave is active:
cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Slave Interface: eth1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:0c:29:ab:2a:fa
Slave queue ID: 0
Slave Interface: eth0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:0c:29:ab:2a:f0
Slave queue ID: 0
According to the output, eth0 is the active slave.
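For scripted checks, the active slave can be read from the same file instead of scanning the output by eye. The following is a minimal sketch that assumes the "Currently Active Slave:" label shown in the output above; the function name active_slave is hypothetical and not part of any SUSE tool.

```shell
#!/bin/sh
# Print the currently active slave of a bond.
# $1: path to the bonding status file (defaults to the one for bond0).
# The function name is an illustration, not an existing command.
active_slave() {
    awk -F': ' '/^Currently Active Slave:/ { print $2 }' \
        "${1:-/proc/net/bonding/bond0}"
}

# Usage on a system with bond0 configured:
#   active_slave
```

The optional file argument exists only so the function can be tried against a saved copy of the output.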
Step 2 - Find the active slave's device files (eth0 in this case):
find /sys -name '*eth0'
/sys/devices/pci0000:00/0000:00:15.0/0000:03:00.0/net/eth0
/sys/devices/virtual/net/bond0/lower_eth0
/sys/class/net/eth0
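As an alternative to searching all of /sys, the "device" symbolic link under /sys/class/net/<interface> can be resolved directly. This is a sketch only: the function name nic_device_dir is hypothetical, and the second argument exists purely so the function can be tried against a test directory. On the system shown above it would be expected to resolve to the NIC's own directory (the one ending in 0000:03:00.0), which sits one level below the directory used in Step 3.

```shell
#!/bin/sh
# Print the canonical sysfs directory of the device behind a network interface.
# $1: interface name, for example eth0
# $2: sysfs mount point (defaults to /sys; overridable for testing only)
# The function name is an illustration, not an existing command.
nic_device_dir() {
    readlink -f "${2:-/sys}/class/net/$1/device"
}

# Usage:
#   nic_device_dir eth0
```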
Step 3 - Change (cd) to the pci* directory:
Example: /sys/devices/pci0000:00/0000:00:15.0
Step 4 - Remove the device:
echo 1 > remove
At this point the eth0 device directory structure that was previously located under /sys/devices/pci0000:00/0000:00:15.0 is no longer there. The device was removed and, as far as the OS is concerned, no longer exists.
You can verify this with a simple ifconfig, which will no longer list the eth0 device.
You can also repeat the cat /proc/net/bonding/bond0 command from Step 1 to see that eth0 is no longer listed as active or available.
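The check against /proc/net/bonding/bond0 can also be scripted. The sketch below assumes the "Slave Interface:" label shown in the Step 1 output; the function names bond_slaves and is_bond_slave are hypothetical.

```shell
#!/bin/sh
# List every slave interface named in a bonding status file, one per line.
# $1: path to the status file (defaults to the one for bond0)
bond_slaves() {
    awk -F': ' '/^Slave Interface:/ { print $2 }' \
        "${1:-/proc/net/bonding/bond0}"
}

# Exit 0 if interface $1 is listed as a slave in status file $2, else non-zero.
is_bond_slave() {
    bond_slaves "$2" | grep -qxF -- "$1"
}

# Usage after removing eth0:
#   is_bond_slave eth0 || echo "eth0 is no longer enslaved to bond0"
```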
You can also see the change in the messages file. It might look something like this:
2017-09-12T14:13:23.363414-06:00 tdefreese6 wickedd-nanny[766]: device eth0: device has been deleted
2017-09-12T14:13:23.368745-06:00 tdefreese6 kernel: [81594.846099] bonding: bond0: releasing active interface eth0
2017-09-12T14:13:23.368763-06:00 tdefreese6 kernel: [81594.846105] bonding: bond0: Warning: the permanent HWaddr of eth0 - 00:0c:29:ab:2a:f0 - is still in use by bond0. Set the HWaddr of eth0 to a different address to avoid conflicts.
2017-09-12T14:13:23.368765-06:00 tdefreese6 kernel: [81594.846132] bonding: bond0: making interface eth1 the new active one.
That concludes the test for failover on active slave failure.
You can get the deleted device back with a reboot of the server.
You can also get the deleted device back with this command:
echo 1 > /sys/bus/pci/rescan
The eth0 interface should now be back.
You can see that it is back with an ifconfig command, and you can verify that the bond sees it with this command:
cat /proc/net/bonding/bond0
That concludes the test of the bond code seeing the device when it comes back again.
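Because the interface may take a moment to reappear after the rescan, a script can poll the bonding status file rather than check once. This is a hedged sketch: the function name wait_for_slave and the 30-second default are assumptions, and it presumes that the network service enslaves the interface to the bond again once the device is rediscovered.

```shell
#!/bin/sh
# Poll a bonding status file until interface $1 is listed as a slave.
# $1: interface name
# $2: timeout in seconds (assumed default: 30)
# $3: status file (defaults to the one for bond0)
# Returns 0 once the slave is listed, 1 if the timeout expires first.
wait_for_slave() {
    iface=$1
    remaining=${2:-30}
    file=${3:-/proc/net/bonding/bond0}
    while :; do
        if awk -F': ' -v i="$iface" \
            '/^Slave Interface:/ && $2 == i { found = 1 } END { exit !found }' \
            "$file"; then
            return 0
        fi
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
}

# Usage (as root), after the device was removed:
#   echo 1 > /sys/bus/pci/rescan && wait_for_slave eth0 30
```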
The same steps can be repeated with the eth1 device and its file structure to fail the active slave in the bond back over to eth0.
Cause
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7021380
- Creation Date: 12-Sep-2017
- Modified Date: 03-Mar-2020
- SUSE Linux Enterprise Server
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com