Corosync: How to test a 2 ring configuration.
This document (000020974) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise High Availability Extension 12
SUSE Linux Enterprise Server for SAP Applications 15
SUSE Linux Enterprise Server for SAP Applications 12
Situation
For this example, a 2-node cluster was created as follows:
node1 with IPs: 192.168.100.58 on eth0 and 192.168.200.75 on eth1
node2 with IPs: 192.168.100.59 on eth0 and 192.168.200.76 on eth1
The corosync.conf configuration contains:
Inside the "totem" section:
transport: udpu
interface {
ringnumber: 0
mcastport: 5405
ttl: 1
}
interface {
ringnumber: 1
mcastport: 5406
ttl: 1
}
Inside the "nodelist" section:
node {
nodeid: 1
ring0_addr: 192.168.100.58
ring1_addr: 192.168.200.75
}
node {
nodeid: 2
ring0_addr: 192.168.100.59
ring1_addr: 192.168.200.76
}
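Assembled, the fragments above correspond to a corosync.conf along these lines. This is only a sketch: "version: 2" is the standard totem version for these releases but is assumed here, and surrounding options (cluster_name, quorum, logging, etc.) are omitted:

```
totem {
    version: 2
    transport: udpu
    interface {
        ringnumber: 0
        mcastport: 5405
        ttl: 1
    }
    interface {
        ringnumber: 1
        mcastport: 5406
        ttl: 1
    }
}
nodelist {
    node {
        nodeid: 1
        ring0_addr: 192.168.100.58
        ring1_addr: 192.168.200.75
    }
    node {
        nodeid: 2
        ring0_addr: 192.168.100.59
        ring1_addr: 192.168.200.76
    }
}
```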
NOTE: The following test uses "iptables" to drop all traffic on each ring's port (one at a time), NOT on the IP address. This is very important: dropping the traffic of the IP address itself would cause a failure of any cluster resources that depend on network communication via that IP (e.g. IPaddr2 or NFS resources), triggering the stop and/or move of the affected resources.
Resolution
1- Based on the configuration described in the "Situation" section above, ring0 is configured to use port 5405, so all UDP traffic on that port must be dropped. On either node:
# iptables -A INPUT -i eth0 -p udp --destination-port 5405 -j DROP
2- On both cluster nodes, the messages file shows the ring changing to "FAULTY":
node1 corosync[XXXX]: [TOTEM ] Marking ringid 0 interface 192.168.100.58 FAULTY
node2 corosync[YYYY]: [TOTEM ] Marking ringid 0 interface 192.168.100.59 FAULTY
3- Also, on both nodes, the Corosync status shows ring0 as "FAULTY":
# corosync-cfgtool -s
RING ID 0
id = 192.168.100.58
status = Marking ringid 0 interface 192.168.100.58 FAULTY
RING ID 1
id = 192.168.200.75
status = ring 1 active with no faults
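When checking many nodes, the ring health can be summarized from the "corosync-cfgtool -s" output shown above. The helper below is not part of this TID, just a minimal sketch; a here-document with the sample output from step 3 stands in for the live command, so it can be shown without a running cluster (on a real node, pipe "corosync-cfgtool -s" into it instead):

```shell
#!/bin/sh
# Hypothetical helper: print one OK/FAULTY line per ring from
# "corosync-cfgtool -s" output read on stdin.
ring_summary() {
    awk '/^RING ID/ { ring=$3 }
         /status/   { print "ring " ring ": " (($0 ~ /FAULTY/) ? "FAULTY" : "OK") }'
}

# Sample output as seen on node1 while ring0 is blocked (step 3):
printf '%s\n' \
  'RING ID 0' \
  '        id      = 192.168.100.58' \
  '        status  = Marking ringid 0 interface 192.168.100.58 FAULTY' \
  'RING ID 1' \
  '        id      = 192.168.200.75' \
  '        status  = ring 1 active with no faults' \
| ring_summary
# prints:
# ring 0: FAULTY
# ring 1: OK
```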
4- At this point, apart from the TOTEM entries in the messages file about the FAULTY ring0, the cluster should be up and running without incidents.
5- Before proceeding to test ring1, the iptables rule set in step 1 must be cleaned up:
# iptables -F
6- Then verify that no iptables rules remain:
# iptables -L
7- Once the iptables rule is cleared, ring0 recovers automatically and the messages files show:
node1 corosync[XXXX]: [TOTEM ] Automatically recovered ring 0
node2 corosync[YYYY]: [TOTEM ] Automatically recovered ring 0
8- And the status of Corosync shows:
# corosync-cfgtool -s
RING ID 0
id = 192.168.100.58
status = ring 0 active with no faults
RING ID 1
id = 192.168.200.75
status = ring 1 active with no faults
9- Ring1 is configured to use port 5406, so all UDP traffic on that port must be dropped. On either node:
# iptables -A INPUT -i eth1 -p udp --destination-port 5406 -j DROP
10- Then follow the same procedure from steps 2 through 8, this time watching ring1 instead of ring0.
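The whole test for one ring (steps 1, 5 and 6) can be strung together as in the sketch below. This wrapper is not part of the TID: the function name and the DRY_RUN convention are made up for illustration. DRY_RUN defaults to 1 here, so the script only prints the commands; set DRY_RUN=0 on a real cluster node (as root) to execute them, pausing between the DROP rule and the flush to observe the FAULTY ring as in steps 2-4:

```shell
#!/bin/sh
# Hypothetical wrapper around the manual steps for one ring.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 executes them.
: "${DRY_RUN:=1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi
}

# Usage: test_ring <interface> <mcastport>
#   ring0 in the example configuration: test_ring eth0 5405
#   ring1 in the example configuration: test_ring eth1 5406
test_ring() {
    # Step 1/9: drop the ring's UDP traffic (on the port, NOT the IP address)
    run iptables -A INPUT -i "$1" -p udp --destination-port "$2" -j DROP
    # Steps 2-4: watch the messages file and the ring status here
    run corosync-cfgtool -s
    # Steps 5-6: clean up the rule and verify none remain
    run iptables -F
    run iptables -L
}

test_ring eth0 5405   # preview the ring0 test
```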
Additional Information
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID:000020974
- Creation Date: 14-Feb-2023
- Modified Date: 14-Feb-2023
- SUSE Linux Enterprise High Availability Extension
- SUSE Linux Enterprise Server for SAP Applications
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com