
Corosync: How to test a 2-ring configuration.

This document (000020974) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise High Availability Extension 15
SUSE Linux Enterprise High Availability Extension 12
SUSE Linux Enterprise Server for SAP Applications 15
SUSE Linux Enterprise Server for SAP Applications 12

Situation

Two rings were configured in Corosync and need to be tested.

For this example, a 2-node cluster was created as follows:

    node1 with IPs: 192.168.100.58 on eth0 and 192.168.200.75 on eth1

    node2 with IPs: 192.168.100.59 on eth0 and 192.168.200.76 on eth1

The corosync.conf configuration contains:

Inside the "totem" section:

    transport: udpu
    interface {
        ringnumber:    0
        mcastport:    5405
        ttl:    1
    }
    interface {
        ringnumber:    1
        mcastport:    5406
        ttl:    1
    }



Inside the "nodelist" section:

    node {
        nodeid:    1
        ring0_addr:    192.168.100.58
        ring1_addr:    192.168.200.75
    }
    node {
        nodeid:    2
        ring0_addr:    192.168.100.59
        ring1_addr:    192.168.200.76
    }



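Putting the two fragments together, a minimal corosync.conf for this setup could look as follows. This is only a sketch: the version, cluster_name, rrp_mode and bindnetaddr values are assumptions typical for a corosync 2.x two-ring deployment and are not part of the example above (rrp_mode: passive enables the redundant ring protocol in corosync 2.x):

    totem {
        version:    2
        cluster_name:    hacluster
        transport:    udpu
        rrp_mode:    passive
        interface {
            ringnumber:    0
            bindnetaddr:    192.168.100.0
            mcastport:    5405
            ttl:    1
        }
        interface {
            ringnumber:    1
            bindnetaddr:    192.168.200.0
            mcastport:    5406
            ttl:    1
        }
    }
    nodelist {
        node {
            nodeid:    1
            ring0_addr:    192.168.100.58
            ring1_addr:    192.168.200.75
        }
        node {
            nodeid:    2
            ring0_addr:    192.168.100.59
            ring1_addr:    192.168.200.76
        }
    }
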
NOTE: The following test uses "iptables" to drop all traffic on each ring's port (one at a time), NOT on the IP address. This is very important: dropping the traffic of the IP address would cause a failure of any cluster resources depending on network communication via that IP (e.g. IPaddr2 or NFS resources), triggering the stop and/or move of the affected resources.


Resolution

All the following commands must be issued as the root user:

1- Based on the configuration described above in the "Situation" section, ring0 is configured to use port 5405, so all UDP traffic on that port must be dropped. On either node:

# iptables -A INPUT -i eth0 -p udp --destination-port 5405 -j DROP
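
As an optional check, the rule and its packet counters can be listed to confirm it is in place and matching traffic:

# iptables -L INPUT -n -v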


2- On both cluster nodes, the messages file shows the ring changing to "FAULTY":

node1 corosync[XXXX]:   [TOTEM ] Marking ringid 0 interface 192.168.100.58 FAULTY

node2 corosync[YYYY]:   [TOTEM ] Marking ringid 0 interface 192.168.100.59 FAULTY
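
To follow these messages live during the test, the log can be watched on each node (assuming the default log target of /var/log/messages; on systemd-based systems "journalctl -f -u corosync" shows the same entries):

# tail -f /var/log/messages | grep -i totem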



3- Also, on both nodes, the Corosync status shows ring0 as "FAULTY":

# corosync-cfgtool -s

RING ID 0
    id    = 192.168.100.58
    status    = Marking ringid 0 interface 192.168.100.58 FAULTY
RING ID 1
    id    = 192.168.200.75
    status    = ring 1 active with no faults



4- At this point, except for the TOTEM entries in the messages file about the FAULTY ring0, the cluster should be up and running without incident.
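
This can be verified with the cluster status overview, which should still show both nodes online and all resources started:

# crm status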

5- Before proceeding to test ring1, the iptables rule set in step 1 must be removed:

# iptables -F

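NOTE: "iptables -F" flushes all rules from all chains of the filter table. If the node carries other iptables rules that must be preserved, delete only the test rule instead:

# iptables -D INPUT -i eth0 -p udp --destination-port 5405 -j DROP
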

6- Verify that no iptables rules remain:

# iptables -L


7- Once the iptables rule is removed, ring0 recovers automatically and the messages file on both nodes shows:

node1 corosync[XXXX]:   [TOTEM ] Automatically recovered ring 0

node2 corosync[YYYY]:   [TOTEM ] Automatically recovered ring 0



8- And the status of Corosync shows:

# corosync-cfgtool -s

RING ID 0
    id    = 192.168.100.58
    status    = ring 0 active with no faults
RING ID 1
    id    = 192.168.200.75
    status    = ring 1 active with no faults



9- Ring1 is configured to use port 5406, so all UDP traffic on that port must be dropped. On either node:

# iptables -A INPUT -i eth1 -p udp --destination-port 5406 -j DROP


10- Follow the same procedure as in steps 2 through 8, this time for ring1; the expected status output is sketched below.
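
While ring1 is blocked, the Corosync status should show the mirror image of step 3 (a sketch; the values are inferred from the configuration above, as seen on node1):

# corosync-cfgtool -s

RING ID 0
    id    = 192.168.100.58
    status    = ring 0 active with no faults
RING ID 1
    id    = 192.168.200.75
    status    = Marking ringid 1 interface 192.168.200.75 FAULTY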

Additional Information

If a full test is required to see how the cluster reacts (including its resources) to losing all communication via one of the rings, the test can be done by dropping all traffic on each ring's IP (one at a time), as described in TID#000018699.
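
As a sketch of that approach (see the referenced TID for the full procedure), such a test drops all traffic coming from the peer node's ring IP, e.g. on node1 for ring0:

# iptables -A INPUT -s 192.168.100.59 -j DROP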

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000020974
  • Creation Date: 14-Feb-2023
  • Modified Date: 14-Feb-2023
    • SUSE Linux Enterprise High Availability Extension
    • SUSE Linux Enterprise Server for SAP Applications
