How to change the qnetd server's IP on a running Pacemaker cluster.
This document (000021271) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise High Availability Extension 12
SUSE Linux Enterprise Server for SAP Applications 15
SUSE Linux Enterprise Server for SAP Applications 12
Situation
Changing the IP of the qnetd server requires a restart of that system, so that the change is applied to the network configuration and to the related qnetd service. It also requires a restart of the cluster stack on all cluster nodes, because a change of this IP cannot be applied by a reload of the Corosync configuration (corosync-cfgtool -R).
IMPORTANT: if the qnetd server is used as a tie-breaker by more than one cluster, then the operation must be performed on all of those clusters at the same time.
Resolution
For this example, the cluster is formed by the nodes "A" and "B", with system "C" as the qnetd server:
NOTE: the following document contains examples of the outputs of the commands described in steps 1 and 2:
https://documentation.suse.com/sle-ha/15-SP4/html/SLE-HA-all/cha-ha-qdevice.html#sec-ha-qdevice-status
1- Open a root session on nodes "A" and "B", and check the current quorum status, run:
# corosync-quorumtool
The output should show:
Nodes:            2
Quorate:          Yes
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
If not, then something is wrong; stop here and do not continue.
2- Open a root session on system "C", and check the current quorum status, run:
# corosync-qnetd-tool -lv
The output should show both nodes "A" and "B" with their IPs and without any errors.
3- On system "C", change the IP in the network configuration, and then reboot the system to apply the changes. As soon the "corosync-qnetd.service" is stopped by the reboot, nodes "A" and "B" will start showing warnings similar to:
corosync-qdevice[1234]: Can't connect to qnetd host. (-5986): Network address not available (in use?)
corosync-qdevice[1234]: Connect timeout
This should not affect the cluster operations, as long as nodes "A" and "B" are able to see/reach each other over the cluster ring (the network).
4- Once system "C" is back online, check the "corosync-qnetd.service" and make sure that it started without errors.
5- Double-check that all cluster resources on nodes "A" and "B" are in the "started" condition. This is very important: if "maintenance mode" is enabled while a resource is in the "stopped" condition, then, once "maintenance mode" is disabled again, the cluster will issue a "probe" and will try to "start" any "stopped" resources, as per the resource configuration and constraint rules.
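For example, a quick way to review the resource states, including inactive ones, is (a sketch using the standard crmsh/pacemaker tools, run as root on one node):
# crm status
# crm_mon -1 -r
The "-1" option prints a single snapshot, and "-r" also lists inactive resources.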
6- Open a root session on one node, "A" or "B", and set the cluster into "maintenance mode":
# crm configure property maintenance-mode=true
Check the cluster status; it should show "Resource management is DISABLED" and all resources as "unmanaged".
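For example, the property and the status banner can be verified with (a sketch; the grep pattern assumes the default property name):
# crm configure show | grep maintenance-mode
# crm status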
7- Then stop the cluster stack, run on both nodes, "A" and "B":
# crm cluster stop
Double-check and make sure that the pacemaker and corosync services are showing as stopped/inactive.
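For example, a minimal check with systemd, run on both nodes (service names are the standard ones used by the HA stack; corosync-qdevice is listed because it runs alongside corosync on the cluster nodes):
# systemctl is-active pacemaker.service corosync.service corosync-qdevice.service
All of them should report "inactive".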
8- Edit the /etc/corosync/corosync.conf file and change the IP of the qnetd server in the "quorum" section. Once done, copy the modified file to the other node; the file must be identical on all nodes.
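The qnetd address is the "host" entry of the "net" block inside the "device" block of the "quorum" section. A hedged sketch of the relevant part of /etc/corosync/corosync.conf follows; the IP address shown is a placeholder, and all other entries (votes, tls, algorithm, tie_breaker and so on) must be kept as they already are in the existing file:

quorum {
    provider: corosync_votequorum
    device {
        votes: 1
        model: net
        net {
            tls: on
            # change only the following line to the new IP of the qnetd server "C"
            host: 192.168.100.50
            algorithm: ffsplit
            tie_breaker: lowest
        }
    }
}

To copy the modified file from node "A" to node "B", for example ("B" is a placeholder for the actual hostname):
# scp /etc/corosync/corosync.conf B:/etc/corosync/corosync.conf
Alternatively, if csync2 is configured on the cluster, the file can be synchronized with:
# csync2 -xv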
9- Start the cluster stack, run on both nodes, "A" and "B":
# crm cluster start
Wait for the operations to complete, and give the cluster a moment to check the status of the nodes and the resources. Once finished, the status of the nodes and resources should be the same as before stopping the cluster stack in step 7.
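To watch the status until it settles, for example (interactive view; leave it with Ctrl+C):
# crm_mon -r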
10- Verify the quorum status of the cluster and check that the IP change was applied correctly and that the nodes are able to connect to the qnetd server. Run as root on any of the nodes:
# corosync-quorumtool
The output should be the same as in step 1.
11- On system "C", check the current quorum status, run:
# corosync-qnetd-tool -lv
The output should be the same as in step 2.
12- On one node, "A" or "B", issue a refresh of the resources:
# crm resource refresh
The status of the nodes and resources should be the same as before stopping the cluster stack in step 7.
13- On one node, "A" or "B", issue the command to disable "maintenance mode":
# crm configure property maintenance-mode=false
The cluster will issue a "probe" to check the status of the resources, and it should return to normal operations.
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021271
- Creation Date: 10-Nov-2023
- Modified Date: 13-Nov-2023
- SUSE Linux Enterprise High Availability Extension
- SUSE Linux Enterprise Server for SAP Applications
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com