How to change the qnetd server's IP on a running Pacemaker cluster.
This document (000021271) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise High Availability Extension 12
SUSE Linux Enterprise Server for SAP Applications 15
SUSE Linux Enterprise Server for SAP Applications 12
Situation
Changing the IP of the qnetd server requires a restart of that system, so that the change is applied to the network configuration and to the related qnetd service. It also requires a restart of the cluster stack on all cluster nodes, because a change of this IP cannot be applied by a reload of the Corosync configuration (corosync-cfgtool -R).
IMPORTANT: if the qnetd server is used as a tie-breaker by more than one cluster, then the operation must be performed on all of those clusters at the same time.
Resolution
For this example, the cluster is formed by the nodes "A" and "B", with system "C" as the qnetd server:
NOTE: the following document contains examples of the outputs of the commands described in steps 1 and 2:
https://documentation.suse.com/sle-ha/15-SP4/html/SLE-HA-all/cha-ha-qdevice.html#sec-ha-qdevice-status
1- Open a root session on nodes "A" and "B", and check the current quorum status, run:
# corosync-quorumtool
The output should show:
Nodes:            2
Quorate:          Yes
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
If not, then something is wrong; stop here and do not continue.
2- Open a root session on system "C", and check the current quorum status, run:
# corosync-qnetd-tool -lv
The output should show both nodes "A" and "B" with their IPs and without any errors.
3- On system "C", change the IP in the network configuration, and then reboot the system to apply the changes. As soon the "corosync-qnetd.service" is stopped by the reboot, nodes "A" and "B" will start showing warnings similar to:
corosync-qdevice[1234]: Can't connect to qnetd host. (-5986): Network address not available (in use?)
corosync-qdevice[1234]: Connect timeout
This should not affect the cluster operations, as long as nodes "A" and "B" are able to see/reach each other over the cluster ring (the network).
4- Once system "C" is back online, check the "corosync-qnetd.service" and make sure that it started without errors.
5- Double-check that all cluster resources on nodes "A" and "B" are in the "started" condition. This is very important: if "maintenance mode" is enabled while a resource is in the "stopped" condition, then, once "maintenance mode" is disabled again, the cluster will issue a "probe" and will try to "start" any "stopped" resources, as per the resource configuration and constraint rules.
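For example, a quick way to review the resource states, including inactive ones, is (a sketch using the standard crmsh/pacemaker tools, run as root on one node):
# crm status
# crm_mon -1 -r
The "-1" option prints a single snapshot, and "-r" also lists inactive resources.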
6- Open a root session on one node, "A" or "B", and set the cluster into "maintenance mode":
# crm configure property maintenance-mode=true
Check the cluster status; it should show "Resource management is DISABLED" and all resources as "unmanaged".
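For example, the property and the status banner can be verified with (a sketch; the grep pattern assumes the default property name):
# crm configure show | grep maintenance-mode
# crm status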
7- Then stop the cluster stack, run on both nodes, "A" and "B":
# crm cluster stop
Double-check and make sure that the pacemaker and corosync services are showing as stopped/inactive.
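For example, a minimal check with systemd, run on both nodes (service names are the standard ones used by the HA stack; corosync-qdevice is listed because it runs alongside corosync on the cluster nodes):
# systemctl is-active pacemaker.service corosync.service corosync-qdevice.service
All of them should report "inactive".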
8- Edit the /etc/corosync/corosync.conf file and change the IP of the qnetd server in the "quorum" section. Once done, copy the modified file to the other node; the file must be identical on all nodes.
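The qnetd address is the "host" entry of the "net" block inside the "device" block of the "quorum" section. A hedged sketch of the relevant part of /etc/corosync/corosync.conf follows; the IP address shown is a placeholder, and all other entries (votes, tls, algorithm, tie_breaker and so on) must be kept as they already are in the existing file:

quorum {
    provider: corosync_votequorum
    device {
        votes: 1
        model: net
        net {
            tls: on
            # change only the following line to the new IP of the qnetd server "C"
            host: 192.168.100.50
            algorithm: ffsplit
            tie_breaker: lowest
        }
    }
}

To copy the modified file from node "A" to node "B", for example ("B" is a placeholder for the actual hostname):
# scp /etc/corosync/corosync.conf B:/etc/corosync/corosync.conf
Alternatively, if csync2 is configured on the cluster, the file can be synchronized with:
# csync2 -xv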
9- Start the cluster stack, run on both nodes, "A" and "B":
# crm cluster start
Wait for the operations to complete, and give the cluster a moment to check the status of the nodes and the resources. Once finished, the status of the nodes and resources should be the same as before stopping the cluster stack in step 7.
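To watch the status until it settles, for example (interactive view; leave it with Ctrl+C):
# crm_mon -r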
10- Verify the quorum status of the cluster and check that the IP change was applied correctly and that the nodes are able to connect to the qnetd server. Run as root on any of the nodes:
# corosync-quorumtool
The output should be the same as in step 1.
11- On system "C", check the current quorum status, run:
# corosync-qnetd-tool -lv
The output should be the same as in step 2.
12- On one node, "A" or "B", issue a refresh of the resources:
# crm resource refresh
The status of the nodes and resources should be the same as before stopping the cluster stack in step 7.
13- On one node, "A" or "B", issue the command to disable "maintenance mode":
# crm configure property maintenance-mode=false
The cluster will issue a "probe" to check the status of the resources, and it should return to normal operations.
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021271
- Creation Date: 10-Nov-2023
- Modified Date: 13-Nov-2023
- SUSE Linux Enterprise High Availability Extension
- SUSE Linux Enterprise Server for SAP Applications
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com