Enable and disable maintenance mode in a High Availability Cluster

This document (7023135) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise High Availability Extension 11

SUSE Linux Enterprise High Availability Extension 12

SUSE Linux Enterprise High Availability Extension 15

Situation

The cluster needs to be put into maintenance mode so that it takes no action while services, resources and configuration are manipulated manually.

Resolution

Put the cluster into maintenance mode on SLES11.x, SLES12.x and SLES15.x:

crm configure property maintenance-mode=true

Bring the cluster out of maintenance mode on SLES11.x, SLES12.x and SLES15.x:

crm configure property maintenance-mode=false
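
The two commands above can be combined into a small wrapper so that a maintenance task always starts with maintenance mode enabled. The following is a minimal sketch, not an official SUSE tool: 'run_in_maintenance' is a hypothetical helper name, and the choice to leave the cluster in maintenance mode when the task fails is an assumption about what an administrator would want.

```shell
#!/bin/sh
# Sketch only: run_in_maintenance is a hypothetical helper, not part of crmsh.
# Usage: run_in_maintenance <command> [args...]
run_in_maintenance() {
    # Enable cluster-wide maintenance mode; give up if the property cannot be set.
    crm configure property maintenance-mode=true || return 1

    if "$@"; then
        # The task succeeded, so hand control back to the cluster.
        crm configure property maintenance-mode=false
    else
        status=$?
        # Assumption: after a failed task the administrator wants to inspect
        # the system first, so the cluster stays in maintenance mode.
        echo "maintenance command failed (exit $status); cluster left in maintenance mode" >&2
        return "$status"
    fi
}
```

For example, 'run_in_maintenance /root/patch-oracle.sh' (a hypothetical script path) would run the script between the two property changes.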

Use 'crm configure show' or 'crm configure show cib-bootstrap-options' to check the status of the maintenance-mode property:

property cib-bootstrap-options: \
             maintenance-mode=<true/false>
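
For scripted checks, the property value can be extracted from that output. The following is a rough sketch: 'maintenance_mode_value' is a hypothetical helper name, it reads the 'crm configure show cib-bootstrap-options' output on standard input, and it assumes that an absent property means the Pacemaker default of false.

```shell
#!/bin/sh
# Sketch only: maintenance_mode_value is a hypothetical helper name.
# Reads `crm configure show cib-bootstrap-options` output on stdin and
# prints the value of the maintenance-mode property.
maintenance_mode_value() {
    # Accept both maintenance-mode=true and maintenance-mode="true".
    value=$(sed -n 's/.*maintenance-mode="\{0,1\}\([A-Za-z0-9]*\)"\{0,1\}.*/\1/p' | head -n 1)
    # Assumption: if the property is not listed, the default (false) applies.
    echo "${value:-false}"
}
```

It could be used as 'crm configure show cib-bootstrap-options | maintenance_mode_value'.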

The command 'crm status' can also be used to check the state of the cluster. If the cluster is in maintenance mode, the output should show that all of the resources are 'unmanaged':

Example output if the cluster is in maintenance mode:

    # crm status
      Last updated: Wed May  8 17:18:10 2019
      Last change: Wed May  8 17:18:06 2019 by root via cibadmin on hanode1
      Stack: classic openais (with plugin)
      Current DC: hanode2 - partition with quorum
      Version: 1.1.12-f47ea56
      2 Nodes configured, 2 expected votes
      5 Resources configured

      Online: [ hanode1 hanode2 ]

       sbd_stonith    (stonith:external/sbd):    Started hanode1 (unmanaged)      <<<<  Note the 'unmanaged' flag
       Resource Group: oragrp1
           oraip    (ocf::heartbeat:IPaddr):    Started hanode2 (unmanaged)
           oraclevol    (ocf::heartbeat:Filesystem):    Started hanode2 (unmanaged)
           orcldb    (ocf::heartbeat:oracle):    Started hanode2 (unmanaged)
           orcllisten    (ocf::heartbeat:oralsnr):    Started hanode2 (unmanaged)

NOTE: The same output can appear if every individual resource has been put into maintenance mode or set to unmanaged separately. Take care in that situation, because per-resource settings are separate from the cluster-wide maintenance-mode property (which affects all resources). One setting does not necessarily override the other, and both per-resource and cluster-wide settings can be set at the same time. Putting individual resources into maintenance mode is not covered here.

Example output if the cluster is not in maintenance mode:

   # crm status
   Last updated: Wed May  8 17:18:44 2019
   Last change: Wed May  8 17:18:42 2019 by root via cibadmin on hanode1
   Stack: classic openais (with plugin)
   Current DC: hanode2 - partition with quorum
   Version: 1.1.12-f47ea56
   2 Nodes configured, 2 expected votes
   5 Resources configured


   Online: [ hanode1 hanode2 ]

    sbd_stonith    (stonith:external/sbd):    Started hanode1   <<<<  Note that there is no  'unmanaged' flag
    Resource Group: oragrp1
        oraip    (ocf::heartbeat:IPaddr):    Started hanode2
        oraclevol    (ocf::heartbeat:Filesystem):    Started hanode2
        orcldb    (ocf::heartbeat:oracle):    Started hanode2
        orcllisten    (ocf::heartbeat:oralsnr):    Started hanode2
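
The difference between the two outputs can also be checked in a script. The following is a minimal sketch that counts the resource lines carrying the 'unmanaged' flag: 'unmanaged_summary' is a hypothetical helper name, and the matching assumes the 'crm status' layout shown in the examples above, which may differ between versions.

```shell
#!/bin/sh
# Sketch only: unmanaged_summary is a hypothetical helper name.
# Reads `crm status` output on stdin and reports how many "Started"
# resource lines carry the "(unmanaged)" flag.
unmanaged_summary() {
    awk '
        /Started/                    { started++ }
        /Started/ && /\(unmanaged\)/ { unmanaged++ }
        END { printf "%d of %d started resources unmanaged\n", unmanaged, started }
    '
}
```

It could be used as 'crm status | unmanaged_summary'. As the NOTE above points out, a non-zero count does not prove that the cluster-wide property is set, because individual resources can be unmanaged on their own.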

The Hawk web interface can also be used to change the cluster maintenance state with most versions of the High Availability Extension on SLES11.x, SLES12.x and SLES15.x clusters.

First, make sure the Hawk service is running on the node you will connect to:

Use 'rchawk status' to check the status of the Hawk service.

Use 'rchawk start' to start the service if it is not running, then make sure it is running by checking the status again.
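
The check, start and re-check sequence above could be scripted roughly as follows. This is a sketch under assumptions: 'ensure_hawk_running' is a hypothetical helper name, and it assumes that 'rchawk status' returns a non-zero exit code when the service is not running, as init scripts conventionally do.

```shell
#!/bin/sh
# Sketch only: ensure_hawk_running is a hypothetical helper name.
# Assumption: `rchawk status` exits non-zero when Hawk is not running.
ensure_hawk_running() {
    if rchawk status >/dev/null 2>&1; then
        echo "hawk already running"
        return 0
    fi
    # Start the service, then confirm it is running by checking again.
    rchawk start || return 1
    rchawk status >/dev/null 2>&1
}
```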

To change the maintenance mode using Hawk, log in to the Hawk interface using a browser with the correct URL (e.g. https://<IP_Address_or_DNS>:7630/ ).

(The default user for Hawk is 'hacluster'. Its password is usually the same as the 'root' password until changed, and it can be changed by running 'passwd hacluster' while logged in as root.)

Once logged in to the Hawk interface, navigate to 'Cluster Configuration', where you will see the option to change the maintenance-mode. In some versions of Hawk, this is a check-box (checked = 'in maintenance-mode'), in others, it is a Yes/No drop-down option box. Don't forget to apply any change made.

Additional Information

When changing the maintenance state of a cluster, as with any cluster command, give the command time to finish before running further commands or taking actions that depend on the completion of the previous cluster state change.
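
One way to avoid racing ahead in a script is to poll the cluster state with a timeout. The following is a sketch under assumptions: 'wait_for_managed' is a hypothetical helper name, and treating the disappearance of the 'unmanaged' flag from 'crm status' as the signal is only an approximation. It does not prove that the cluster has finished all follow-up actions, and it will time out if individual resources are unmanaged on their own.

```shell
#!/bin/sh
# Sketch only: wait_for_managed is a hypothetical helper name.
# Usage: wait_for_managed TIMEOUT_SECONDS [INTERVAL_SECONDS]
# Polls `crm status` until no line shows "(unmanaged)" or the timeout expires.
wait_for_managed() {
    timeout=$1
    interval=${2:-2}
    elapsed=0
    while crm status | grep -q '(unmanaged)'; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "resources still unmanaged after ${timeout}s" >&2
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}
```

For example, 'crm configure property maintenance-mode=false && wait_for_managed 60' would wait up to about 60 seconds before returning.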

If you stop the Pacemaker service for any reason, be aware that Corosync is also stopped. This might lead to unexpected results for some resources. For example, if DLM is in use, it depends on the membership and messaging services provided by Corosync. If Corosync stops, the DLM resource will assume a split-brain scenario and trigger a fencing operation. To avoid this, manually stop resources like DLM, or bring them out of maintenance mode, before stopping the Pacemaker service.

Note that 'Using Maintenance Mode' and 'Hawk Overview' are covered in sections of the 'SUSE Linux Enterprise High Availability Extension' documentation, available from the SUSE documentation web pages: https://www.suse.com/documentation/

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.