SLES for SAP maintenance procedure for Scale-Out Perf-Opt HANA cluster

March 28, 2024 | By: Sanjeet Kumar Jha

This blog post covers specific maintenance scenarios for a scale-out, performance-optimized HANA cluster. It illustrates those procedures from the man page SAPHanaSR_maintenance_examples(7) that apply to the scale-out topology.

We are going to cover the following three scenarios:

HANA takeover procedures
HANA maintenance (or Linux OS maintenance when a reboot of the node is not required)
Linux maintenance with reboot

HANA takeover procedures

Check status of Linux cluster and HANA, show current site names.
Set SAPHanaController multi-state resource (and the IP and load balancer resources) into maintenance.
Perform the takeover, make sure to use the suspend primary feature.
Check if the new primary is working.
Stop suspended old primary.
Register old primary as new secondary, make sure to use the correct site name.
Start the new secondary.
Check new secondary and its system replication.
Refresh SAPHanaController multi-state resource.
Set SAPHanaController multi-state resource (and the IP and load balancer resources) to managed.
Finally check status of Linux cluster and HANA.

1. Check status of Linux cluster and HANA, show current site names.

         
suse11:~ # cs_clusterstate -i
### suse11 - 2024-01-11 08:37:23 ###
Cluster state: S_IDLE
suse11:~ #

Cluster Summary:
  * Stack: corosync
  * Current DC: susemm (version 2.1.5+20221208.a3f44794f-150500.6.5.8-2.1.5+20221208.a3f44794f) - partition with quorum
  * Last updated: Thu Jan 11 08:36:59 2024
  * Last change:  Thu Jan 11 08:36:54 2024 by root via crm_attribute on suse11
  * 5 nodes configured
  * 13 resource instances configured

Node List:
  * Online: [ suse11 suse12 suse21 suse22 susemm ]

Active Resources:
  * stonith-sbd (stonith:external/sbd):  Started susemm
  * Resource Group: g_ip_TST_HDB00:
    * rsc_ip_TST_HDB00  (ocf::heartbeat:IPaddr2):        Started suse11
    * rsc_nc_TST_HDB00  (ocf::heartbeat:azure-lb):       Started suse11
  * Clone Set: cln_SAPHanaTop_TST_HDB00 [rsc_SAPHanaTop_TST_HDB00]:
    * Started: [ suse11 suse12 suse21 suse22 ]
  * Clone Set: msl_SAPHanaCon_TST_HDB00 [rsc_SAPHanaCon_TST_HDB00] (promotable):
    * Masters: [ suse11 ]
    * Slaves: [ suse12 suse21 suse22 ]
    
    
suse11:~ # SAPHanaSR-showAttr 
Global cib-time                 maintenance prim sec sync_state upd 
--------------------------------------------------------------------
TST    Thu Jan 11 08:37:58 2024 false       ONE  TWO SOK        ok  

Resource                 maintenance 
-------------------------------------
msl_SAPHanaCon_TST_HDB00 false 
g_ip_TST_HDB00           false      

Sites lpt        lss mns    srHook srr 
---------------------------------------
ONE   1704962278 4   suse11 PRIM   P   
TWO   30         4   suse21 SOK    S   

Hosts  clone_state gra gsh node_state roles                        score  site 
-------------------------------------------------------------------------------
suse11 PROMOTED    2.0 2.2 online     master1:master:worker:master 150    ONE  
suse12 DEMOTED     2.0 2.2 online     slave:slave:worker:slave     -10000 ONE  
suse21 DEMOTED     2.0 2.2 online     master1:master:worker:master 100    TWO  
suse22 DEMOTED     2.0 2.2 online     slave:slave:worker:slave     -12200 TWO  
susemm                     online                                              

suse11:~ #

DISCUSSIONS: Before starting the maintenance procedure, it is very important to check that the running system is in a suitable state. The cluster is sometimes busy with background tasks, so wait until it is stable (cluster state S_IDLE) before executing any step of the procedure.
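
The idle check can be scripted. The following is a minimal sketch, not part of the SAPHanaSR tooling: the function names cluster_is_idle and wait_for_idle are hypothetical, and the only thing it relies on is the "Cluster state: S_IDLE" line that cs_clusterstate -i prints in the transcript above.

```shell
#!/bin/sh
# Hypothetical helpers; they only parse the text printed by "cs_clusterstate -i".

# Return 0 when the text on stdin contains the line "Cluster state: S_IDLE".
cluster_is_idle() {
    grep '^Cluster state: S_IDLE[[:space:]]*$' >/dev/null
}

# Poll a status command until it reports S_IDLE or the tries are used up.
# usage: wait_for_idle <max_tries> <sleep_seconds> <command> [args...]
wait_for_idle() {
    wfi_tries=$1
    wfi_pause=$2
    shift 2
    wfi_i=0
    while [ "$wfi_i" -lt "$wfi_tries" ]; do
        if "$@" 2>/dev/null | cluster_is_idle; then
            return 0
        fi
        wfi_i=$((wfi_i + 1))
        sleep "$wfi_pause"
    done
    return 1
}
```

On a cluster node this could be called as "wait_for_idle 30 10 cs_clusterstate -i" before each step; the retry count and pause are arbitrary illustration values.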

2. Set SAPHanaController multi-state resource into maintenance.


suse11:~ # crm resource maintenance msl_SAPHanaCon_TST_HDB00 
suse11:~ # crm resource maintenance g_ip_TST_HDB00 
suse11:~ #

Cluster Summary:
  * Stack: corosync
  * Current DC: susemm (version 2.1.5+20221208.a3f44794f-150500.6.5.8-2.1.5+20221208.a3f44794f) - partition with quorum
  * Last updated: Thu Jan 11 08:39:39 2024
  * Last change:  Thu Jan 11 08:39:37 2024 by root via cibadmin on suse11
  * 5 nodes configured
  * 13 resource instances configured

Node List:
  * Online: [ suse11 suse12 suse21 suse22 susemm ]

Active Resources:
  * stonith-sbd (stonith:external/sbd):  Started susemm
  * Resource Group: g_ip_TST_HDB00: (unmanaged)
    * rsc_ip_TST_HDB00  (ocf::heartbeat:IPaddr2):        Started suse11 (unmanaged)
    * rsc_nc_TST_HDB00  (ocf::heartbeat:azure-lb):       Started suse11 (unmanaged)
  * Clone Set: cln_SAPHanaTop_TST_HDB00 [rsc_SAPHanaTop_TST_HDB00]:
    * Started: [ suse11 suse12 suse21 suse22 ]
  * Clone Set: msl_SAPHanaCon_TST_HDB00 [rsc_SAPHanaCon_TST_HDB00] (promotable, unmanaged):
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse12 (unmanaged)
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse21 (unmanaged)
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Master suse11 (unmanaged)
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse22 (unmanaged)
    
    
suse11:~ # SAPHanaSR-showAttr 
Global cib-time                 maintenance prim sec sync_state upd 
--------------------------------------------------------------------
TST    Thu Jan 11 08:39:37 2024 false       ONE  TWO SOK        ok  

Resource                 maintenance 
-------------------------------------
msl_SAPHanaCon_TST_HDB00 true 
g_ip_TST_HDB00           true       

Sites lpt        lss mns    srHook srr 
---------------------------------------
ONE   1704962342 4   suse11 PRIM   P   
TWO   30         4   suse21 SOK    S   

Hosts  clone_state gra gsh node_state roles                        score  site 
-------------------------------------------------------------------------------
suse11 PROMOTED    2.0 2.2 online     master1:master:worker:master 150    ONE  
suse12 DEMOTED     2.0 2.2 online     slave:slave:worker:slave     -10000 ONE  
suse21 DEMOTED     2.0 2.2 online     master1:master:worker:master 100    TWO  
suse22 DEMOTED     2.0 2.2 online     slave:slave:worker:slave     -12200 TWO  
susemm                     online                                              

suse11:~ #

DISCUSSIONS: Putting the multi-state resource into maintenance first is the best-practice way to start maintenance on a HANA cluster; it is no longer necessary to put the whole cluster into maintenance mode. Putting the virtual IP resource into maintenance is equally important, because the cluster must not migrate this resource and it should keep running on its current node. During the maintenance period both resources are managed manually.
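
Before continuing, it can be worth confirming that both resources really report maintenance "true". The sketch below assumes the table layout of SAPHanaSR-showAttr shown in this post (a "Resource" table terminated by a blank line); the function name resource_maintenance_flag is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the maintenance flag that SAPHanaSR-showAttr
# output (read from stdin) reports for the resource named in $1.
resource_maintenance_flag() {
    awk -v rsc="$1" '
        $1 == "Resource"   { in_tab = 1; next }   # header row of the Resource table
        in_tab && NF == 0  { in_tab = 0 }         # a blank line ends the table
        in_tab && $1 == rsc && !done { print $2; done = 1 }
    '
}
```

Example use on a cluster node: "SAPHanaSR-showAttr | resource_maintenance_flag msl_SAPHanaCon_TST_HDB00" should print "true" once the resource is in maintenance.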

3. Perform the takeover, make sure to use the suspend primary feature:

          
suse11:~ # cs_clusterstate -i
### suse11 - 2024-01-11 08:41:45 ###
Cluster state: S_IDLE
suse11:~ #

tstadm@suse21:/usr/sap/TST/HDB00> hdbnsutil -sr_takeover --suspendPrimary
done.
tstadm@suse21:/usr/sap/TST/HDB00>

DISCUSSIONS: The takeover changes the role of the secondary site to primary. The --suspendPrimary option ensures that the old primary database is not used by applications during this process.

4. Check if the new primary is working.


tstadm@suse21:/usr/sap/TST/HDB00> hdbnsutil -sr_state

System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~

online: true

mode: primary
operation mode: primary
site id: 2
site name: TWO

is source system: true
is secondary/consumer system: false
has secondaries/consumers attached: false
is a takeover active: false
is primary suspended: false

Host Mappings:
~~~~~~~~~~~~~~

suse22 -> [TWO] suse22

suse21 -> [TWO] suse21


Site Mappings:
~~~~~~~~~~~~~~
TWO (primary/primary)

Tier of TWO: 1

Replication mode of TWO: primary

Operation mode of TWO: primary

done.
tstadm@suse21:/usr/sap/TST/HDB00>
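
The relevant fields of this output (mode, site name, is primary suspended) can also be extracted in a script. This is a sketch under the assumption that hdbnsutil -sr_state keeps the "key: value" layout shown above; the helper name sr_state_field is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one "key: value" line from
# "hdbnsutil -sr_state" output read on stdin. The key must match the whole
# text before ": ", so "mode" does not match "operation mode".
sr_state_field() {
    awk -v key="$1" '
        {
            pos = index($0, ": ")
            if (pos > 0 && substr($0, 1, pos - 1) == key && !seen) {
                print substr($0, pos + 2)
                seen = 1
            }
        }
    '
}
```

On the new primary in this example, "hdbnsutil -sr_state | sr_state_field mode" would be expected to print "primary" and "sr_state_field 'site name'" to print "TWO".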

5. Stop suspended old primary.


tstadm@suse11:/usr/sap/TST/HDB00> sapcontrol -nr 00 -function StopSystem

11.01.2024 08:45:03
StopSystem
OK
tstadm@suse11:/usr/sap/TST/HDB00> sapcontrol -nr 00 -function WaitforStopped 300 20

11.01.2024 08:49:43
WaitforStopped
OK
tstadm@suse11:/usr/sap/TST/HDB00> sapcontrol -nr 00 -function GetSystemInstanceList

11.01.2024 08:50:29
GetSystemInstanceList
OK
hostname, instanceNr, httpPort, httpsPort, startPriority, features, dispstatus
suse11, 0, 50013, 50014, 0.3, HDB|HDB_WORKER, GRAY
suse12, 0, 50013, 50014, 0.3, HDB|HDB_WORKER, GRAY
tstadm@suse11:/usr/sap/TST/HDB00>
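
In a scale-out system every host of the site must reach the expected state, GRAY here. The sketch below checks the dispstatus column for all rows; it assumes the comma-separated layout of GetSystemInstanceList shown above, and the function name all_instances_in_state is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every instance row in
# "sapcontrol ... -function GetSystemInstanceList" output (stdin)
# has the dispstatus given in $1, for example GRAY or GREEN.
all_instances_in_state() {
    awk -F', *' -v want="$1" '
        $1 == "hostname" { header = 1; next }    # data rows follow the header line
        header && NF >= 7 {
            status = $NF
            sub(/[ \t\r]+$/, "", status)         # tolerate trailing whitespace
            rows++
            if (status != want) bad++
        }
        END {
            if (rows > 0 && bad == 0) exit 0
            exit 1
        }
    '
}
```

Example use: "sapcontrol -nr 00 -function GetSystemInstanceList | all_instances_in_state GRAY" after stopping, and the same with GREEN after starting the system again.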

6. Register old primary as new secondary, make sure to use the correct site name.


tstadm@suse11:/usr/sap/TST/HDB00> hdbnsutil -sr_register --name=ONE --remoteHost=suse21 --remoteInstance=00 --replicationMode=sync --operationMode=logreplay
adding site ...
nameserver suse11:30001 not responding.
collecting information ...
updating local ini files ...
done.
tstadm@suse11:/usr/sap/TST/HDB00>

DISCUSSIONS: This makes the old primary the new secondary. A common administrator mistake is to register the old primary under a new site name, so it is important to verify that the existing site name is used.
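
One way to guard against a wrong site name is to compare the intended --name value with the site names the cluster already knows. The sketch below reads them from the "Sites" table of SAPHanaSR-showAttr as shown earlier in this post; the function names known_site_names and is_known_site are hypothetical, and the comparison is an exact, case-sensitive match.

```shell
#!/bin/sh
# Hypothetical helpers based on the "Sites" table of SAPHanaSR-showAttr (stdin).

# Print one known site name per line.
known_site_names() {
    awk '
        $1 == "Sites"      { in_tab = 1; next }   # header row of the Sites table
        in_tab && NF == 0  { in_tab = 0 }         # a blank line ends the table
        in_tab && $1 !~ /^-+$/ { print $1 }       # skip the dashed separator row
    '
}

# Succeed if $1 is exactly one of the known site names.
is_known_site() {
    known_site_names | grep -Fx -- "$1" >/dev/null
}
```

Example use before registering: "SAPHanaSR-showAttr | is_known_site ONE || echo 'unexpected site name'".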

TODO: Also include the following steps from section “* Check the two site names that are known to the Linux cluster.” from manpage SAPHanaSR_maintenance_examples(7)


           # crm configure show suse11 suse21
           # crm configure show SAPHanaSR | grep hana_ha1_site_mns
           # ssh suse21
           # su - ha1adm -c "hdbnsutil -sr_state; echo rc: $?"
           # exit

7. Start the new secondary.


tstadm@suse11:/usr/sap/TST/HDB00> sapcontrol -nr 00 -function StartSystem
11.01.2024 08:52:40
StartSystem
OK
tstadm@suse11:/usr/sap/TST/HDB00> sapcontrol -nr 00 -function WaitforStarted 300 20

11.01.2024 08:54:07
WaitforStarted
OK
tstadm@suse11:/usr/sap/TST/HDB00> sapcontrol -nr 00 -function GetSystemInstanceList

11.01.2024 08:54:29
GetSystemInstanceList
OK
hostname, instanceNr, httpPort, httpsPort, startPriority, features, dispstatus
suse11, 0, 50013, 50014, 0.3, HDB|HDB_WORKER, GREEN
suse12, 0, 50013, 50014, 0.3, HDB|HDB_WORKER, GREEN
tstadm@suse11:/usr/sap/TST/HDB00>

8. Check new secondary and its system replication.


tstadm@suse11:/usr/sap/TST/HDB00> hdbnsutil -sr_state

System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~

online: true

mode: sync
operation mode: logreplay
site id: 1
site name: ONE

is source system: false
is secondary/consumer system: true
has secondaries/consumers attached: false
is a takeover active: false
is primary suspended: false
is timetravel enabled: false
replay mode: auto
active primary site: 2

primary masters: suse21

Host Mappings:
~~~~~~~~~~~~~~

suse12 -> [TWO] suse22
suse12 -> [ONE] suse12

suse11 -> [TWO] suse21
suse11 -> [ONE] suse11


Site Mappings:
~~~~~~~~~~~~~~
TWO (primary/primary)
    |---ONE (sync/logreplay)

Tier of TWO: 1
Tier of ONE: 2

Replication mode of TWO: primary
Replication mode of ONE: sync

Operation mode of TWO: primary
Operation mode of ONE: logreplay

Mapping: TWO -> ONE
done.
tstadm@suse11:/usr/sap/TST/HDB00>


tstadm@suse21:/usr/sap/TST/HDB00/exe/python_support> python systemReplicationStatus.py 
| Database | Host   | Port  | Service Name | Volume ID | Site ID | Site Name | Secondary | Secondary | Secondary | Secondary | Secondary     | Replication | Replication | Replication    | 
|          |        |       |              |           |         |           | Host      | Port      | Site ID   | Site Name | Active Status | Mode        | Status      | Status Details | 
| -------- | ------ | ----- | ------------ | --------- | ------- | --------- | --------- | --------- | --------- | --------- | ------------- | ----------- | ----------- | -------------- | 
| TST      | suse22 | 30003 | indexserver  |         4 |       2 | TWO       | suse12    |     30003 |         1 | ONE       | YES           | SYNC        | ACTIVE      |                | 
| SYSTEMDB | suse21 | 30001 | nameserver   |         1 |       2 | TWO       | suse11    |     30001 |         1 | ONE       | YES           | SYNC        | ACTIVE      |                | 
| TST      | suse21 | 30007 | xsengine     |         3 |       2 | TWO       | suse11    |     30007 |         1 | ONE       | YES           | SYNC        | ACTIVE      |                | 
| TST      | suse21 | 30003 | indexserver  |         2 |       2 | TWO       | suse11    |     30003 |         1 | ONE       | YES           | SYNC        | ACTIVE      |                |

status system replication site "1": ACTIVE
overall system replication status: ACTIVE

Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

mode: PRIMARY
site id: 2
site name: TWO
tstadm@suse21:/usr/sap/TST/HDB00/exe/python_support>
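
For scripted checks, the summary line "overall system replication status" is the easiest part of this output to evaluate. A minimal sketch, assuming the line format shown above; the function name overall_sr_status is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the overall status word from the output of
# systemReplicationStatus.py read on stdin (for example ACTIVE).
overall_sr_status() {
    sed -n 's/^overall system replication status:[[:space:]]*\([A-Z_]*\).*$/\1/p'
}
```

Example use on the primary: "python systemReplicationStatus.py | overall_sr_status" should print "ACTIVE" before the procedure continues with the refresh.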

9. Refresh SAPHanaController multi-state resource.


suse11:~ # crm resource refresh msl_SAPHanaCon_TST_HDB00 
Cleaned up rsc_SAPHanaCon_TST_HDB00:0 on suse12
Cleaned up rsc_SAPHanaCon_TST_HDB00:1 on suse21
Cleaned up rsc_SAPHanaCon_TST_HDB00:2 on susemm
Cleaned up rsc_SAPHanaCon_TST_HDB00:2 on suse11
Cleaned up rsc_SAPHanaCon_TST_HDB00:3 on suse22
Cleaned up rsc_SAPHanaCon_TST_HDB00:4 on suse22
Cleaned up rsc_SAPHanaCon_TST_HDB00:4 on suse12
Cleaned up rsc_SAPHanaCon_TST_HDB00:4 on susemm
... got reply
Cleaned up rsc_SAPHanaCon_TST_HDB00:4 on suse21
Cleaned up rsc_SAPHanaCon_TST_HDB00:4 on suse11
Waiting for 9 replies from the controller
... got reply
... got reply
... got reply
... got reply
... got reply
... got reply
... got reply
... got reply
... got reply (done)
suse11:~ #

suse11:~ # SAPHanaSR-showAttr 
Global cib-time                 maintenance prim sec sync_state upd 
--------------------------------------------------------------------
TST    Thu Jan 11 08:56:55 2024 false       ONE  TWO SOK        ok  

Resource                 maintenance 
-------------------------------------
msl_SAPHanaCon_TST_HDB00 true 
g_ip_TST_HDB00           true

Sites lpt lss mns    srHook srr 
--------------------------------
ONE   30  4   suse11 SOK    S   
TWO   30  4   suse21 PRIM   P   

Hosts  clone_state gra gsh node_state roles                        score  site 
-------------------------------------------------------------------------------
suse11 DEMOTED     2.0 2.2 online     master1:master:worker:master 100    ONE  
suse12 DEMOTED     2.0 2.2 online     slave:slave:worker:slave     -12200 ONE  
suse21 DEMOTED     2.0 2.2 online     master1:master:worker:master 150    TWO  
suse22 DEMOTED     2.0 2.2 online     slave:slave:worker:slave     -10000 TWO  
susemm                     online                                              

suse11:~ #

DISCUSSIONS: Refreshing the resources ensures that the resource agents receive the new state and values of the attributes.
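
After the refresh, the "Sites" table should show the swapped roles (srr "P" for TWO and "S" for ONE in this example) before maintenance is released. The sketch below reads the last column of that table; it assumes the SAPHanaSR-showAttr layout shown above, and the function name site_with_role is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the site whose srr column (last column of the
# "Sites" table in SAPHanaSR-showAttr output on stdin) equals $1 (P or S).
site_with_role() {
    awk -v role="$1" '
        $1 == "Sites"      { in_tab = 1; next }   # header row of the Sites table
        in_tab && NF == 0  { in_tab = 0 }         # a blank line ends the table
        in_tab && $NF == role { print $1 }
    '
}
```

Example use: "SAPHanaSR-showAttr | site_with_role P" is expected to print "TWO" after the takeover described here.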

10. Set SAPHanaController multi-state resource to managed.


suse11:~ # crm resource maintenance g_ip_TST_HDB00 off
suse11:~ # crm resource maintenance msl_SAPHanaCon_TST_HDB00 off
suse11:~ # SAPHanaSR-showAttr 
Global cib-time                 maintenance prim sec sync_state upd 
--------------------------------------------------------------------
TST    Thu Jan 11 08:58:27 2024 false       ONE  TWO SOK        ok  

Resource                 maintenance 
-------------------------------------
msl_SAPHanaCon_TST_HDB00 false 
g_ip_TST_HDB00           false      

Sites lpt lss mns    srHook srr 
--------------------------------
ONE   30  4   suse11 SOK    S   
TWO   30  4   suse21 PRIM   P   

Hosts  clone_state gra gsh node_state roles                        score  site 
-------------------------------------------------------------------------------
suse11 DEMOTED     2.0 2.2 online     master1:master:worker:master 100    ONE  
suse12 DEMOTED     2.0 2.2 online     slave:slave:worker:slave     -12200 ONE  
suse21 PROMOTED    2.0 2.2 online     master1:master:worker:master 150    TWO  
suse22 DEMOTED     2.0 2.2 online     slave:slave:worker:slave     -10000 TWO  
susemm                     online                                              

suse11:~ #


Cluster Summary:
  * Stack: corosync
  * Current DC: susemm (version 2.1.5+20221208.a3f44794f-150500.6.5.8-2.1.5+20221208.a3f44794f) - partition with quorum
  * Last updated: Thu Jan 11 08:58:38 2024
  * Last change:  Thu Jan 11 08:58:36 2024 by root via crm_attribute on suse21
  * 5 nodes configured
  * 13 resource instances configured

Node List:
  * Online: [ suse11 suse12 suse21 suse22 susemm ]

Active Resources:
  * stonith-sbd (stonith:external/sbd):  Started susemm
  * Resource Group: g_ip_TST_HDB00:
    * rsc_ip_TST_HDB00  (ocf::heartbeat:IPaddr2):        Started suse21
    * rsc_nc_TST_HDB00  (ocf::heartbeat:azure-lb):       Started suse21
  * Clone Set: cln_SAPHanaTop_TST_HDB00 [rsc_SAPHanaTop_TST_HDB00]:
    * Started: [ suse11 suse12 suse21 suse22 ]
  * Clone Set: msl_SAPHanaCon_TST_HDB00 [rsc_SAPHanaCon_TST_HDB00] (promotable):
    * Masters: [ suse21 ]
    * Slaves: [ suse11 suse12 suse22 ]

11. Finally check status of Linux cluster.


suse11:~ # cs_clusterstate -i
### suse11 - 2024-01-11 08:59:45 ###
Cluster state: S_IDLE
suse11:~ #

HANA maintenance (or Linux OS maintenance when a reboot of the node is not required)

Check if everything looks fine.
Set the SAPHanaController multi-state resource into maintenance mode.
Perform the HANA maintenance, e.g. update to latest SPS.
Tell the cluster to forget about HANA status and to reprobe the resources.
Set the SAPHanaController multi-state resource back to managed.
Remove the meta attribute from CIB, optional.
Check if everything looks fine

1. Check if everything looks fine.


suse21:/home/azureuser # cs_clusterstate -i
### suse21 - 2024-01-21 11:19:19 ###
Cluster state: S_IDLE
suse21:/home/azureuser #

TODO: Also include the following steps from section “* Check status of Linux cluster and HANA system replication pair.” from manpage SAPHanaSR_maintenance_examples(7)


           # cs_clusterstate
           # crm_mon -1r
           # crm configure show | grep cli-
           # SAPHanaSR-showAttr
           # cs_clusterstate -i

2. Set the SAPHanaController multi-state resource into maintenance mode.


suse21:/home/azureuser # crm resource maintenance msl_SAPHanaCon_TST_HDB00 
suse21:/home/azureuser #

Cluster Summary:
  * Stack: corosync
  * Current DC: suse21 (version 2.1.5+20221208.a3f44794f-150500.6.5.8-2.1.5+20221208.a3f44794f) - partition with quorum
  * Last updated: Sun Jan 21 11:20:16 2024
  * Last change:  Sun Jan 21 11:20:14 2024 by root via cibadmin on suse21
  * 5 nodes configured
  * 13 resource instances configured

Node List:
  * Online: [ suse11 suse12 suse21 suse22 susemm ]

Active Resources:
  * stonith-sbd (stonith:external/sbd):  Started susemm
  * Resource Group: g_ip_TST_HDB00:
    * rsc_ip_TST_HDB00  (ocf::heartbeat:IPaddr2):        Started suse21
    * rsc_nc_TST_HDB00  (ocf::heartbeat:azure-lb):       Started suse21
  * Clone Set: cln_SAPHanaTop_TST_HDB00 [rsc_SAPHanaTop_TST_HDB00]:
    * Started: [ suse11 suse12 suse21 suse22 ]
  * Clone Set: msl_SAPHanaCon_TST_HDB00 [rsc_SAPHanaCon_TST_HDB00] (promotable, unmanaged):
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse22 (unmanaged)
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Master suse21 (unmanaged)
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse11 (unmanaged)
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse12 (unmanaged)

3. Perform the HANA maintenance, e.g. update to latest SPS

4. Tell the cluster to forget about HANA status and to reprobe the resources.


suse22:~ # crm resource refresh msl_SAPHanaCon_TST_HDB00 
Cleaned up rsc_SAPHanaCon_TST_HDB00:0 on suse22
Cleaned up rsc_SAPHanaCon_TST_HDB00:1 on suse21
Cleaned up rsc_SAPHanaCon_TST_HDB00:2 on suse11
Cleaned up rsc_SAPHanaCon_TST_HDB00:3 on suse12
Cleaned up rsc_SAPHanaCon_TST_HDB00:4 on susemm
Waiting for 5 replies from the controller
... got reply
... got reply
... got reply
... got reply
... got reply (done)
suse22:~ #


Cluster Summary:
  * Stack: corosync
  * Current DC: suse21 (version 2.1.5+20221208.a3f44794f-150500.6.5.8-2.1.5+20221208.a3f44794f) - partition with quorum
  * Last updated: Sun Jan 21 11:22:20 2024
  * Last change:  Sun Jan 21 11:22:14 2024 by hacluster via crmd on suse22
  * 5 nodes configured
  * 13 resource instances configured

Node List:
  * Online: [ suse11 suse12 suse21 suse22 susemm ]

Active Resources:
  * stonith-sbd (stonith:external/sbd):  Started susemm
  * Resource Group: g_ip_TST_HDB00:
    * rsc_ip_TST_HDB00  (ocf::heartbeat:IPaddr2):        Started suse21
    * rsc_nc_TST_HDB00  (ocf::heartbeat:azure-lb):       Started suse21
  * Clone Set: cln_SAPHanaTop_TST_HDB00 [rsc_SAPHanaTop_TST_HDB00]:
    * Started: [ suse11 suse12 suse21 suse22 ]
  * Clone Set: msl_SAPHanaCon_TST_HDB00 [rsc_SAPHanaCon_TST_HDB00] (promotable, unmanaged):
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse22 (unmanaged)
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse21 (unmanaged)
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse11 (unmanaged)
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse12 (unmanaged)

5. Set the SAPHanaController multi-state resource back to managed.


suse22:~ # crm resource maintenance msl_SAPHanaCon_TST_HDB00 off
suse22:~ #

Cluster Summary:
  * Stack: corosync
  * Current DC: suse21 (version 2.1.5+20221208.a3f44794f-150500.6.5.8-2.1.5+20221208.a3f44794f) - partition with quorum
  * Last updated: Sun Jan 21 11:24:30 2024
  * Last change:  Sun Jan 21 11:24:28 2024 by root via crm_attribute on suse21
  * 5 nodes configured
  * 13 resource instances configured

Node List:
  * Online: [ suse11 suse12 suse21 suse22 susemm ]

Active Resources:
  * stonith-sbd (stonith:external/sbd):  Started susemm
  * Resource Group: g_ip_TST_HDB00:
    * rsc_ip_TST_HDB00  (ocf::heartbeat:IPaddr2):        Started suse21
    * rsc_nc_TST_HDB00  (ocf::heartbeat:azure-lb):       Started suse21
  * Clone Set: cln_SAPHanaTop_TST_HDB00 [rsc_SAPHanaTop_TST_HDB00]:
    * Started: [ suse11 suse12 suse21 suse22 ]
  * Clone Set: msl_SAPHanaCon_TST_HDB00 [rsc_SAPHanaCon_TST_HDB00] (promotable):
    * Masters: [ suse21 ]
    * Slaves: [ suse11 suse12 suse22 ]

6. Check if everything looks fine.


suse22:~ # cs_clusterstate -i
### suse22 - 2024-01-21 11:26:33 ###
Cluster state: S_IDLE
suse22:~ #

TODO: Also include the following steps from section “* Check status of Linux cluster and HANA system replication pair.” from manpage SAPHanaSR_maintenance_examples(7)


           # cs_clusterstate
           # crm_mon -1r
           # crm configure show | grep cli-
           # SAPHanaSR-showAttr
           # cs_clusterstate -i

Linux maintenance with reboot

Check the cluster and put the multi-state resource and the ip group resource into maintenance
Set the maintenance on the whole cluster
Stop the cluster on the secondary site nodes where the maintenance is supposed to take place
Manually stop HANA on the node where maintenance is supposed to be done
Disable the pacemaker on the node where the reboot is required after the maintenance
Perform the maintenance and reboot if required
Enable the pacemaker after the reboot
Start the HANA manually
Start the cluster on the secondary site nodes
Refresh the cln_ and msl_ resources
Remove the global maintenance from cluster
Remove maintenance from the multi-state resource and ip group resource
Check the Cluster Status
Perform the takeover as described in the section "HANA takeover procedures" above, then rerun steps 1 to 13 on the new secondary site nodes
Perform the maintenance on the majority maker node
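
The order of steps 1 to 13 can be written down as a dry-run plan that only prints commands and runs nothing. This is a sketch, not a tested automation: the resource names are the ones from this example cluster, the function name plan_reboot_maintenance is hypothetical, and the commands for steps 10 to 12 are assumed to mirror the refresh and maintenance commands used in the earlier sections.

```shell
#!/bin/sh
# Resource names from the example cluster in this post; override as needed.
MSL=${MSL:-msl_SAPHanaCon_TST_HDB00}
CLN=${CLN:-cln_SAPHanaTop_TST_HDB00}
IPGRP=${IPGRP:-g_ip_TST_HDB00}

# Hypothetical dry run: print (do not execute) steps 1 to 13 for the
# secondary-site nodes given as arguments.
plan_reboot_maintenance() {
    prm_nodes="$*"
    echo "cs_clusterstate -i"
    echo "crm resource maintenance $MSL"
    echo "crm resource maintenance $IPGRP"
    echo "crm maintenance on"
    echo "crm cluster stop $prm_nodes"
    for prm_n in "$@"; do
        echo "[$prm_n] HDB stop    # as the <sid>adm user"
        echo "[$prm_n] systemctl disable pacemaker.service"
        echo "[$prm_n] # perform the maintenance, reboot if required"
        echo "[$prm_n] systemctl enable pacemaker.service"
        echo "[$prm_n] HDB start   # as the <sid>adm user"
    done
    echo "crm cluster start $prm_nodes"
    echo "crm resource refresh $CLN"     # assumed command for step 10
    echo "crm resource refresh $MSL"     # assumed command for step 10
    echo "crm maintenance off"           # assumed command for step 11
    echo "crm resource maintenance $MSL off"
    echo "crm resource maintenance $IPGRP off"
    echo "cs_clusterstate -i"
}
```

Calling "plan_reboot_maintenance suse11 suse12" prints the sequence for the secondary site used in this post, which can then be reviewed and executed by hand, step by step, with a status check in between.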

1. Check the cluster and put the multi-state resource and the ip group resource into maintenance


suse22:~ # cs_clusterstate -i
### suse22 - 2024-01-29 15:26:09 ###
Cluster state: S_IDLE
suse22:~ # crm resource maintenance msl_SAPHanaCon_TST_HDB00 
suse22:~ # crm resource maintenance g_ip_TST_HDB00 
suse22:~ # 



Cluster Summary:
  * Stack: corosync
  * Current DC: susemm (version 2.1.5+20221208.a3f44794f-150500.6.5.8-2.1.5+20221208.a3f44794f) - partition with quorum
  * Last updated: Mon Jan 29 15:26:33 2024
  * Last change:  Mon Jan 29 15:26:31 2024 by root via cibadmin on suse22
  * 5 nodes configured
  * 13 resource instances configured

Node List:
  * Online: [ suse11 suse12 suse21 suse22 susemm ]

Active Resources:
  * stonith-sbd (stonith:external/sbd):  Started susemm
  * Resource Group: g_ip_TST_HDB00 (unmanaged):
    * rsc_ip_TST_HDB00  (ocf::heartbeat:IPaddr2):        Started suse21 (unmanaged)
    * rsc_nc_TST_HDB00  (ocf::heartbeat:azure-lb):       Started suse21 (unmanaged)
  * Clone Set: cln_SAPHanaTop_TST_HDB00 [rsc_SAPHanaTop_TST_HDB00]:
    * Started: [ suse11 suse12 suse21 suse22 ]
  * Clone Set: msl_SAPHanaCon_TST_HDB00 [rsc_SAPHanaCon_TST_HDB00] (promotable, unmanaged):
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse11 (unmanaged)
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse12 (unmanaged)
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Master suse21 (unmanaged)
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse22 (unmanaged)

2. Set the maintenance on the whole cluster

    
suse22:~ # crm maintenance on
suse22:~ #




Cluster Summary:
  * Stack: corosync
  * Current DC: susemm (version 2.1.5+20221208.a3f44794f-150500.6.5.8-2.1.5+20221208.a3f44794f) - partition with quorum
  * Last updated: Mon Jan 29 15:27:09 2024
  * Last change:  Mon Jan 29 15:27:06 2024 by root via cibadmin on suse22
  * 5 nodes configured
  * 13 resource instances configured

              *** Resource management is DISABLED ***
  The cluster will not attempt to start, stop or recover services

Node List:
  * Online: [ suse11 suse12 suse21 suse22 susemm ]

Active Resources:
  * stonith-sbd (stonith:external/sbd):  Started susemm (unmanaged)
  * Resource Group: g_ip_TST_HDB00 (unmanaged):
    * rsc_ip_TST_HDB00  (ocf::heartbeat:IPaddr2):        Started suse21 (unmanaged)
    * rsc_nc_TST_HDB00  (ocf::heartbeat:azure-lb):       Started suse21 (unmanaged)
  * Clone Set: cln_SAPHanaTop_TST_HDB00 [rsc_SAPHanaTop_TST_HDB00] (unmanaged):
    * rsc_SAPHanaTop_TST_HDB00  (ocf::suse:SAPHanaTopology):     Started suse11 (unmanaged)
    * rsc_SAPHanaTop_TST_HDB00  (ocf::suse:SAPHanaTopology):     Started suse12 (unmanaged)
    * rsc_SAPHanaTop_TST_HDB00  (ocf::suse:SAPHanaTopology):     Started suse21 (unmanaged)
    * rsc_SAPHanaTop_TST_HDB00  (ocf::suse:SAPHanaTopology):     Started suse22 (unmanaged)
  * Clone Set: msl_SAPHanaCon_TST_HDB00 [rsc_SAPHanaCon_TST_HDB00] (promotable, unmanaged):
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse11 (unmanaged)
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse12 (unmanaged)
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Master suse21 (unmanaged)
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse22 (unmanaged)

3. Stop the cluster on the secondary site nodes where the maintenance is supposed to take place

    
suse22:~ # crm cluster stop suse11 suse12
INFO: The cluster stack stopped on suse11
INFO: The cluster stack stopped on suse12
suse22:~ #



Cluster Summary:
  * Stack: corosync
  * Current DC: susemm (version 2.1.5+20221208.a3f44794f-150500.6.5.8-2.1.5+20221208.a3f44794f) - partition with quorum
  * Last updated: Mon Jan 29 15:28:20 2024
  * Last change:  Mon Jan 29 15:27:06 2024 by root via cibadmin on suse22
  * 5 nodes configured
  * 13 resource instances configured

              *** Resource management is DISABLED ***
  The cluster will not attempt to start, stop or recover services

Node List:
  * Online: [ suse21 suse22 susemm ]
  * OFFLINE: [ suse11 suse12 ]

Active Resources:
  * stonith-sbd (stonith:external/sbd):  Started susemm (unmanaged)
  * Resource Group: g_ip_TST_HDB00 (unmanaged):
    * rsc_ip_TST_HDB00  (ocf::heartbeat:IPaddr2):        Started suse21 (unmanaged)
    * rsc_nc_TST_HDB00  (ocf::heartbeat:azure-lb):       Started suse21 (unmanaged)
  * Clone Set: cln_SAPHanaTop_TST_HDB00 [rsc_SAPHanaTop_TST_HDB00] (unmanaged):
    * rsc_SAPHanaTop_TST_HDB00  (ocf::suse:SAPHanaTopology):     Started suse11 (unmanaged)
    * rsc_SAPHanaTop_TST_HDB00  (ocf::suse:SAPHanaTopology):     Started suse12 (unmanaged)
    * rsc_SAPHanaTop_TST_HDB00  (ocf::suse:SAPHanaTopology):     Started suse21 (unmanaged)
    * rsc_SAPHanaTop_TST_HDB00  (ocf::suse:SAPHanaTopology):     Started suse22 (unmanaged)
  * Clone Set: msl_SAPHanaCon_TST_HDB00 [rsc_SAPHanaCon_TST_HDB00] (promotable, unmanaged):
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse11 (unmanaged)
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse12 (unmanaged)
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Master suse21 (unmanaged)
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse22 (unmanaged)

DISCUSSIONS: There are two reasons for stopping the cluster on the nodes where the maintenance will be performed.
First, this maintenance procedure is meant for patching the OS as well as the cluster software stack, which is better done while the cluster is stopped.
Second, if anything goes wrong during the maintenance, the cluster can at least be ruled out as the source of the problem because it is stopped.
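
Whether the intended nodes, and only those, have left the cluster can be read from the "Node List" part of the status output above. The sketch assumes the "* Online: [ ... ]" and "* OFFLINE: [ ... ]" line format shown in this post; the function name nodes_in_state is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the nodes that the cluster status output (stdin)
# lists under the given state label, for example "Online" or "OFFLINE".
nodes_in_state() {
    awk -v state="$1" '
        $1 == "*" && $2 == (state ":") {
            # fields: "*", "OFFLINE:", "[", node names..., "]"
            for (i = 4; i < NF; i++) print $i
        }
    '
}
```

Example use: "crm_mon -1r | nodes_in_state OFFLINE" would be expected to list suse11 and suse12 at this point of the procedure.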

4. Manually stop HANA on the node where maintenance is supposed to be done

    
suse11:~ # su - tstadm 
tstadm@suse11:/usr/sap/TST/HDB00> HDB info
USER          PID     PPID  %CPU        VSZ        RSS COMMAND
tstadm      28839    28837   1.2      14404       7348 -sh
tstadm      29030    28839   0.0       8284       3960  \_ /bin/sh /usr/sap/TST/HDB00/HDB info
tstadm      29061    29030   0.0      17848       3984      \_ ps fx -U tstadm -o user:8,pid:8,ppid:8,pcpu:5,vsz:10,rss:10,args
tstadm       6037        1   0.0     686404      51136 hdbrsutil  --start --port 30003 --volume 2 --volumesuffix mnt00001/hdb00002.00003 --identi
tstadm       5103        1   0.0     686076      50864 hdbrsutil  --start --port 30001 --volume 1 --volumesuffix mnt00001/hdb00001 --identifier 1
tstadm       4584        1   0.0       9572       3240 sapstart pf=/usr/sap/TST/SYS/profile/TST_HDB00_suse11
tstadm       4591     4584   0.0     432728      72492  \_ /usr/sap/TST/HDB00/suse11/trace/hdb.sapTST_HDB00 -d -nw -f /usr/sap/TST/HDB00/suse11/d
tstadm       4609     4591   0.7    9770660    1652796      \_ hdbnameserver
tstadm       4928     4591   0.2     424136     126520      \_ hdbcompileserver
tstadm       4931     4591   0.2     692988     155156      \_ hdbpreprocessor
tstadm       5053     4591   0.6    9781348    1813164      \_ hdbindexserver -port 30003
tstadm       5064     4591   0.4    5054852    1079784      \_ hdbxsengine -port 30007
tstadm       5695     4591   0.2    2386460     419944      \_ hdbwebdispatcher
tstadm       2181        1   0.0     482660      31228 /usr/sap/TST/HDB00/exe/sapstartsrv pf=/usr/sap/TST/SYS/profile/TST_HDB00_suse11 -D -u tsta
tstadm       2097        1   0.0      46864      11096 /usr/lib/systemd/systemd --user
tstadm       2098     2097   0.0      78448       3992  \_ (sd-pam)
tstadm@suse11:/usr/sap/TST/HDB00> HDB stop
hdbdaemon will wait maximal 300 seconds for NewDB services finishing.
Stopping instance using: /usr/sap/TST/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function Stop 400

29.01.2024 15:29:33
Stop
OK
Waiting for stopped instance using: /usr/sap/TST/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function WaitforStopped 600 2


29.01.2024 15:34:43
WaitforStopped
OK
hdbdaemon is stopped.
tstadm@suse11:/usr/sap/TST/HDB00>

DISCUSSIONS: This step is only required when the node needs to be rebooted. The reboot stops all processes, including HANA, but it is advisable to stop HANA manually beforehand so that any HANA-related problem becomes visible at that point.

5. Disable pacemaker on the node that will be rebooted after the maintenance


suse11:~ # systemctl disable pacemaker.service 
Removed /etc/systemd/system/multi-user.target.wants/pacemaker.service.
suse11:~ #

DISCUSSIONS: Disabling the pacemaker service prevents an unintended start of the cluster when the node comes back up after the reboot.
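
The autostart setting can be verified before the reboot. This is a minimal sketch, assuming systemd manages pacemaker as shown above; the function names `pacemaker_autostart_state` and `safe_to_reboot` are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: print the autostart state of pacemaker as reported by systemd.
# "systemctl is-enabled" returns non-zero when the unit is not enabled,
# so the exit status is deliberately ignored here.
pacemaker_autostart_state() {
    systemctl is-enabled pacemaker.service 2>/dev/null || true
}

# Sketch: succeed only if pacemaker will not start automatically at boot.
safe_to_reboot() {
    [ "$(pacemaker_autostart_state)" = "disabled" ]
}

# Example (as root, not executed here):
#   safe_to_reboot && echo "pacemaker autostart is off"
```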

6. Perform the maintenance and reboot if required

7. Enable pacemaker after the reboot


suse11:~ # systemctl enable pacemaker.service 
Created symlink /etc/systemd/system/multi-user.target.wants/pacemaker.service → /usr/lib/systemd/system/pacemaker.service.
suse11:~ #

8. Start HANA manually


suse11:~ # su - tstadm 
tstadm@suse11:/usr/sap/TST/HDB00> HDB info
USER          PID     PPID  %CPU        VSZ        RSS COMMAND
tstadm       5017     5016   0.2      14404       7304 -sh
tstadm       5319     5017   0.0       8284       3968  \_ /bin/sh /usr/sap/TST/HDB00/HDB info
tstadm       5350     5319   0.0      17848       3908      \_ ps fx -U tstadm -o user:8,pid:8,ppid:8,pcpu:5,vsz:10,rss:10,args
tstadm       2191        1   0.2     416392      30192 /usr/sap/TST/HDB00/exe/sapstartsrv pf=/usr/sap/TST/SYS/profile/TST_HDB00_suse11 -D -u tsta
tstadm       2100        1   0.0      46808      10972 /usr/lib/systemd/systemd --user
tstadm       2101     2100   0.0      78472       3996  \_ (sd-pam)
tstadm@suse11:/usr/sap/TST/HDB00> HDB start


StartService
Impromptu CCC initialization by 'rscpCInit'.
  See SAP note 1266393.
OK
OK
Starting instance using: /usr/sap/TST/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function StartWait 2700 2


29.01.2024 15:39:24
Start
OK

29.01.2024 15:40:19
StartWait
OK
tstadm@suse11:/usr/sap/TST/HDB00>

DISCUSSIONS: It is best practice to start HANA manually after a reboot and before the cluster is started, so that the cluster later finds HANA in a state as close as possible to the one it was in when maintenance was set.

9. Start the cluster on the secondary site nodes


suse22:~ # crm cluster start suse11 suse12
INFO: The cluster stack started on suse11
INFO: The cluster stack started on suse12
suse22:~ #


Cluster Summary:
  * Stack: corosync
  * Current DC: susemm (version 2.1.5+20221208.a3f44794f-150500.6.5.8-2.1.5+20221208.a3f44794f) - partition with quorum
  * Last updated: Mon Jan 29 15:41:22 2024
  * Last change:  Mon Jan 29 15:40:11 2024 by root via crm_attribute on suse21
  * 5 nodes configured
  * 13 resource instances configured

              *** Resource management is DISABLED ***
  The cluster will not attempt to start, stop or recover services

Node List:
  * Online: [ suse11 suse12 suse21 suse22 susemm ]

Active Resources:
  * stonith-sbd (stonith:external/sbd):  Started susemm (unmanaged)
  * Resource Group: g_ip_TST_HDB00 (unmanaged):
    * rsc_ip_TST_HDB00  (ocf::heartbeat:IPaddr2):        Started suse21 (unmanaged)
    * rsc_nc_TST_HDB00  (ocf::heartbeat:azure-lb):       Started suse21 (unmanaged)
  * Clone Set: cln_SAPHanaTop_TST_HDB00 [rsc_SAPHanaTop_TST_HDB00] (unmanaged):
    * rsc_SAPHanaTop_TST_HDB00  (ocf::suse:SAPHanaTopology):     Started suse12 (unmanaged)
    * rsc_SAPHanaTop_TST_HDB00  (ocf::suse:SAPHanaTopology):     Started suse21 (unmanaged)
    * rsc_SAPHanaTop_TST_HDB00  (ocf::suse:SAPHanaTopology):     Started suse22 (unmanaged)
  * Clone Set: msl_SAPHanaCon_TST_HDB00 [rsc_SAPHanaCon_TST_HDB00] (promotable, unmanaged):
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Master suse21 (unmanaged)
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse22 (unmanaged)
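
After starting the cluster, one way to confirm that the restarted nodes have rejoined is to look at the "Online" line of the status output. The following is a minimal sketch that parses `crm_mon -1r` output from stdin; the function names `online_nodes` and `nodes_online` are hypothetical, and the line format is assumed to match the output shown above.

```shell
#!/bin/sh
# Sketch: print the node names from the "* Online: [ ... ]" line of
# crm_mon output read on stdin, separated by single spaces.
online_nodes() {
    sed -n 's/^ *\* Online: \[ *\(.*[^ ]\) *]$/\1/p'
}

# Sketch: succeed if every node given as an argument is in the online list.
# Uses the plain shell variables "online" and "node".
nodes_online() {
    online=" $(online_nodes) "
    for node in "$@"; do
        case "$online" in
            *" $node "*) ;;
            *) return 1 ;;
        esac
    done
}

# Example (as root, not executed here):
#   crm_mon -1r | nodes_online suse11 suse12 && echo "nodes rejoined"
```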

10. Refresh the cln_ and msl_ resources

    
suse22:~ # cs_clusterstate -i
### suse22 - 2024-01-29 15:42:35 ###
Cluster state: S_IDLE
suse22:~ # crm resource refresh cln_SAPHanaTop_TST_HDB00 
Cleaned up rsc_SAPHanaTop_TST_HDB00:0 on suse12
Cleaned up rsc_SAPHanaTop_TST_HDB00:0 on suse11
Cleaned up rsc_SAPHanaTop_TST_HDB00:1 on suse21
Cleaned up rsc_SAPHanaTop_TST_HDB00:2 on suse22
Cleaned up rsc_SAPHanaTop_TST_HDB00:3 on susemm
Cleaned up rsc_SAPHanaTop_TST_HDB00:4 on suse22
Cleaned up rsc_SAPHanaTop_TST_HDB00:4 on suse12
Cleaned up rsc_SAPHanaTop_TST_HDB00:4 on susemm
Cleaned up rsc_SAPHanaTop_TST_HDB00:4 on suse21
... got reply
... got reply
... got reply
... got reply
... got reply
Cleaned up rsc_SAPHanaTop_TST_HDB00:4 on suse11
Waiting for 5 replies from the controller
... got reply
... got reply
... got reply
... got reply
... got reply (done)
suse22:~ # cs_wait_for_idle -s 5
Cluster state: S_IDLE
suse22:~ # crm resource refresh msl_SAPHanaCon_TST_HDB00 
Cleaned up rsc_SAPHanaCon_TST_HDB00:0 on suse12
Cleaned up rsc_SAPHanaCon_TST_HDB00:0 on suse21
Cleaned up rsc_SAPHanaCon_TST_HDB00:0 on suse11
Cleaned up rsc_SAPHanaCon_TST_HDB00:1 on suse22
Cleaned up rsc_SAPHanaCon_TST_HDB00:2 on susemm
Cleaned up rsc_SAPHanaCon_TST_HDB00:3 on suse22
Cleaned up rsc_SAPHanaCon_TST_HDB00:3 on suse12
Cleaned up rsc_SAPHanaCon_TST_HDB00:3 on susemm
Cleaned up rsc_SAPHanaCon_TST_HDB00:3 on suse21
Cleaned up rsc_SAPHanaCon_TST_HDB00:3 on suse11
Cleaned up rsc_SAPHanaCon_TST_HDB00:4 on suse22
Cleaned up rsc_SAPHanaCon_TST_HDB00:4 on suse12
Cleaned up rsc_SAPHanaCon_TST_HDB00:4 on susemm
Cleaned up rsc_SAPHanaCon_TST_HDB00:4 on suse21
Cleaned up rsc_SAPHanaCon_TST_HDB00:4 on suse11
Waiting for 15 replies from the controller
... got reply
... got reply
... got reply
... got reply
... got reply
... got reply
... got reply
... got reply
... got reply
... got reply
... got reply
... got reply
... got reply
... got reply
... got reply (done)
suse22:~ # cs_wait_for_idle -s 5
Cluster state: S_IDLE
suse22:~ #



Cluster Summary:
  * Stack: corosync
  * Current DC: susemm (version 2.1.5+20221208.a3f44794f-150500.6.5.8-2.1.5+20221208.a3f44794f) - partition with quorum
  * Last updated: Mon Jan 29 15:44:11 2024
  * Last change:  Mon Jan 29 15:44:03 2024 by hacluster via crmd on susemm
  * 5 nodes configured
  * 13 resource instances configured

              *** Resource management is DISABLED ***
  The cluster will not attempt to start, stop or recover services

Node List:
  * Online: [ suse11 suse12 suse21 suse22 susemm ]

Active Resources:
  * stonith-sbd (stonith:external/sbd):  Started susemm (unmanaged)
  * Resource Group: g_ip_TST_HDB00 (unmanaged):
    * rsc_ip_TST_HDB00  (ocf::heartbeat:IPaddr2):        Started suse21 (unmanaged)
    * rsc_nc_TST_HDB00  (ocf::heartbeat:azure-lb):       Started suse21 (unmanaged)
  * Clone Set: cln_SAPHanaTop_TST_HDB00 [rsc_SAPHanaTop_TST_HDB00] (unmanaged):
    * rsc_SAPHanaTop_TST_HDB00  (ocf::suse:SAPHanaTopology):     Started suse12 (unmanaged)
    * rsc_SAPHanaTop_TST_HDB00  (ocf::suse:SAPHanaTopology):     Started suse21 (unmanaged)
    * rsc_SAPHanaTop_TST_HDB00  (ocf::suse:SAPHanaTopology):     Started suse22 (unmanaged)
  * Clone Set: msl_SAPHanaCon_TST_HDB00 [rsc_SAPHanaCon_TST_HDB00] (promotable, unmanaged):
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse21 (unmanaged)
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse22 (unmanaged)

DISCUSSIONS: Refreshing a resource makes the cluster re-probe its state and update the resource attributes to match the current state.
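
The refresh sequence above can be wrapped so that the cluster is always given time to settle between refreshes. This is a minimal sketch using only the commands shown in this step; the function name `refresh_and_wait` is hypothetical.

```shell
#!/bin/sh
# Sketch: refresh each given resource, then wait for the cluster to become
# idle before handling the next one. Stops at the first failing command.
refresh_and_wait() {
    for rsc in "$@"; do
        crm resource refresh "$rsc" || return 1
        cs_wait_for_idle -s 5 || return 1
    done
}

# Example (as root on one cluster node, not executed here):
#   refresh_and_wait cln_SAPHanaTop_TST_HDB00 msl_SAPHanaCon_TST_HDB00
```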

11. Remove the global maintenance from the cluster

    
suse22:~ # crm maintenance off
suse22:~ #

Cluster Summary:
  * Stack: corosync
  * Current DC: susemm (version 2.1.5+20221208.a3f44794f-150500.6.5.8-2.1.5+20221208.a3f44794f) - partition with quorum
  * Last updated: Mon Jan 29 15:45:14 2024
  * Last change:  Mon Jan 29 15:45:07 2024 by root via cibadmin on suse22
  * 5 nodes configured
  * 13 resource instances configured

Node List:
  * Online: [ suse11 suse12 suse21 suse22 susemm ]

Active Resources:
  * stonith-sbd (stonith:external/sbd):  Started susemm
  * Resource Group: g_ip_TST_HDB00 (unmanaged):
    * rsc_ip_TST_HDB00  (ocf::heartbeat:IPaddr2):        Started suse21 (unmanaged)
    * rsc_nc_TST_HDB00  (ocf::heartbeat:azure-lb):       Started suse21 (unmanaged)
  * Clone Set: cln_SAPHanaTop_TST_HDB00 [rsc_SAPHanaTop_TST_HDB00]:
    * Started: [ suse11 suse12 suse21 suse22 ]
  * Clone Set: msl_SAPHanaCon_TST_HDB00 [rsc_SAPHanaCon_TST_HDB00] (promotable, unmanaged):
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse21 (unmanaged)
    * rsc_SAPHanaCon_TST_HDB00  (ocf::suse:SAPHanaController):   Slave suse22 (unmanaged)

12. Remove maintenance from the multi-state resource and ip group resource

    
suse22:~ # crm resource maintenance g_ip_TST_HDB00 off
suse22:~ # crm resource maintenance msl_SAPHanaCon_TST_HDB00 off
suse22:~ #

Cluster Summary:
  * Stack: corosync
  * Current DC: susemm (version 2.1.5+20221208.a3f44794f-150500.6.5.8-2.1.5+20221208.a3f44794f) - partition with quorum
  * Last updated: Mon Jan 29 15:46:54 2024
  * Last change:  Mon Jan 29 15:46:52 2024 by root via crm_attribute on suse21
  * 5 nodes configured
  * 13 resource instances configured

Node List:
  * Online: [ suse11 suse12 suse21 suse22 susemm ]

Active Resources:
  * stonith-sbd (stonith:external/sbd):  Started susemm
  * Resource Group: g_ip_TST_HDB00:
    * rsc_ip_TST_HDB00  (ocf::heartbeat:IPaddr2):        Started suse21
    * rsc_nc_TST_HDB00  (ocf::heartbeat:azure-lb):       Started suse21
  * Clone Set: cln_SAPHanaTop_TST_HDB00 [rsc_SAPHanaTop_TST_HDB00]:
    * Started: [ suse11 suse12 suse21 suse22 ]
  * Clone Set: msl_SAPHanaCon_TST_HDB00 [rsc_SAPHanaCon_TST_HDB00] (promotable):
    * Masters: [ suse21 ]
    * Slaves: [ suse11 suse12 suse22 ]
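
Once maintenance has been removed, no resource should be flagged as unmanaged any more. The following is a minimal sketch that scans `crm_mon -1r` output from stdin for that flag; the function name `no_unmanaged_resources` is hypothetical, and the check assumes the "(unmanaged)" markers appear as in the outputs above.

```shell
#!/bin/sh
# Sketch: print every crm_mon line (read on stdin) that is still flagged
# as unmanaged; succeed only if no such line is found.
no_unmanaged_resources() {
    if grep -F 'unmanaged'; then
        return 1
    fi
    return 0
}

# Example (as root, not executed here):
#   crm_mon -1r | no_unmanaged_resources && echo "all resources are managed"
```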

13. Check the cluster status

    
suse22:~ # cs_clusterstate -i
### suse22 - 2024-01-29 15:47:11 ###
Cluster state: S_IDLE
suse22:~ #
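
When this check is scripted, the idle state can be tested instead of read by eye. This is a minimal sketch that parses `cs_clusterstate -i` output from stdin, assuming the "Cluster state:" line has the format shown above; the function name `cluster_is_idle` is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed if cs_clusterstate output (read on stdin) reports S_IDLE.
cluster_is_idle() {
    grep -q '^Cluster state: S_IDLE$'
}

# Example (as root, not executed here):
#   cs_clusterstate -i | cluster_is_idle || echo "cluster is not idle yet"
```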

In addition, run the following checks from the section “* Check status of Linux cluster and HANA system replication pair.” of manual page SAPHanaSR_maintenance_examples(7):


           # cs_clusterstate
           # crm_mon -1r
           # crm configure show | grep cli-
           # SAPHanaSR-showAttr
           # cs_clusterstate -i
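
The `grep cli-` check looks for constraints whose names start with `cli-`, which are typically left behind by resource move or ban commands. The following is a minimal sketch of that check as a reusable function reading `crm configure show` output from stdin; the function name `has_cli_constraints` is hypothetical.

```shell
#!/bin/sh
# Sketch: print configuration lines (read on stdin) that mention an object
# whose name starts with "cli-"; succeed only if at least one is found.
has_cli_constraints() {
    grep -E '(^|[[:space:]])cli-'
}

# Example (as root, not executed here):
#   if crm configure show | has_cli_constraints; then
#       echo "leftover cli- constraints found, review before continuing"
#   fi
```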

14. Perform the takeover as described in the section "HANA takeover procedures", then rerun steps 1 to 13 on the new secondary site nodes

15. Perform the maintenance on the majority maker node

Please also read our other blogs about #TowardsZeroDowntime.

Where can I find further information?

SUSECON 2020 BP-1351 Tipps, Tricks and Troubleshooting
Manual pages
- SAPHanaSR-ScaleOut(7)
- ocf_suse_SAPHanaController(7)
- ocf_suse_SAPHanaTopology(7)
- SAPHanaSR.py(7)
- SAPHanaSrMultiTarget.py(7)
- SAPHanaSR-ScaleOut_basic_cluster(7)
- SAPHanaSR-showAttr(8)
- SAPHanaSR_maintenance_examples(7)
- sbd(8)
- cs_man2pdf(8)
- cs_show_hana_info(8)
- cs_wait_for_idle(8)
- cs_clusterstate(8)
- cs_show_sbd_devices(8)
- cs_make_sbd_devices(8)
- supportconfig_plugins(5)
- crm(8)
- crmadmin(8)
- crm_mon(8)
- ha_related_suse_tids(7)
- ha_related_sap_notes(7)
SUSE support TIDs
- Troubleshooting the SAPHanaSR python hook (000019865)
- Indepth HANA Cluster Debug Data Collection (PACEMAKER, SAP) (7022702)
- HANA SystemReplication doesn’t provide SiteName … (000019754)
- SAPHanaController running in timeout when starting SAP Hana (000019899)
- SAP HANA monitors timed out after 5 seconds (000020626)
Related blog articles: https://www.suse.com/c/tag/towardszerodowntime/
Blog Part 1 on SAP HANA Maintenance Procedure: https://www.suse.com/c/sles-for-sap-hana-maintenance-procedures-part-1-pre-maintenance-checks/
Product documentation: https://documentation.suse.com/
Pacemaker Upstream documentation on cluster property options: https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-cluster-options.html


Jul 24th, 2023

Sanjeet Kumar Jha: I am an SAP Solution Architect for High Availability at SUSE. I have over a decade of experience with SUSE high availability technologies for SAP applications.
