Safely Disabling/Enabling System Replication on a Scale-Up SAP HANA cluster
This document (000021374) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server for SAP Applications 15 SP4
SUSE Linux Enterprise Server for SAP Applications 15 SP3
SUSE Linux Enterprise Server for SAP Applications 15 SP2
SUSE Linux Enterprise Server for SAP Applications 12 SP5
Situation
In a scale-up HANA cluster, it is sometimes necessary to temporarily disable HANA system replication, for example in the situation described in SAP Note 2821539. This can be done while the Pacemaker cluster keeps running, provided the maintenance procedure outlined in the Resolution section below is followed.
During the procedure, the SAPHanaTopology resource is not put into maintenance, so the node attribute changes remain visible. Note that the hana_<sid>_roles attribute changes to none (N) on both nodes: on the secondary (S) once it has been unregistered, and on the primary (P) once replication has been disabled there:
Before:
hana_ha1_roles : 4:P:master1:master:worker:master
hana_ha1_roles : 4:S:master1:master:worker:master
After:
hana_ha1_roles : 4:N:master1:master:worker:master
hana_ha1_roles : 4:N:master1:master:worker:master
This behavior is expected; none (N) means that there is no active system replication, and HANA is running standalone on that particular node.
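The role letter is the second colon-separated field of the attribute value shown above. As a minimal sketch (the function names sr_role_letter and is_standalone_role are illustrative and not part of any SUSE or SAP tool), it can be extracted like this:

```shell
# Print the system replication role letter (P, S or N) from a
# hana_<sid>_roles value such as 4:N:master1:master:worker:master.
# Assumption: the letter is always the second colon-separated field,
# as in the examples in this document.
sr_role_letter() {
    printf '%s\n' "$1" | cut -d: -f2
}

# Return 0 when the value indicates standalone operation (role N).
is_standalone_role() {
    [ "$(sr_role_letter "$1")" = "N" ]
}
```

For example, sr_role_letter "4:P:master1:master:worker:master" prints P.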
If the maintenance procedure is not followed, the cluster cannot handle the N role. When the SAPHana resource agent detects the none (N) role, it returns rc=1 OCF_ERR_GENERIC (HANA_STATE_DEFECT), which eventually leads to a restart of the SAPHana resource.
Example Log Entries:
hana-t1 SAPHana(rsc_SAPHana_HA1_HDB00)[14492]: INFO: ACT site=Primary, setting SFAIL for secondary (5) - srRc=11 lss=4
hana-t1 pacemaker-attrd[3748]: notice: Setting hana_ha1_roles[hana-t1]: 4:P:hana-t1:master:worker:master -> 4:N:hana-t1:master:worker:master
hana-t1 SAPHana(rsc_SAPHana_HA1_HDB00)[89057]: WARNING: ACT: saphana_monitor_clone: HANA_STATE_DEFECT
hana-t1 pacemaker-controld[3750]: notice: Result of monitor operation for rsc_SAPHana_HA1_HDB00 on hana-t1: error
hana-t1 pacemaker-attrd[3748]: notice: Setting fail-count rsc_SAPHana_HA1_HDB00#monitor_60000[hana-t1]: (unset) -> 1
hana-t1 pacemaker-controld[3750]: notice: Requesting local execution of stop operation for rsc_SAPHana_HA1_HDB00 on hana-t1
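To find this pattern in existing logs, the two sketch functions below filter log lines read from standard input. They assume the message formats shown in the example entries above; the function names are illustrative, and the log location on a given system is not covered here.

```shell
# Read log lines on stdin and print "node old_role" for every
# hana_<sid>_roles attribute that pacemaker-attrd changed from P or S to N.
# Assumption: messages look like the pacemaker-attrd example above.
roles_changed_to_none() {
    sed -n 's/.*Setting hana_[a-z0-9]*_roles\[\([^]]*\)]: [0-9]*:\([PS]\):[^ ]* -> [0-9]*:N:.*/\1 \2/p'
}

# Read log lines on stdin and print how many of them report
# HANA_STATE_DEFECT.
count_state_defects() {
    awk '/HANA_STATE_DEFECT/ { n++ } END { print n + 0 }'
}
```

Applied to the example entries above, roles_changed_to_none prints "hana-t1 P" and count_state_defects prints 1.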
Resolution
Safely Disabling/Enabling System Replication Procedure Using SAP Tools
Before any maintenance, check the cluster for errors. Record the SAP HANA site names, SID, and instance number. Use SAPHanaSR-showAttr to get the site names known by the cluster. Do not change these site names; they must match those recognized by the cluster.
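The commands in this document use names derived from the SID, such as the ha1adm user and the hana_ha1_roles attribute for SID HA1. A small sketch for deriving them is shown below; the helper names are hypothetical, and the lowercase convention is taken from the examples in this document.

```shell
# Convert a SID such as HA1 to lowercase (ha1).
sid_lower() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Print the <sid>adm user name for a SID, for example ha1adm.
sidadm_user() {
    printf '%sadm\n' "$(sid_lower "$1")"
}

# Print the node attribute name that holds the roles, for example
# hana_ha1_roles.
roles_attr_name() {
    printf 'hana_%s_roles\n' "$(sid_lower "$1")"
}
```

With SID HA1, sidadm_user HA1 prints ha1adm and roles_attr_name HA1 prints hana_ha1_roles.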
Steps:
1. Set the SAPHana Multi-State Resource into maintenance mode.
# crm resource maintenance msl_SAPHana_HA1_HDB00 on
Verify that the multi-state resource shows as (unmanaged):
hana-t1:~ # crm_mon -1Ar
Node List:
* Online: [ hana-t1 hana-t2 ]
Full List of Resources:
* stonith-sbd (stonith:external/sbd): Started hana-t1
* rsc_ip_HA1_HDB00 (ocf::heartbeat:IPaddr2): Started hana-t1
* Clone Set: msl_SAPHana_HA1_HDB00 [rsc_SAPHana_HA1_HDB00] (promotable, unmanaged):
* rsc_SAPHana_HA1_HDB00 (ocf::suse:SAPHana): Master hana-t1 (unmanaged)
* rsc_SAPHana_HA1_HDB00 (ocf::suse:SAPHana): Slave hana-t2 (unmanaged)
* Clone Set: cln_SAPHanaTop_HA1_HDB00 [rsc_SAPHanaTop_HA1_HDB00]:
* Started: [ hana-t1 hana-t2 ]
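The unmanaged check can also be scripted. The following sketch reads crm_mon output from standard input and assumes the "Clone Set: <name> [...] (..., unmanaged):" line format shown above; the function name is illustrative.

```shell
# Return 0 when the crm_mon output on stdin lists the given clone set
# as unmanaged, 1 otherwise.
# Argument: name of the multi-state resource, e.g. msl_SAPHana_HA1_HDB00
clone_set_is_unmanaged() {
    awk -v rsc="$1" '
        index($0, "Clone Set: " rsc " ") && /unmanaged/ { found = 1 }
        END { exit !found }
    '
}
```

A possible invocation is: crm_mon -1Ar | clone_set_is_unmanaged msl_SAPHana_HA1_HDB00 && echo unmanaged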
2. Ensure the cluster is in idle state (S_IDLE) before proceeding (cs_wait_for_idle is part of the ClusterTools2 package).
hana-t1:~ # cs_wait_for_idle -s 5
Cluster state: S_IDLE
3. Log in to the primary node and check that the overall replication status is ACTIVE (in sync).
hana-t1:~ # su - ha1adm
ha1adm@hana-t1:/usr/sap/HA1/HDB00> HDBSettings.sh systemReplicationStatus.py --sapcontrol=1 | egrep -i '(site|overall).*replication_status'
site/2/REPLICATION_STATUS=ACTIVE
overall_replication_status=ACTIVE
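For scripted use, the status line can be tested instead of read by eye. This is a sketch that assumes the key=value output of --sapcontrol=1 shown above; the function name is illustrative.

```shell
# Return 0 when the systemReplicationStatus.py --sapcontrol=1 output on
# stdin contains the line overall_replication_status=ACTIVE.
overall_replication_is_active() {
    grep -q '^overall_replication_status=ACTIVE[[:space:]]*$'
}
```

A possible invocation as <sid>adm is: HDBSettings.sh systemReplicationStatus.py --sapcontrol=1 | overall_replication_is_active && echo "in sync"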
4. Log in to the secondary node and stop the secondary HANA system.
hana-t2:~ # su - ha1adm
ha1adm@hana-t2:/usr/sap/HA1/HDB00> sapcontrol -nr 00 -function StopSystem HDB
24.10.2024 13:56:18
StopSystem
OK
Verify that the system has stopped:
ha1adm@hana-t2:/usr/sap/HA1/HDB00> sapcontrol -nr 00 -function GetProcessList
24.10.2024 13:57:45
GetProcessList
OK
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
hdbdaemon, HDB Daemon, GRAY, Stopped, , , 20350
Ensure that all processes are stopped (GRAY status).
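The GRAY check can be automated by parsing the process table. The sketch below assumes the comma-separated layout shown above, with dispstatus as the third column; the function name is illustrative.

```shell
# Read sapcontrol GetProcessList output on stdin and return 0 only when
# at least one process row exists and every row has the given dispstatus
# (for example GRAY or GREEN).
all_processes_have_status() {
    awk -F', ' -v want="$1" '
        # Rows are the lines after the column header.
        header_seen && NF >= 3 { rows++; if ($3 != want) bad++ }
        /^name, description, dispstatus/ { header_seen = 1 }
        END { exit !(rows > 0 && bad == 0) }
    '
}
```

A possible invocation is: sapcontrol -nr 00 -function GetProcessList | all_processes_have_status GRAY && echo "all stopped". The same function can be called with GREEN to confirm a running system.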
5. Unregister the secondary HANA System.
As <sid>adm on the secondary, execute:
ha1adm@hana-t2:/usr/sap/HA1/HDB00> hdbnsutil -sr_unregister
unregistering site ...
Opening persistence ...
ha1adm: no process found
run as transaction master
updating topology for system replication takeover ...
mapped host hana-t1 to hana-t2
sending unregister request to primary site (1) ...
clearing local ini files ...
removing HSR entries from topology ...
#####################################################################################
### CAUTION: You must start the database in order to complete the unregistration! ###
#####################################################################################
done.
Performing Final Memory Release with 49 threads.
Finished Final Memory Release successfuly.
6. Start the secondary HANA System.
In order to complete the unregistration, the database must be started.
ha1adm@hana-t2:/usr/sap/HA1/HDB00> sapcontrol -nr 00 -function StartSystem HDB
24.10.2024 14:03:25
StartSystem
OK
Verify and ensure that the HANA processes are running (GREEN status).
ha1adm@hana-t2:/usr/sap/HA1/HDB00> sapcontrol -nr 00 -function GetProcessList
24.10.2024 14:06:43
GetProcessList
OK
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
hdbdaemon, HDB Daemon, GREEN, Running, 2024 10 24 14:03:28, 0:03:15, 31221
hdbcompileserver, HDB Compileserver, GREEN, Running, 2024 10 24 14:03:59, 0:02:44, 31842
hdbindexserver, HDB Indexserver-HA1, GREEN, Running, 2024 10 24 14:04:02, 0:02:41, 31884
hdbnameserver, HDB Nameserver, GREEN, Running, 2024 10 24 14:03:29, 0:03:14, 31247
hdbpreprocessor, HDB Preprocessor, GREEN, Running, 2024 10 24 14:03:59, 0:02:44, 31845
hdbwebdispatcher, HDB Web Dispatcher, GREEN, Running, 2024 10 24 14:05:49, 0:00:54, 1227
hdbxsengine, HDB XSEngine-HA1, GREEN, Running, 2024 10 24 14:04:02, 0:02:41, 31887
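In the example above, the processes needed a few minutes to reach GREEN, so a scripted check has to poll. The following generic retry helper is only a sketch; its name, arguments and the check command passed to it are assumptions and not part of sapcontrol.

```shell
# Run a command repeatedly until it succeeds or the attempts run out.
# Usage: retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if it never does.
retry_until() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        attempts=$((attempts - 1))
        # Sleep only when another attempt follows.
        if [ "$attempts" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

For example, retry_until 30 10 my_green_check would try a hypothetical my_green_check command up to 30 times, 10 seconds apart.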
7. Disable System Replication on the primary HANA system.
On the primary node, switch to <sid>adm and disable SR.
hana-t1:~ # su - ha1adm
ha1adm@hana-t1:/usr/sap/HA1/HDB00> hdbnsutil -sr_disable
checking local nameserver:
nameserver is running, proceeding ...
done.
8. Verify the hana_<sid>_roles attributes.
At this point, both nodes should have the hana_<sid>_roles attribute as N (none). Check the cluster status:
# crm_mon -1Ar
Node Attributes:
* Node: hana-t1:
* hana_ha1_clone_state : PROMOTED
* hana_ha1_roles : 4:N:master1:master:worker:master
* ...
* Node: hana-t2:
* hana_ha1_clone_state : DEMOTED
* hana_ha1_roles : 4:N:master1:master:worker:master
* ...
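To compare the roles of both nodes at a glance, the attribute section of the crm_mon output can be reduced to node and role pairs. The sketch below assumes the "* Node: <name>:" and "* hana_<sid>_roles : <value>" line formats shown above; the function name is illustrative.

```shell
# Read crm_mon -1Ar output on stdin and print one "node role" pair per
# hana_<sid>_roles node attribute, for example "hana-t1 N".
node_sr_roles() {
    awk '
        # Remember the node that the following attributes belong to.
        $1 == "*" && $2 == "Node:" { node = $3; sub(/:$/, "", node) }
        # The role letter is the second colon-separated field of the value.
        $2 ~ /^hana_.*_roles$/ && $3 == ":" { split($4, f, ":"); print node, f[2] }
    '
}
```

A possible invocation is: crm_mon -1Ar | node_sr_roles. At this step, both nodes are expected to show N.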
9. Perform Maintenance Tasks
Proceed with any required maintenance tasks, such as those outlined in SAP Note 2821539.
10. Enable System Replication on the Primary HANA System.
On the primary, as <sid>adm, enable system replication:
hana-t1:~ # su - ha1adm
ha1adm@hana-t1:/usr/sap/HA1/HDB00> hdbnsutil -sr_enable --name=SITE1
nameserver is active, proceeding ...
successfully enabled system as system replication source site
done.
Check the SR configuration on the primary:
ha1adm@hana-t1:/usr/sap/HA1/HDB00> hdbnsutil -sr_stateConfiguration --sapcontrol=1
SAPCONTROL-OK: <begin>
mode=primary
site id=1
site name=SITE1
SAPCONTROL-OK: <end>
done.
The mode should have changed from “none” to “primary”.
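The mode and site name can be read from the --sapcontrol=1 output with a small key lookup. This sketch assumes the key=value lines shown above; the function name is illustrative.

```shell
# Read hdbnsutil -sr_stateConfiguration --sapcontrol=1 output on stdin
# and print the value of the first line whose key matches exactly,
# for example "mode" or "site name".
sr_state_value() {
    awk -F= -v key="$1" '
        $1 == key { print substr($0, length(key) + 2); exit }
    '
}
```

A possible invocation is: hdbnsutil -sr_stateConfiguration --sapcontrol=1 | sr_state_value mode. On the primary this should print primary after step 10; on a registered secondary it prints the replication mode, such as sync.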
11. Stop the Secondary HANA System.
The SAP HANA database instance on the secondary must be stopped before it can be registered for system replication. On the secondary, as <sid>adm, stop the HANA system:
hana-t2:~ # su - ha1adm
ha1adm@hana-t2:/usr/sap/HA1/HDB00> sapcontrol -nr 00 -function StopSystem HDB
24.10.2024 15:22:26
StopSystem
OK
Verify that the system is stopped:
ha1adm@hana-t2:/usr/sap/HA1/HDB00> sapcontrol -nr 00 -function GetProcessList
24.10.2024 15:23:25
GetProcessList
OK
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
hdbdaemon, HDB Daemon, GRAY, Stopped, , , 31221
12. Register the Secondary HANA System.
ha1adm@hana-t2:/usr/sap/HA1/HDB00> hdbnsutil -sr_register --name=SITE2 --remoteHost=hana-t1 --remoteInstance=00 --replicationMode=sync --operationMode=logreplay
adding site ...
collecting information ...
updating local ini files ...
done.
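Because the site name must match the one known to the cluster, it can help to print the registration command for review before running it. The sketch below only assembles the command line from the options used above; the function name and argument order are assumptions.

```shell
# Print (do not run) the hdbnsutil registration command for review.
# Arguments: SITE_NAME PRIMARY_HOST INSTANCE_NO REPLICATION_MODE OPERATION_MODE
print_sr_register_cmd() {
    printf 'hdbnsutil -sr_register --name=%s --remoteHost=%s --remoteInstance=%s --replicationMode=%s --operationMode=%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}
```

For example, print_sr_register_cmd SITE2 hana-t1 00 sync logreplay prints the command used in this step.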
13. Start the Secondary HANA System.
ha1adm@hana-t2:/usr/sap/HA1/HDB00> sapcontrol -nr 00 -function StartSystem HDB
24.10.2024 15:27:16
StartSystem
OK
Verify that the system is running:
ha1adm@hana-t2:/usr/sap/HA1/HDB00> sapcontrol -nr 00 -function GetProcessList
24.10.2024 15:30:18
GetProcessList
OK
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
hdbdaemon, HDB Daemon, GREEN, Running, 2024 10 24 15:27:20, 0:02:58, 11158
hdbcompileserver, HDB Compileserver, GREEN, Running, 2024 10 24 15:27:41, 0:02:37, 11466
hdbindexserver, HDB Indexserver-HA1, GREEN, Running, 2024 10 24 15:27:43, 0:02:35, 11500
hdbnameserver, HDB Nameserver, GREEN, Running, 2024 10 24 15:27:21, 0:02:57, 11183
hdbpreprocessor, HDB Preprocessor, GREEN, Running, 2024 10 24 15:27:41, 0:02:37, 11469
hdbwebdispatcher, HDB Web Dispatcher, GREEN, Running, 2024 10 24 15:29:10, 0:01:08, 13106
hdbxsengine, HDB XSEngine-HA1, GREEN, Running, 2024 10 24 15:27:43, 0:02:35, 11503
Check SR configuration:
ha1adm@hana-t2:/usr/sap/HA1/HDB00> hdbnsutil -sr_stateConfiguration --sapcontrol=1
SAPCONTROL-OK: <begin>
mode=sync
site id=2
site name=SITE2
active primary site=1
primary masters=hana-t1
SAPCONTROL-OK: <end>
done.
14. Refresh the Multi-State SAPHana Resource.
# crm resource refresh msl_SAPHana_HA1_HDB00
Check the hana_<sid>_roles attribute to confirm roles have been correctly assigned:
# crm_mon -1Ar
Node Attributes:
* Node: hana-t1:
* hana_ha1_clone_state : PROMOTED
* hana_ha1_roles : 4:P:master1:master:worker:master
* ...
* Node: hana-t2:
* hana_ha1_clone_state : DEMOTED
* hana_ha1_roles : 4:S:master1:master:worker:master
* ...
15. Bring the SAPHana Resource Back to Managed State.
# crm resource maintenance msl_SAPHana_HA1_HDB00 off
16. Remove Maintenance Meta Attribute from CIB.
# crm resource meta msl_SAPHana_HA1_HDB00 delete maintenance
17. Verify Cluster and HANA Status.
Check the cluster status and ensure that all resources are running and managed:
# cs_clusterstate -a
# crm_mon -1Ar
# crm configure show | grep cli-
# SAPHanaSR-showAttr
# cs_clusterstate -i
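The grep for cli- in the list above looks for leftover constraints. As a sketch, the function below prints their IDs from the configuration text. It assumes that such constraints appear in crm configure show as lines starting with "location cli-"; the function name is illustrative.

```shell
# Read `crm configure show` output on stdin and print the ID of every
# location constraint whose ID starts with cli-. Prints nothing when
# no such constraint exists.
leftover_cli_constraints() {
    awk '$1 == "location" && $2 ~ /^cli-/ { print $2 }'
}
```

A possible invocation is: crm configure show | leftover_cli_constraints. Empty output means no such constraints were found.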
Status
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021374
- Creation Date: 27-Feb-2024
- Modified Date: 28-Oct-2024
- SUSE Linux Enterprise Server for SAP Applications
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com