Basic health check for two-node SAP HANA performance based model
This document (7022984) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server for SAP Applications 11 Service Pack 3
SUSE Linux Enterprise Server for SAP Applications 11 Service Pack 4
SUSE Linux Enterprise Server for SAP Applications 12 Service Pack 1
SUSE Linux Enterprise Server for SAP Applications 12 Service Pack 2
Situation
Resolution
For the purposes of this document, 'master' can be equated to 'primary' (mode: PRIMARY) and 'slave' can be equated to 'secondary' (mode: SYNC).
1. Put the cluster into maintenance mode (see TID#7023135) or stop pacemaker on each node. While the cluster is in maintenance mode, no cluster or resource actions will be initiated until the cluster is taken out of maintenance mode.
If pacemaker is manually stopped on each node, the cluster will attempt to shut down the SAP database and related processes on the node where pacemaker is being stopped. To avoid the possibility of triggering a 'take-over', stop pacemaker on the 'slave' node first and allow enough time for the unload process to complete.
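As an illustration only (the exact commands can vary with the SLES release and cluster configuration, so verify against TID#7023135 and your own setup): the whole cluster can typically be placed in maintenance mode by running, as root on one node,
crm configure property maintenance-mode=true
and pacemaker can instead be stopped per node ('slave' node first), as root, with 'systemctl stop pacemaker' on SLES 12 or 'rcopenais stop' on SLES 11.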
2. Check the current status of the SAP node synchronization:
Log in to the server that is designated as the SAP database 'primary' node (the node designated to host the 'master', i.e. non-slave, database) using the SAP administrator account (e.g. a00adm, where 'a00' is the SAP System ID). The SAP administrator account is created when the SAP product is installed.
NOTE: If the SAP administrator account password is unknown/lost, that password can be safely changed without causing issues. This account password is per-server and not synchronized across nodes, so changing the password to the same known password on both nodes is prudent.
In the following examples, the 'SAP HANA System ID' is 'A00' and the 'SAP Instance Number' is '00'.
NOTE: depending on which access method is used (direct console login, ssh etc.), the shell prompt may show as 'user@hostname/<path>' or may display as something like 'sh-4.2$'.
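For example, if already logged in to the node as root, you can switch to the SAP administrator account (assuming the SAP System ID is 'A00', so the account name is 'a00adm') with:
su - a00adm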
Execute 'HDB info' to show which SAP-related processes are running on that node.
An example showing that only the processes related to running the 'HDB info' command itself and the standard SAP instance service daemon (sapstartsrv) are active:
a00adm@sapn1:/usr/sap/A00/HDB00> HDB info
USER PID PPID %CPU VSZ RSS COMMAND
a00adm 5183 5178 0.0 87684 1804 sshd: a00adm@pts/0
a00adm 5184 5183 0.1 14808 3620 \_ -sh
a00adm 5269 5184 0.0 13200 1824 \_ /bin/sh /usr/sap/A00/HDB00/HDB info
a00adm 5294 5269 0.0 26668 1356 \_ ps fx -U a00adm -o user,pid,ppid,pcpu,vsz,rss,args
a00adm 2104 1 0.0 362484 27184 /usr/sap/A00/HDB00/exe/sapstartsrv pf=/usr/sap/A00/SYS/profile/A00_HDB00_sapn1 -D -u a00adm
a00adm 2004 1 0.0 31844 2352 /usr/lib/systemd/systemd --user
a00adm 2008 2004 0.0 63796 2620 \_ (sd-pam)
a00adm@sapn1:/usr/sap/A00/HDB00>
An example showing that the node is currently running a SAP database and related SAP processes:
a00adm@sapn1:/usr/sap/A00/HDB00> HDB info
USER PID PPID %CPU VSZ RSS COMMAND
a00adm 5183 5178 0.0 87684 1804 sshd: a00adm@pts/0
a00adm 5184 5183 0.0 14808 3624 \_ -sh
a00adm 5994 5184 0.0 13200 1824 \_ /bin/sh /usr/sap/A00/HDB00/HDB info
a00adm 6019 5994 0.0 26668 1356 \_ ps fx -U a00adm -o user,pid,ppid,pcpu,vsz,rss,args
a00adm 5369 1 0.0 20932 1644 sapstart pf=/usr/sap/A00/SYS/profile/A00_HDB00_sapn1
a00adm 5377 5369 1.8 582944 292720 \_ /usr/sap/A00/HDB00/sapn1/trace/hdb.sapA00_HDB00 -d -nw -f /usr/sap/A00/HDB00/sapn1/daemon.ini pf=/usr/sap/A00/SYS/profile/A00_HDB00_sapn1
a00adm 5394 5377 9.3 3930388 1146444 \_ hdbnameserver
a00adm 5548 5377 21.3 2943472 529672 \_ hdbcompileserver
a00adm 5550 5377 4.4 2838792 465664 \_ hdbpreprocessor
a00adm 5571 5377 91.6 7151116 4019640 \_ hdbindexserver
a00adm 5573 5377 21.8 4323488 1203128 \_ hdbxsengine
a00adm 5905 5377 18.9 3182120 710680 \_ hdbwebdispatcher
a00adm 2104 1 0.0 428748 27760 /usr/sap/A00/HDB00/exe/sapstartsrv pf=/usr/sap/A00/SYS/profile/A00_HDB00_sapn1 -D -u a00adm
a00adm 2004 1 0.0 31844 2352 /usr/lib/systemd/systemd --user
a00adm 2008 2004 0.0 63796 2620 \_ (sd-pam)
a00adm@sapn1:/usr/sap/A00/HDB00>
To check whether the nodes in the cluster can synchronize properly, both nodes must be correctly running the expected SAP database and processes. Remember that, even though one node is designated as the slave and one as the master, both nodes actually run a database; the master database is continuously synchronized to the slave database.
If the SAP database and services are not active on a node, run the appropriate command to start the processes:
e.g. 'HDB start'
or 'sapcontrol -nr 00 -function Start', where '00' is the SAP instance number.
NOTE: To stop the SAP database and processes on a node, you can use 'HDB stop' or 'sapcontrol -nr 00 -function Stop' , where '00' is the number of the SAP instance.
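As an optional additional check (assuming the instance number is '00'), 'sapcontrol' can also list the state of the instance processes:
sapcontrol -nr 00 -function GetProcessList
Once the database is fully started, all listed processes should be reported with a GREEN dispstatus.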
Once both nodes are showing that SAP is active (check using 'HDB info'), the synchronization state of the databases can be checked.
If the SAP installation is functioning correctly, you should see something similar to the following by executing the python script 'systemReplicationStatus.py' on each node:
On Master/Primary node:
sh-4.2$ pwd
/hana/shared/A00/HDB00/exe/python_support
sh-4.2$ python systemReplicationStatus.py
| Host | Port | Service Name | Volume ID | Site ID | Site Name | Secondary | Secondary | Secondary | Secondary | Secondary | Replication | Replication | Replication |
| | | | | | | Host | Port | Site ID | Site Name | Active Status | Mode | Status | Status Details |
| ----- | ----- | ------------ | --------- | ------- | --------- | --------- | --------- | --------- | --------- | ------------- | ----------- | ----------- | -------------- |
| sapn1 | 30007 | xsengine | 2 | 1 | node1 | sapn2 | 30007 | 2 | node2 | YES | SYNC | ACTIVE | |
| sapn1 | 30001 | nameserver | 1 | 1 | node1 | sapn2 | 30001 | 2 | node2 | YES | SYNC | ACTIVE | |
| sapn1 | 30003 | indexserver | 3 | 1 | node1 | sapn2 | 30003 | 2 | node2 | YES | SYNC | ACTIVE | |
status system replication site "2": ACTIVE
overall system replication status: ACTIVE
Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mode: PRIMARY
site id: 1
site name: node1
sh-4.2$
-------------------------------------------------------------------------------------------
On the Slave/Secondary node:
sh-4.2$ pwd
/hana/shared/A00/HDB00/exe/python_support
sh-4.2$ python systemReplicationStatus.py
this system is either not running or not primary system replication site
Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mode: SYNC
site id: 2
site name: node2
active primary site: 1
primary masters: sapn1
sh-4.2$
If the 'replication status' of any of the SAP processes is not showing as 'ACTIVE', the databases may simply need more time to 'catch up' to the point where they are fully in sync. How long this takes depends on how long ago the SAP processes were started on each node, how long the master database was running while the slave database was down or otherwise unavailable for syncing, and how much data has been written to the master database since the last complete sync. Depending on these factors, the time required to sync can range from a few minutes to several hours.
NOTE: The python script 'systemReplicationStatus.py' is located in the '/hana/shared/<system_id>/HDB<instance_number>/exe/python_support' directory.
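As a supplementary check of the replication state, the 'hdbnsutil' tool can be run as the SAP administrator account on each node; its output (mode, site id, site name) should agree with what 'systemReplicationStatus.py' reports:
hdbnsutil -sr_state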
If it becomes clear that the SAP database synchronization is not working, it may be necessary to reconfigure/re-enable replication between the master and slave nodes*, or it may be necessary to contact the SAP support organization for assistance.
* See TID#7023127 - 'How to re-enable replication in a two-node SAP performance based model'.
Don't forget to take the cluster out of maintenance mode when appropriate (the cluster will remain in maintenance mode even after the nodes are rebooted, unless it is manually taken out of maintenance mode). If the nodes have not been rebooted, take care to return all of the cluster resources to the same state they were in when the cluster was put into maintenance mode before bringing the cluster out of maintenance mode; otherwise the cluster may not reflect the true state of each resource and, on a failure, may not behave as expected.
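As with entering maintenance mode, the exact command may vary with the environment; typically the cluster is taken out of maintenance mode by running, as root on one node,
crm configure property maintenance-mode=false
after which the resource states can be verified with 'crm status' or 'crm_mon -r'.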
Cause
Additional Information
If the SAP nodes are in-sync but problems have developed with the SUSE operating system or SUSE High Availability clustering extension, then opening a support request with SUSE is likely the right course of action.
Please note that SUSE is not responsible for the configuration of a cluster. SUSE consulting services can be employed to configure or re-configure the product.
If a business-critical situation exists, where the SAP nodes are 'in sync' but a problem exists with the SUSE High Availability extension, downtime can be avoided by running the SAP nodes with the cluster in maintenance mode, or with pacemaker stopped on each node, until downtime for remediation is acceptable.
Useful SAP Notes:
2434562 - System Replication Hanging in Status "SYNCING" or "ERROR" With Status Detail "Missing Log" or "Invalid backup size"
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7022984
- Creation Date: 18-May-2018
- Modified Date: 12-Oct-2022
- SUSE Linux Enterprise Server for SAP Applications
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com