SAPHanaController monitor timeout leads to database restart
This document (000021249) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise for SAP Applications 15
Situation
In certain situations on a SAP HANA scale-out system replication cluster, the
resource agent monitor call to "landscapeHostConfiguration.py" times out twice
in a row on one of the nodes. This results in a local restart of the
SAP HANA database at that site.
Resource agent monitor timeouts are logged on the cluster node similar to:
... SAPHanaController ... RA ==== begin action monitor_clone (0.180.0.0628.1823) ====
... SAPHanaController ... RA: HANA_CALL TIMEOUT after 120 seconds running command 'landscapeHostConfiguration.py --sapcontrol=1'
... SAPHanaController ... RA: landscapeHostConfiguration.py second TIMEOUT after 120 seconds
... SAPHanaController ... RA ==== end action monitor_clone with rc=1 (0.180.0.0628.1823) (265s)====
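To check whether a node hit this pattern, the log can be scanned for a first timeout followed by a second one within the same monitor action. The following is a minimal sketch only. It assumes the messages look like the excerpt above; the function name find_double_timeouts and the SAMPLE_LOG list are made up for illustration and are not part of the resource agent.

```python
import re

# Patterns derived from the sample log lines in this article.
# Real log lines may differ between resource agent versions.
FIRST_TIMEOUT = re.compile(
    r"HANA_CALL TIMEOUT after (?P<seconds>\d+) seconds "
    r"running command '(?P<command>[^']+)'"
)
SECOND_TIMEOUT = re.compile(
    r"(?P<tool>\S+) second TIMEOUT after (?P<seconds>\d+) seconds"
)

# Hypothetical input: the four lines from the excerpt above.
SAMPLE_LOG = [
    "... SAPHanaController ... RA ==== begin action monitor_clone (0.180.0.0628.1823) ====",
    "... SAPHanaController ... RA: HANA_CALL TIMEOUT after 120 seconds running command 'landscapeHostConfiguration.py --sapcontrol=1'",
    "... SAPHanaController ... RA: landscapeHostConfiguration.py second TIMEOUT after 120 seconds",
    "... SAPHanaController ... RA ==== end action monitor_clone with rc=1 (0.180.0.0628.1823) (265s)====",
]


def find_double_timeouts(lines):
    """Return (first_line_no, second_line_no) pairs, 1-based.

    A pair is reported when a 'second TIMEOUT' message follows a
    'HANA_CALL TIMEOUT' message inside the same monitor_clone action.
    """
    hits = []
    pending = None  # line number of the first timeout, if any
    for number, line in enumerate(lines, start=1):
        if "begin action monitor_clone" in line:
            pending = None  # a new monitor action starts
        elif FIRST_TIMEOUT.search(line):
            pending = number
        elif SECOND_TIMEOUT.search(line) and pending is not None:
            hits.append((pending, number))
            pending = None
    return hits


if __name__ == "__main__":
    for first, second in find_double_timeouts(SAMPLE_LOG):
        print(f"double timeout: lines {first} and {second}")
```

In practice the list of lines would come from the system log of the affected node. Where those messages are stored depends on the local logging setup.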
Pacemaker messages are logged on that node similar to:
... Processing failed monitor of rsc_SAPHanaCon_P42_HDB02:0 on hana85: unknown error
... Initiating demote operation rsc_SAPHanaCon_P42_HDB02_demote_0 on hana85
... Initiating stop operation rsc_SAPHanaCon_P42_HDB02_stop_0 on hana85
... Initiating start operation rsc_SAPHanaCon_P42_HDB02_start_0 on hana85
Pacemaker actions are logged on the designated coordinator similar to:
... Setting hana_p42_clone_state[hana85]: PROMOTED -> DEMOTED from hana85
... Setting hana_p42_clone_state[hana85]: DEMOTED -> UNDEFINED from hana85
... Setting hana_p42_clone_state[hana85]: UNDEFINED -> DEMOTED from hana85
... Setting hana_p42_clone_state[hana85]: DEMOTED -> PROMOTED from hana85
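The sequence of clone_state changes above is what identifies a local restart: the node leaves PROMOTED and returns to it without a takeover. As a rough sketch, the transitions for one node could be extracted as shown below. The message format is assumed from the excerpt above, and the names clone_state_path and SAMPLE_DC_LOG are hypothetical.

```python
import re

# Pattern derived from the sample messages in this article.
TRANSITION = re.compile(
    r"Setting (?P<attribute>\w+)\[(?P<node>[^\]]+)\]: "
    r"(?P<old>\S+) -> (?P<new>\S+)"
)

# Hypothetical input: the four lines from the excerpt above.
SAMPLE_DC_LOG = [
    "... Setting hana_p42_clone_state[hana85]: PROMOTED -> DEMOTED from hana85",
    "... Setting hana_p42_clone_state[hana85]: DEMOTED -> UNDEFINED from hana85",
    "... Setting hana_p42_clone_state[hana85]: UNDEFINED -> DEMOTED from hana85",
    "... Setting hana_p42_clone_state[hana85]: DEMOTED -> PROMOTED from hana85",
]


def clone_state_path(lines, node):
    """Return the ordered list of clone_state values seen for one node."""
    path = []
    for line in lines:
        match = TRANSITION.search(line)
        if match is None:
            continue
        if match.group("node") != node:
            continue
        if not match.group("attribute").endswith("_clone_state"):
            continue
        if not path:
            path.append(match.group("old"))  # starting state
        path.append(match.group("new"))
    return path


if __name__ == "__main__":
    print(" -> ".join(clone_state_path(SAMPLE_DC_LOG, "hana85")))
```

A path that starts and ends with PROMOTED on the same node matches the local restart described in this document.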
Resolution
Cause
Those tools depend heavily on infrastructure such as NFS and directory
services. Newer versions of the resource agents (RAs) SAPHanaController and
SAPHanaTopology call landscapeHostConfiguration.py directly. As a result,
temporary infrastructure problems have less impact on HA cluster monitor calls.
Note: If the HANA database itself is affected, and not only the tools,
the stop operation will fail. In that case the node is fenced and
a takeover is eventually triggered.
Status
Additional Information
See also:
Manual pages ocf_suse_SAPHanaController(7), ocf_suse_SAPHanaTopology(7)
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021249
- Creation Date: 26-Oct-2023
- Modified Date: 10-Jan-2024
- SUSE Linux Enterprise Server for SAP Applications
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com