HANA fail-over and secondary recovery operations fail due to excessive os.system() execution times
This document (000020835) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server for SAP Applications 12
Situation
Resolution
The script optimization first appeared in the following packages (March 2022):
- SUSE Linux Enterprise Server for SAP Applications 15.x: SAPHanaSR-ScaleOut-0.181.0-30.1.noarch.rpm
- SUSE Linux Enterprise Server for SAP Applications 12.x: SAPHanaSR-ScaleOut-0.181.0-3.26.1.noarch.rpm
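To check whether an installed package already contains the optimization, the upstream version part of the package name can be compared against 0.181.0. The following is a minimal sketch only: the helper names are hypothetical, the comparison is a simplified numeric one rather than RPM's own version-comparison algorithm, and it assumes a version string made of exactly three dot-separated integers.

```python
# Hypothetical helpers; simplified numeric comparison, not RPM's own algorithm.
FIRST_OPTIMIZED = (0, 181, 0)  # upstream version named in this document


def parse_version(version):
    """Turn a dotted version such as '0.181.0' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def has_optimization(version):
    """Return True if the version is at least 0.181.0."""
    return parse_version(version) >= FIRST_OPTIMIZED


if __name__ == "__main__":
    for candidate in ("0.180.1", "0.181.0", "0.182.3"):
        print(candidate, has_optimization(candidate))
```

The version string of the installed package can be obtained with `rpm -q --queryformat '%{VERSION}' SAPHanaSR-ScaleOut`.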
Cause
Kernel trace data for the enter and exit events of the clone system call made it possible to match the 'long clones' to the exact number of os.system() calls made by the hook script (each os.system() call creates a child process, which appears as a clone in the trace). The elapsed time between the long clones in the trace data also exactly matched the elapsed time between the hook script's logged uses of os.system().
From this it was deduced that the long run times of the hook script occurred only when os.system() was called from HANA python, and that they were due solely to the nature of the HANA process: its size, its number of executing elements and how busy it is. This was reported to SAP with supporting data.
It is important to realize that there is no bug in any SUSE component here. The long fork() duration is not normal and is a product of the way HANA python makes os.system() calls.
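The matching of clone enter and exit events described in this section can be sketched as follows. The event format (timestamp in seconds, process ID, "enter" or "exit") is a hypothetical simplification of kernel trace output, and the function name and the one-second threshold are assumptions made for illustration.

```python
def long_clones(events, threshold=1.0):
    """Pair clone enter/exit events per PID and return the slow ones.

    events: iterable of (timestamp_seconds, pid, kind) tuples in time order,
            where kind is "enter" or "exit" (hypothetical trace format).
    Returns a list of (pid, enter_timestamp, duration) for clones whose
    duration is at least `threshold` seconds.
    """
    pending = {}  # pid -> timestamp of the clone enter not yet matched
    slow = []
    for timestamp, pid, kind in events:
        if kind == "enter":
            pending[pid] = timestamp
        elif kind == "exit" and pid in pending:
            start = pending.pop(pid)
            duration = timestamp - start
            if duration >= threshold:
                slow.append((pid, start, duration))
    return slow


if __name__ == "__main__":
    # Made-up sample data, not taken from a real trace.
    trace = [
        (10.0, 4711, "enter"),
        (10.5, 4712, "enter"),
        (10.75, 4712, "exit"),
        (13.0, 4711, "exit"),
        (20.0, 4711, "enter"),
        (24.0, 4711, "exit"),
    ]
    for pid, start, duration in long_clones(trace):
        print(pid, start, duration)
```

The number of entries returned, and the gaps between their start timestamps, can then be compared with the os.system() calls logged by the hook script.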
Additional Information
It was also found that using the HANA "RTE Dump" feature can introduce long recovery wait times and timeouts.
An RTE dump holds a global lock on the HANA database, which means the server cannot be stopped safely (e.g. by a cluster failover) while the dump is running. A secondary server cannot rejoin the primary until the dump has completed. SAP advises against using the RTE Dump feature on a production system.
It is suspected that HANA external authentication (e.g. via LDAP) can also add to the delay of an os.system() call. Using a local SAP HANA administrator user should greatly reduce these delays.
Note that any other scripts using the same HANA os.system() calls are also liable to suffer the same performance issues until the issue is fixed by SAP.
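To check whether another script is affected, its os.system() calls can be timed. The sketch below is a minimal illustration that assumes Python 3; the wrapper name `timed_system` and the one-second warning threshold are assumptions and not part of any SUSE or SAP tooling.

```python
import os
import time


def timed_system(command, warn_after=1.0):
    """Run command via os.system() and measure the wall-clock time.

    Returns (status, elapsed_seconds). The elapsed time covers process
    creation as well as the command itself, so a trivial command mostly
    measures the cost of creating the child process.
    """
    start = time.monotonic()
    status = os.system(command)
    elapsed = time.monotonic() - start
    if elapsed >= warn_after:
        print("slow os.system() call: %.3f s for %r" % (elapsed, command))
    return status, elapsed


if __name__ == "__main__":
    # "exit 0" does no work, so the time is dominated by process creation.
    status, elapsed = timed_system("exit 0")
    print("status=%d elapsed=%.3f s" % (status, elapsed))
```

Running the same trivial command from a small standalone interpreter and from within the affected process gives a rough comparison of the process-creation cost in each case.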
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000020835
- Creation Date: 31-Oct-2022
- Modified Date: 02-Nov-2022
- SUSE Linux Enterprise Server for SAP Applications
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com