HANA cluster failures due to full /tmp filesystem
This document (000021331) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server for SAP Applications 15 SP4
SUSE Linux Enterprise Server for SAP Applications 15 SP3
SUSE Linux Enterprise Server for SAP Applications 15 SP2
SUSE Linux Enterprise Server for SAP Applications 15 SP1
SUSE Linux Enterprise Server for SAP Applications 12 SP5
Situation
The primary node in the HANA cluster experienced a resource failure, resulting in an instance stop and subsequent failover to the secondary node. Analysis of the logs from the HANA cluster resource agents showed that all HANA_CALL operations failed, leading to the HANA instance being marked with a status of HANA_STATE_DEFECT, as indicated by the logs:
SAPHanaTopology(rsc_SAPHanaTopology_HDB00)[280472]: INFO: ACT: hdbnsutil not answering - using global.ini as fallback - srmode=
SAPHanaTopology(rsc_SAPHanaTopology_HDB00)[280472]: ERROR: ACT: check_for_primary: we didn't expect srmode to be: DUMP: <00000000 0a |.|#01200000001>
SAPHanaTopology(rsc_SAPHanaTopology_HDB00)[280472]: WARNING: ACT: sht_monitor_clone: HANA_STATE_DEFECT (primary/secondary state could not be detected at this point of time)
The cluster responded by stopping the HANA instance on the primary node and failing over to the secondary node, which restored the functionality of the HANA cluster. After the failover, however, the HANA instance on the original primary node could not be started. Further investigation showed that the /tmp filesystem on that node was 100% full at the time of the incident.
Resolution
The /tmp filesystem is critical to the operation of both the HANA instance and the overall cluster stack. When failures are caused by this filesystem being full, the fix is to either clean up or extend /tmp. Once free space is available again in /tmp, the HANA cluster can be restarted and is expected to resume normal operation.
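To catch this condition before it affects the cluster, the fill level of /tmp can be checked periodically. The following is a minimal sketch, not part of the SAPHanaSR packages; the function names and the 90% threshold are assumptions chosen for illustration. Note that the value is computed as used/total and can differ slightly from the percentage reported by df, which accounts for reserved blocks.

```python
import shutil


def tmp_usage_percent(path="/tmp"):
    """Return the used share, in percent, of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo filesystems report a size of zero; treat them as empty.
        return 0.0
    return 100.0 * usage.used / usage.total


def is_nearly_full(path="/tmp", threshold=90.0):
    """True when usage meets or exceeds the threshold (90% is an assumed default)."""
    return tmp_usage_percent(path) >= threshold


if __name__ == "__main__":
    pct = tmp_usage_percent("/tmp")
    print(f"/tmp usage: {pct:.1f}%")
    if is_nearly_full("/tmp"):
        print("Warning: /tmp is nearly full; clean up or extend it.")
```

Such a check could be run from cron or a monitoring agent on each cluster node; the appropriate threshold depends on the size of /tmp and the workload.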
A maintenance update has been released for the HANA cluster resource agents, for both Scale-Up and Scale-Out configurations, to remove this dependency. The fix avoids the use of the /tmp filesystem, so the SAP HANA resource agents remain operational even when /tmp is full. It is recommended to update the SAPHanaSR or SAPHanaSR-ScaleOut package to the latest version, or at least to the versions specified below:
SUSE version | SAPHanaSR version | SAPHanaSR-ScaleOut version |
---|---|---|
SLES12 SP5 for SAP | SAPHanaSR-0.162.2-3.32.2 | SAPHanaSR-ScaleOut-0.185.0-3.32.1 |
SLES15 (SP1, SP2, SP3, SP4, SP5) for SAP | SAPHanaSR-0.162.2-150000.4.34.1 | SAPHanaSR-ScaleOut-0.185.1-150000.39.1 |
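Whether an installed package meets the minimum from the table can be checked with a short script. The sketch below is a simplified illustration: it compares the numeric fields of the version and release strings and does not implement the full RPM version comparison algorithm, so it should only be relied on for purely numeric, dot-separated strings like those above. The names MINIMUMS and meets_minimum are hypothetical. The installed string is assumed to come from a query such as rpm -q --queryformat '%{VERSION}-%{RELEASE}' SAPHanaSR.

```python
import re

# Minimum "version-release" strings copied from the table above.
MINIMUMS = {
    ("SLES12", "SAPHanaSR"): "0.162.2-3.32.2",
    ("SLES12", "SAPHanaSR-ScaleOut"): "0.185.0-3.32.1",
    ("SLES15", "SAPHanaSR"): "0.162.2-150000.4.34.1",
    ("SLES15", "SAPHanaSR-ScaleOut"): "0.185.1-150000.39.1",
}


def _to_ints(text):
    """Extract all numeric fields of a dotted string as a tuple of integers."""
    return tuple(int(n) for n in re.findall(r"\d+", text))


def _key(version_release):
    """Split 'version-release' into two integer tuples for comparison."""
    version, _, release = version_release.partition("-")
    return _to_ints(version), _to_ints(release)


def meets_minimum(installed, minimum):
    """True if `installed` is at least `minimum` (simplified numeric compare)."""
    return _key(installed) >= _key(minimum)


if __name__ == "__main__":
    # Example input; replace with the string reported by rpm on the node.
    installed = "0.162.2-150000.4.34.1"
    ok = meets_minimum(installed, MINIMUMS[("SLES15", "SAPHanaSR")])
    print("up to date" if ok else "update required")
```

The version is compared first and the release only breaks ties, which mirrors how the strings in the table are ordered.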
Cause
Status
Additional Information
https://www.suse.com/support/kb/doc/?id=000021361
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021331
- Creation Date: 22-Jan-2024
- Modified Date: 19-Feb-2024
- SUSE Linux Enterprise Server for SAP Applications
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com