SUSE HA for HANA cluster node fenced at shutdown, despite systemd integration
This document (000021046) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise for SAP Applications 15
Situation
If the whole system is shut down, including HANA and SUSE HA, the node gets fenced. This happens because the sapstartsrv of the systemd service SAP<sid>_<nr> is stopped before HANA, and systemd prevents the RA from restarting sapstartsrv.
This leads to an RA stop failure and node fence.
It looks like this in the system log:
---
# last reboot -n 1
reboot system boot 5.14.21-150400.2 Tue Apr 18 12:34 still running

# grep "2023-04-18T12:3.*rsc_SAPHana.*stop.fail" /var/log/messages
2023-04-18T12:31:26.386765+02:00 pizbuin02 SAPHana(rsc_SAPHana_S07_HDB00)[14955]: ERROR: ACT: SAP Instance S07-HDB00 stop failed

# grep "2023-04-18T12:3.*rsc_SAPHana.*\[14955\]" /var/log/messages
2023-04-18T12:31:24.725478+02:00 pizbuin02 SAPHana(rsc_SAPHana_S07_HDB00)[14955]: INFO: RA ==== begin action stop_clone (0.162.1) ====
2023-04-18T12:31:26.359308+02:00 pizbuin02 SAPHana(rsc_SAPHana_S07_HDB00)[14955]: WARNING: ACT: systemd service SAPS07_00.service is not active, it will be started using systemd
2023-04-18T12:31:26.377772+02:00 pizbuin02 SAPHana(rsc_SAPHana_S07_HDB00)[14955]: ERROR: ACT: error during start of systemd unit SAPS07_00.service!
2023-04-18T12:31:26.386765+02:00 pizbuin02 SAPHana(rsc_SAPHana_S07_HDB00)[14955]: ERROR: ACT: SAP Instance S07-HDB00 stop failed:
2023-04-18T12:31:26.397922+02:00 pizbuin02 SAPHana(rsc_SAPHana_S07_HDB00)[14955]: INFO: RA ==== end action stop_clone with rc=1 (0.162.1) (3s)====
2023-04-18T12:31:26.404212+02:00 pizbuin02 pacemaker-execd[11397]: notice: rsc_SAPHana_S07_HDB00_stop_0[14955] error output [ Error: NIECONN_BROKEN (No such file or directory), NiRawRead failed in plugin_sapfrecv() ]
2023-04-18T12:31:26.404282+02:00 pizbuin02 pacemaker-execd[11397]: notice: rsc_SAPHana_S07_HDB00_stop_0[14955] error output [ Error: NIECONN_REFUSED (Connection refused), NiRawConnect failed in plugin_fopen() ]

# grep "2023-04-18T12:3.*systemd.*SAP.*service" /var/log/messages
2023-04-18T12:31:21.437203+02:00 pizbuin02 SAPHana(rsc_SAPHana_S07_HDB00)[14272]: INFO: ACT: systemd service SAPS07_00.service is active
2023-04-18T12:31:26.359308+02:00 pizbuin02 SAPHana(rsc_SAPHana_S07_HDB00)[14955]: WARNING: ACT: systemd service SAPS07_00.service is not active, it will be started using systemd
2023-04-18T12:31:26.371081+02:00 pizbuin02 systemd[1]: Requested transaction contradicts existing jobs:
Transaction for SAPS07_00.service/start is destructive (cryptsetup.target has 'stop' job queued, but 'start' is included in transaction).
2023-04-18T12:31:26.377772+02:00 pizbuin02 SAPHana(rsc_SAPHana_S07_HDB00)[14955]: ERROR: ACT: error during start of systemd unit SAPS07_00.service!
---
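The log pattern above can be checked with a small helper. This is a minimal sketch, not part of any SUSE tooling: the function name shutdown_fence_signature is hypothetical, and the two search patterns are assumptions derived only from the example messages shown in this document, so they may need adjusting for other versions of the resource agent.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): succeeds if the given log file
# contains both messages seen in the example above:
#   1. systemd rejecting the start of a SAP instance unit as "destructive"
#   2. the SAPHana RA reporting that the instance stop failed
shutdown_fence_signature() {
    log=$1
    grep -q 'Transaction for SAP.*\.service/start is destructive' "$log" &&
        grep -q 'SAPHana.*ERROR: ACT: SAP Instance .* stop failed' "$log"
}

# Example call, only attempted if the system log is readable (usually root).
if [ -r /var/log/messages ]; then
    if shutdown_fence_signature /var/log/messages; then
        echo "log shows the shutdown fencing pattern"
    fi
fi
```

A match only indicates that both messages are present somewhere in the file; the timestamps still have to be compared by hand to confirm they belong to the same shutdown.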
Resolution
If systemd-style init is used for the HANA database, it may be desirable to have the SAP instance service stop after pacemaker at system shutdown. A drop-in file might help. In the example, the SID is S07 and the instance number is 00.

1. Check the HANA database instance's systemd service:
---
# systemctl list-unit-files | grep -i sap
...
# systemd-cgls -u SAP.slice
...
---

2. Create and show the pacemaker service drop-in file that defines the dependency. Because systemd stops units in the reverse of their start order, After=SAPS07_00.service makes pacemaker stop before the SAP instance service:
---
# mkdir -p /etc/systemd/system/pacemaker.service.d/
# cat <<EOF >/etc/systemd/system/pacemaker.service.d/00-pacemaker.conf
[Unit]
Description=pacemaker needs SAP instance service
Documentation=man:SAPHanaSR_basic_cluster(7)
Wants=SAPS07_00.service
After=SAPS07_00.service
EOF
# cat /etc/systemd/system/pacemaker.service.d/00-pacemaker.conf
...
---

3. Activate and check the pacemaker dependency on the SAP instance service:
---
# systemctl daemon-reload
# systemctl show pacemaker.service | grep SAPS07_00
Wants=SAPS07_00.service resource-agents-deps.target dbus.service
After=system.slice network.target corosync.service resource-agents-deps.target basic.target rsyslog.service SAPS07_00.service systemd-journald.socket sysinit.target time-sync.target dbus.service sbd.service
# systemd-delta | grep pacemaker
[EXTENDED] /usr/lib/systemd/system/pacemaker.service -> /etc/systemd/system/pacemaker.service.d/00-pacemaker.conf
---
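For systems with a different SID or instance number, the drop-in content can be generated instead of typed by hand. The following is a minimal sketch under stated assumptions: the function names sap_unit_name and render_pacemaker_dropin are hypothetical, and the unit name is assumed to follow the SAP<sid>_<nr> pattern described in this document. It only prints to standard output and does not write anything below /etc.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of SUSE tooling.

# Print the systemd unit name for a SID and instance number, following the
# SAP<sid>_<nr> pattern from this document (S07 and 00 give SAPS07_00.service).
sap_unit_name() {
    printf 'SAP%s_%s.service\n' "$1" "$2"
}

# Print the drop-in content from step 2 for the given SID and instance number.
render_pacemaker_dropin() {
    unit=$(sap_unit_name "$1" "$2")
    cat <<EOF
[Unit]
Description=pacemaker needs SAP instance service
Documentation=man:SAPHanaSR_basic_cluster(7)
Wants=$unit
After=$unit
EOF
}

# Preview for the example system; output goes to stdout only.
render_pacemaker_dropin S07 00
```

After reviewing the preview, the output could be redirected as root into /etc/systemd/system/pacemaker.service.d/00-pacemaker.conf, followed by systemctl daemon-reload as in step 3.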
Cause
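At system shutdown, systemd stops the sapstartsrv of the SAP<sid>_<nr> service before HANA has been stopped, and it rejects the RA's attempt to restart sapstartsrv. This leads to an RA stop failure and node fence.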
Additional Information
See also:
- Manual pages systemctl(1), systemd.unit(5), SAPHanaSR_basic_cluster(7)
- Blog article https://www.suse.com/c/handover-for-the-next-round-sap-on-suse-cluster-and-systemd-native-integration/
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021046
- Creation Date: 18-Apr-2023
- Modified Date: 19-Apr-2023
- SUSE Linux Enterprise Server for SAP Applications
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com