SAP Instances failed stop on shutdown (PACEMAKER, SYSTEMD, SAP)
This document (7022671) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server for SAP Applications 12
Situation
This document provides a workaround for issues between systemd and an LSB-standard init script. The workaround will become obsolete once a working systemd service for SAP Instances is available for SLES 15.
On a Linux system with systemd, any application started with
su - <somenameofsomeuser>
is moved into a user slice. This is especially true for SAP Instances and Databases managed by the pacemaker cluster service.
The resource agent invokes its commands with
su - <SID>adm
which leads to a user slice that looks like this in the systemd-cgls output:
| `-user-1003.slice
| |-session-c24.scope
| | |-7148 /usr/sap/HA1/ASCS00/exe/sapstartsrv pf=/sapmnt/HA1/profile/HA1_ASCS00_sapro0as -D -u ha1adm
| | |-7311 sapstart pf=/sapmnt/HA1/profile/HA1_ASCS00_sapro0as
| | |-7321 sapstart pf=/sapmnt/HA1/profile/HA1_ASCS00_sapro0as
| | |-7336 ms.sapHA1_ASCS00 pf=/usr/sap/HA1/SYS/profile/HA1_ASCS00_sapro0as
| | `-7337 en.sapHA1_ASCS00 pf=/usr/sap/HA1/SYS/profile/HA1_ASCS00_sapro0as
In general, this poses no problem.
A problem can arise if an administrator forgets about the cluster and its resources and issues a
shutdown
or
reboot
on the system. Because systemd allows user slices only 90 seconds to stop, and other dependencies may also apply, such a shutdown or reboot frequently leads to a "failed stop" of an SAP resource, or to a large SAP resource being killed.
One indication of this is that the administrator issues a "shutdown" but the machine unexpectedly "reboots". The log files may then show that the node was fenced by the cluster because of a "failed stop".
Please keep in mind that this issue stems from the administrator issuing the command without taking the cluster into consideration. With a proper sequence, for example
systemctl stop pacemaker
shutdown -h now
one will never encounter this issue.
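The safe sequence above can be wrapped in a small helper so that the shutdown only proceeds once pacemaker has stopped. This is a minimal sketch. The function names run and safe_shutdown and the DRY_RUN switch are hypothetical and not part of any SUSE tooling.

```shell
#!/bin/sh
# Hypothetical helper: stop the cluster stack first, then power off.
# With DRY_RUN=1 the commands are only printed, not executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

safe_shutdown() {
    # "systemctl stop" waits for the stop job to finish, so once it returns
    # pacemaker has stopped (or moved away) the resources it managed here.
    if ! run systemctl stop pacemaker; then
        echo "pacemaker did not stop cleanly; not shutting down" >&2
        return 1
    fi
    run shutdown -h now
}
```

Running "DRY_RUN=1 safe_shutdown" prints the two commands. Running plain "safe_shutdown" as root executes them and refuses to power off if pacemaker fails to stop.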
Resolution
The move into a user slice caused by
su - <somenameofsomeuser>
can be bypassed by modifying the session part of the su PAM stack. First, create a new su-session file:
cp /etc/pam.d/common-session /etc/pam.d/su-session
common-session normally looks like this:
session required pam_limits.so
session required pam_unix.so try_first_pass
session optional pam_umask.so
session optional pam_systemd.so
session optional pam_gnome_keyring.so auto_start only_if=gdm,gdm-password,lxdm,lightdm
session optional pam_env.so
In su-session, add the following line directly above the pam_systemd.so line:
session [success=1 new_authtok_reqd=ok default=ignore] pam_listfile.so item=user sense=allow file=/etc/SAPUsers
so that the file reads:
session required pam_limits.so
session required pam_unix.so try_first_pass
session optional pam_umask.so
session [success=1 new_authtok_reqd=ok default=ignore] pam_listfile.so item=user sense=allow file=/etc/SAPUsers
session optional pam_systemd.so
session optional pam_gnome_keyring.so auto_start only_if=gdm,gdm-password,lxdm,lightdm
session optional pam_env.so
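The copy-and-edit steps above can also be scripted. The following is a minimal sketch, assuming POSIX awk; make_su_session and LISTFILE_LINE are hypothetical names. It inserts the line immediately before the first "session optional pam_systemd.so" style entry. If no such entry exists, it fails instead of guessing where the line should go.

```shell
#!/bin/sh
# Hypothetical helper: build su-session from common-session by inserting the
# pam_listfile line directly above pam_systemd.so. The placement matters:
# success=1 skips exactly the one line that follows.
LISTFILE_LINE='session [success=1 new_authtok_reqd=ok default=ignore] pam_listfile.so item=user sense=allow file=/etc/SAPUsers'

make_su_session() {
    src=${1:-/etc/pam.d/common-session}
    dst=${2:-/etc/pam.d/su-session}
    # Exits non-zero if no "session <control> pam_systemd.so" line was found.
    awk -v line="$LISTFILE_LINE" '
        $1 == "session" && $3 == "pam_systemd.so" && !done {
            print line
            done = 1
        }
        { print }
        END { if (!done) exit 1 }
    ' "$src" > "$dst"
}
```

Running make_su_session as root with no arguments writes /etc/pam.d/su-session from /etc/pam.d/common-session. Review the result before changing /etc/pam.d/su.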
Then create the file
/etc/SAPUsers
containing the names of the SAP administration users, one per line. In this example, with SID HA1, these are:
ardmore:~ # cat /etc/SAPUsers
ha1adm
sapadm
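The list can be generated from the SIDs on the host. This is a sketch assuming the naming seen in the example above: one <sid>adm user per SID (lower case) plus sapadm. The helper write_sap_users is hypothetical. The file mode matters because pam_listfile rejects a list file that is world-writable.

```shell
#!/bin/sh
# Hypothetical helper: write the user list read by pam_listfile.
# Usage: write_sap_users FILE SID [SID...]
# Each SID yields "<sid>adm" in lower case; sapadm is appended once.
write_sap_users() {
    file=$1
    shift
    : > "$file"
    for sid in "$@"; do
        printf '%sadm\n' "$(printf '%s' "$sid" | tr '[:upper:]' '[:lower:]')" >> "$file"
    done
    echo sapadm >> "$file"
    # pam_listfile refuses a list file that is world-writable.
    chmod 0644 "$file"
}
```

For the example system, "write_sap_users /etc/SAPUsers HA1" run as root produces the two lines shown above.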
With the su-session file in place, modify
/etc/pam.d/su
from the default
#%PAM-1.0
auth sufficient pam_rootok.so
auth include common-auth
account sufficient pam_rootok.so
account include common-account
password include common-password
session include common-session
session optional pam_xauth.so
to
#%PAM-1.0
auth sufficient pam_rootok.so
auth include common-auth
account sufficient pam_rootok.so
account include common-account
password include common-password
session include su-session
session optional pam_xauth.so
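The same change to /etc/pam.d/su can be made non-interactively. This is a sketch assuming GNU sed (for -i), as shipped with SLES; use_su_session is a hypothetical name. It changes only the "session include common-session" line, keeps a one-time backup, and reports failure if the include was not switched.

```shell
#!/bin/sh
# Hypothetical helper: point the session stack of su at su-session.
# Keeps a one-time backup as <file>.orig before editing in place.
use_su_session() {
    pamfile=${1:-/etc/pam.d/su}
    [ -e "$pamfile.orig" ] || cp -p "$pamfile" "$pamfile.orig"
    sed -i 's/^\(session[[:space:]]\{1,\}include[[:space:]]\{1,\}\)common-session[[:space:]]*$/\1su-session/' "$pamfile"
    # Succeeds only if the include now points at su-session.
    grep -q '^session[[:space:]]\{1,\}include[[:space:]]\{1,\}su-session' "$pamfile"
}
```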
The new entry works as follows. During session setup by su, pam_listfile checks whether the user name is listed in /etc/SAPUsers. If it is, "success=1" tells PAM to
SKIP
the next line (exactly ONE line) in the PAM stack, which in this case is pam_systemd.so, and thereby bypasses the creation of a systemd user slice. For users not in the list, "default=ignore" makes PAM ignore the result and continue with pam_systemd.so as usual. This is why the new line must sit directly above pam_systemd.so.
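To make the skip behaviour concrete, here is a deliberately simplified model of how the session stack is walked. It is not PAM itself. It only understands the success=N jump of the pam_listfile.so line and treats every other module as simply running. The function simulate_session_stack is hypothetical.

```shell
#!/bin/sh
# Illustrative model only (not PAM itself): print the session modules that
# would run for USER, honouring just the success=N jump of pam_listfile.so.
# Usage: simulate_session_stack STACKFILE USER LISTFILE
simulate_session_stack() {
    stack=$1 user=$2 list=$3
    # pam_listfile compares whole lines, which grep -x -F mimics.
    in_list=0
    grep -qxF "$user" "$list" && in_list=1
    awk -v in_list="$in_list" '
        $1 != "session" { next }
        skip > 0 { skip--; next }
        {
            mod = ""
            for (i = 2; i <= NF; i++) if ($i ~ /\.so$/) { mod = $i; break }
            print mod
            # success=N: on success, skip the next N lines of the stack.
            if (mod == "pam_listfile.so" && in_list == 1 && match($0, /success=[0-9]+/))
                skip = substr($0, RSTART + 8, RLENGTH - 8) + 0
        }
    ' "$stack"
}
```

With the su-session file and /etc/SAPUsers from above, ha1adm gets every module except pam_systemd.so. A user not in the list gets all of them.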
As a result, the SAP Instance now stays in pacemaker.service within the system.slice:
`-system.slice
|-pacemaker.service
| |-2196 /usr/sbin/pacemakerd -f
| |-2198 /usr/lib64/pacemaker/cib
| |-2199 /usr/lib64/pacemaker/stonithd
| |-2200 /usr/lib64/pacemaker/lrmd
| |-2201 /usr/lib64/pacemaker/attrd
| |-2202 /usr/lib64/pacemaker/pengine
| |-2203 /usr/lib64/pacemaker/crmd
| |-4125 /usr/sap/HA1/ASCS00/exe/sapstartsrv pf=/sapmnt/HA1/profile/HA1_ASCS00_sapro0as -D -u ha1adm
| |-4296 sapstart pf=/sapmnt/HA1/profile/HA1_ASCS00_sapro0as
| |-4311 ms.sapHA1_ASCS00 pf=/usr/sap/HA1/SYS/profile/HA1_ASCS00_sapro0as
| `-4312 en.sapHA1_ASCS00 pf=/usr/sap/HA1/SYS/profile/HA1_ASCS00_sapro0as
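To confirm the change on a running system, check which cgroup an SAP process ended up in. This is a sketch; cgroup_slice is a hypothetical helper. It reads /proc/<pid>/cgroup on stdin and uses the "name=systemd" line (cgroup v1, as on SLES 12) or the "0::" line (unified hierarchy).

```shell
#!/bin/sh
# Hypothetical helper: report where a process sits in the systemd cgroup tree.
# Reads the contents of /proc/<pid>/cgroup on stdin.
cgroup_slice() {
    path=$(awk -F: '$2 == "name=systemd" || $1 == "0" { p = $3 } END { print p }')
    case $path in
        /user.slice/*) echo user.slice ;;
        /system.slice/pacemaker.service*) echo pacemaker.service ;;
        *) echo "other: $path" ;;
    esac
}
```

For example, "cgroup_slice < /proc/$(pgrep -o -f sapstartsrv)/cgroup" should print pacemaker.service once the PAM change is active and the resource has been restarted.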
There is no need to worry about the security of the /etc/SAPUsers file, as it only contains a list of user names.
In general, this approach can be used for any resource that is not systemd-aware, inside or outside of the cluster.
Please keep in mind that systemd applies a TasksMax limit to units. This limit could be hit if too many processes end up in pacemaker.service within the system.slice, so increasing
DefaultTasksMax=
in /etc/systemd/system.conf might be advisable.
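To see how close pacemaker.service is to its task limit, compare TasksCurrent with TasksMax as reported by systemctl show. This is a sketch. The helper tasks_headroom and its 80% warning threshold are assumptions, not SUSE recommendations.

```shell
#!/bin/sh
# Hypothetical helper: compare TasksCurrent with TasksMax for a unit.
# Reads "systemctl show -p TasksCurrent -p TasksMax <unit>" output on stdin.
# The 80% warning threshold is an arbitrary assumption.
tasks_headroom() {
    awk -F= '
        $1 == "TasksCurrent" { cur = $2 }
        $1 == "TasksMax"     { max = $2 }
        END {
            # "infinity", "[not set]", zero or missing values: nothing to compare.
            if (max !~ /^[0-9]+$/ || max + 0 == 0 || cur !~ /^[0-9]+$/) { print "unlimited or unknown"; exit 0 }
            pct = 100 * cur / max
            printf "%s of %s tasks used (%.0f%%)\n", cur, max, pct
            if (pct >= 80) exit 1
        }'
}
```

Usage: "systemctl show -p TasksCurrent -p TasksMax pacemaker.service | tasks_headroom". A non-zero exit status suggests raising the limit.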
Cause
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7022671
- Creation Date: 20-Feb-2018
- Modified Date: 31-Mar-2022
- SUSE Linux Enterprise High Availability Extension
- SUSE Linux Enterprise Server for SAP Applications
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com