Logs about an incident on a Pacemaker Cluster are lost because of the log file turn-over policy
This document (000020390) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise High Availability Extension 12
SUSE Linux Enterprise Server for SAP Applications 15
SUSE Linux Enterprise Server for SAP Applications 12
Situation
Resolution
In addition, include in your incident and/or disaster recovery procedures the creation of a cluster hb_report immediately after a problem or incident with the cluster. The hb_report collects the pengine files and other information about the cluster. Do not let many hours or days pass between the incident and the creation of the hb_report, otherwise the relevant files may already have been overwritten.
For details on creating the cluster hb_report, see TID 000017501.
Cause
For SAP HANA use cases, especially the Scale-Out scenario, the cluster stores information about HANA in cluster attributes. These attributes are updated every few seconds and each update writes an entry to the logs, which leads to a very fast turn-over of the logs.
Additional Information
The following cluster options specify how many pe-* files are kept:
pe-error-series-max=
pe-warn-series-max=
pe-input-series-max=
They are added to the "property cib-bootstrap-options:" section of the Cluster Information Base (CIB) using the "crm configure edit" command.
Integer values are allowed. The value "-1" keeps an unlimited number of files and will probably cause the disk to run out of space at some point. To choose suitable values, monitor how many files are written per day and calculate how many files must be kept to cover the desired period. This is an example from a test system:
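As a non-interactive alternative to "crm configure edit", the properties can also be set one at a time with the crm shell's "configure property" command and checked with Pacemaker's crm_attribute tool. This is a sketch only: the values shown are the two-week example from this document and should be replaced with values derived from your own daily file count.

```shell
# Set the scheduler file retention properties in the CIB.
# Run on one cluster node; the CIB is replicated to the other nodes.
crm configure property pe-error-series-max=-1
crm configure property pe-warn-series-max=5000
crm configure property pe-input-series-max=50000

# Query the value now stored in the cluster options (crm_config section).
crm_attribute --type crm_config --name pe-input-series-max --query
```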
hana01:/var/lib/pacemaker/pengine # ls -l * | grep "Sep 30"| wc -l
3511
This is the total number of files created on that day, including input, warning and error files.
Most of the files created are state changes (pe-input-*.bz2). If the goal is to keep files for, e.g., two weeks, the following values might be considered:
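Because the total above mixes all file types, it can help to count each type separately before choosing the individual limits. The following is a minimal sketch; the function name count_pe_files_per_day is hypothetical, and it assumes GNU find (for -newermt) and GNU date (for -d), as shipped with SLES.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count pe-input, pe-warn and pe-error files
# modified on one calendar day.
# Usage: count_pe_files_per_day YYYY-MM-DD [DIR]
count_pe_files_per_day() {
    local day="$1"
    local dir="${2:-/var/lib/pacemaker/pengine}"
    local next type n
    next=$(date -d "$day + 1 day" +%F)    # the following calendar day
    for type in input warn error; do
        # Modified after $day 00:00 and not after $next 00:00 (local time).
        n=$(find "$dir" -maxdepth 1 -type f -name "pe-$type-*" \
                -newermt "$day" ! -newermt "$next" | wc -l)
        echo "$type $n"
    done
}
```

On a cluster node, "count_pe_files_per_day 2021-09-30" prints one line per file type with the number of files written on that day.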
pe-error-series-max="-1"
pe-warn-series-max="5000"
pe-input-series-max="50000"
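The sizing above is simple arithmetic: files per day multiplied by the number of days to keep. The sketch below uses two hypothetical helper functions and the 3511 files per day from the test system, treating that total as an approximation of the pe-input rate since most files are pe-input files. For comparison, it also shows the coverage of 4000, which is the Pacemaker default for pe-input-series-max.

```shell
# Hypothetical helpers for sizing the pe-*-series-max options.
# required_series_max FILES_PER_DAY DAYS -> files needed to cover DAYS days
required_series_max() {
    echo $(( $1 * $2 ))
}
# days_covered SERIES_MAX FILES_PER_DAY -> whole days of history kept
days_covered() {
    echo $(( $1 / $2 ))    # integer division: partial days are dropped
}

required_series_max 3511 14    # 49154, rounded up to 50000 in the example
days_covered 50000 3511        # 14
days_covered 4000 3511         # 1
```

With the default limit, the test system would keep roughly one day of scheduler files, which is why an hb_report created days after an incident may no longer contain them.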
Once the maximum number has been reached, Pacemaker starts to overwrite existing files. If /var/lib/pacemaker/pengine is backed up every day, the values might be adjusted so that each backup only needs to contain the most recent changes.
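A daily backup of the directory could be as simple as the following sketch, for example called from a cron job or systemd timer. The function name backup_pengine and the default destination /var/backup/pengine are assumptions for illustration only. Because the pe-* files are already bzip2-compressed, a plain tar archive is used.

```shell
# Hypothetical daily backup of the scheduler (pengine) files.
# Usage: backup_pengine [SRC_DIR] [DEST_DIR]; prints the archive path.
backup_pengine() {
    local src="${1:-/var/lib/pacemaker/pengine}"
    local dest_dir="${2:-/var/backup/pengine}"   # assumed backup location
    local archive
    archive="$dest_dir/pengine-$(uname -n)-$(date +%F).tar"
    mkdir -p "$dest_dir" || return 1
    # Archive the directory itself, relative to its parent directory.
    tar -cf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}
```

With a daily backup in place, pe-input-series-max only needs to be large enough to cover more than one day of files between two backups.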
For more information see:
SLE 12 SP5 based pacemaker deployment:
https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_available_cluster_options.html
SLE 15 based pacemaker deployment:
https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/html/options.html#cluster-options
man 7 pacemaker-schedulerd
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000020390
- Creation Date: 17-Sep-2021
- Modified Date: 04-Oct-2021
- SUSE Linux Enterprise High Availability Extension
- SUSE Linux Enterprise Server for SAP Applications
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com