Usage of crm report for SLES HAE
This document (7007262) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server 10
Situation
There was an incident within the cluster that needs to be investigated. SUSE Technical Support requested "crm report" (formerly hb_report) for analysis.
Resolution
In this document, "crm report" refers to the crmsh (CRM Shell) crm subcommand used to generate a cluster report, and crm_report refers to the output file generated by crm report. (crm_report also works as a command but, like hb_report, it is deprecated in next-generation crmsh.)
The "crm report" utility (previously hb_report) is an essential tool for diagnosing problems in a SLES HAE cluster. In a cluster context it is important to capture all log files and all configuration of all cluster nodes at the time of the incident under investigation. Other tools and methods can be used, but they are not as efficient in a SLES HAE cluster context.
For crm report to work as intended, it should gather all information from all nodes. See the 'Additional Information' section for details.
Before uploading your data to Support, always double-check that the final "crm_report" file contains subdirectories with cluster node names and their respective data.
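The check described above can be scripted. The following is a minimal sketch, not part of crm report itself: the function name list_report_nodes is hypothetical, and it assumes GNU tar and an archive layout with one top-level crm_report-... directory containing one subdirectory per node.

```shell
# list_report_nodes ARCHIVE
# Print the names of all second-level directories in a report archive.
# Assumption: in the layout described above these are the cluster node
# names; any other second-level directory would be listed as well.
list_report_nodes() {
    # GNU tar detects the compression type automatically when listing.
    tar -tf "$1" | awk -F/ 'NF >= 3 && $2 != "" { print $2 }' | sort -u
}

# Example usage (archive name taken from the examples in this document):
#   list_report_nodes ./crm_report-Mon-14-Oct-2024.tar.bz2
```

If a cluster node is missing from the printed list, its data was not collected and the report should be regenerated or completed before uploading.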
Collecting cluster report
If the SSH connection between your nodes is already configured and working (see 'Additional Information' if the connection needs to be configured), you may run the crm report command on the cluster node of your choice. Running it on only one node is sufficient, because crm report will collect all the logs and cluster configurations from all the cluster members.
crm report is also able to extract cluster history data from rotated logs. For example, it inspects rotated /var/log/pacemaker/pacemaker.log-<date>.xz files and, if logged events fall within the defined time range, copies them into the final crm_report file. If the rotated logs were already removed, it cannot collect any logged events from them.
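Before collecting a report for an older incident, it can help to confirm that rotated logs are still present on a node. The following is a minimal sketch; the function name rotated_pacemaker_logs is hypothetical, and the default directory is the one named above.

```shell
# rotated_pacemaker_logs [DIR]
# Print every rotated pacemaker log (pacemaker.log-*) found in DIR.
# Returns non-zero if none are found.
rotated_pacemaker_logs() {
    dir=${1:-/var/log/pacemaker}
    found=1
    for f in "$dir"/pacemaker.log-*; do
        # If nothing matches, the unexpanded pattern is skipped here.
        [ -e "$f" ] || continue
        echo "$f"
        found=0
    done
    return $found
}

# Example usage:
#   rotated_pacemaker_logs || echo "no rotated pacemaker logs left on this node"
```

The date suffix in each file name gives a rough indication of whether the time of the incident is still covered.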
Example 1: Collecting cluster report as root user
This example shows collection of the report as the root user. Assuming there was an incident to investigate on 14.10.2024 at 16:45, the interesting data would be from this time and from some time before and after it, to ensure we capture all information that might have led to this incident.
The timeframe in question could be from 14.10.2024 00:00 to 14.10.2024 23:59. It is also often helpful to force the resulting output filename to contain both the date and time it was generated.
The following is an example of crm report with such parameters:
# crm report -f "2024/10/14 00:00" -t "2024/10/14 23:59" /tmp/crm_report-$(date +"%Y%m%d-%H%M")
With the syntax above, the resulting file is created with a name in this format:
drwxrwsr-x+ 4 sfsc-dlm suse 14 Oct 14 13:09 crm_report-20241014-1348
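When a narrower window than a full day is wanted, the -f and -t values can be derived from the incident time. The following is a minimal sketch under stated assumptions: the function name report_window is hypothetical, GNU date is required (for -d and @epoch), and the output format simply mirrors the "YYYY/MM/DD HH:MM" values used in Example 1.

```shell
# report_window "YYYY-MM-DD HH:MM" HOURS
# Print two lines: the start and the end of a window reaching HOURS
# before and HOURS after the incident time (local time zone).
report_window() {
    incident_epoch=$(date -d "$1" +%s) || return 1
    margin=$(( $2 * 3600 ))
    date -d "@$(( incident_epoch - margin ))" +"%Y/%m/%d %H:%M"
    date -d "@$(( incident_epoch + margin ))" +"%Y/%m/%d %H:%M"
}

# Example usage for the incident at 16:45 with a 4 hour margin:
#   from=$(report_window "2024-10-14 16:45" 4 | sed -n 1p)
#   to=$(report_window "2024-10-14 16:45" 4 | sed -n 2p)
#   crm report -f "$from" -t "$to" /tmp/crm_report-$(date +"%Y%m%d-%H%M")
```

A wider margin costs little and reduces the risk of cutting off events that led up to the incident.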
Example 2: Collecting cluster report as non-root user with sudo
To collect the report as a non-root user with sudo (see the sudo configuration in 'Additional Information'), add the '-u <non-root user>' option. An example:
sudo crm report -f "2024/10/14 00:00" -t "2024/10/14 23:59" -u sadmin1
An example to double-check the content of the crm_report file:
# ls crm_report*
crm_report-Mon-14-Oct-2024.tar.bz2
Checking whether pacemaker.log was collected for the time in question:
# tar --wildcards -xOjf ./crm_report-Mon-14-Oct-2024.tar.bz2 crm_report-Mon-14-Oct-2024/*/pacemaker.log | sed '1b;$b;d'
Oct 14 00:05:55 oldhanaa1 pacemaker-controld [31997] (crm_timer_popped) info: Cluster Recheck Timer (I_PE_CALC) just popped (900000ms)
Oct 14 23:50:56 oldhanaa1 pacemaker-controld [31997] (do_state_transition) notice: State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
Checking whether ha-log.txt (ha-log.txt is the same as the messages file) was collected for the time in question:
# tar --wildcards -xOjf ./crm_report-Mon-14-Oct-2024.tar.bz2 crm_report-Mon-14-Oct-2024/*/ha-log.txt | sed '1b;$b;d'
2024-10-14T00:00:01.007277+02:00 oldhanaa2 CRON[10297]: (root) CMD ([ -x /usr/lib64/sa/sa1 ] && exec /usr/lib64/sa/sa1 1 1)
2024-10-14T23:55:01.691802+02:00 oldhanaa2 CRON[21550]: (root) CMD ([ -x /usr/lib64/sa/sa2 ] && exec /usr/lib64/sa/sa2 -A)
In the above output, ha-log.txt contains logged events from only one node, which means data needed for the analysis is missing from the other node. In that case a sysadmin has to manually restore the messages file of the missing node for the time in question from a backup and provide it separately.
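Which nodes lack a given log file can also be determined without extracting the archive. The following is a minimal sketch; the function name nodes_missing_file is hypothetical, and it assumes GNU tar and that every second-level directory in the archive is a cluster node, as in the paths used above.

```shell
# nodes_missing_file ARCHIVE FILENAME
# Print each node directory in the archive that does not contain
# FILENAME directly below it.
nodes_missing_file() {
    tar -tf "$1" | awk -F/ -v want="$2" '
        NF >= 3 && $2 != ""   { nodes[$2] = 1 }   # any entry below a node directory
        NF == 3 && $3 == want { have[$2] = 1 }    # the wanted file itself
        END { for (n in nodes) if (!(n in have)) print n }
    ' | sort
}

# Example usage:
#   nodes_missing_file ./crm_report-Mon-14-Oct-2024.tar.bz2 ha-log.txt
```

An empty result means every node directory contains the file; it does not say whether the file covers the requested time range, which still has to be checked as shown above.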
Additional Information
If the SSH connections between the cluster nodes, which crm report uses, have not yet been configured, there are two options to do this. Depending on your environment requirements, crm report gathers data from the other cluster nodes either via SSH root access or via a defined user with sudo (see 'Running cluster reports without root access' for details).
Configuration to collect cluster report as root with root SSH access between cluster nodes
Root SSH access between cluster nodes is configured by default if the ha-cluster-bootstrap package was used for the initial cluster deployment, or if the YaST cluster module was used.
If the cluster was set up manually, or if SSH root access was removed or is not working, it is best to set up SSH keys without a password. This enables the script to traverse the cluster without a sysadmin entering the root password three to four times, that is, once for each and every node of the cluster.
To set up SSH keys for this (we use RSA in this example), the command to run as user root is:
ssh-keygen -t rsa
which will create the following two keys in the /root/.ssh/ directory:
id_rsa id_rsa.pub
The public key has to be copied over to all remaining cluster nodes:
ssh-copy-id OTHER_NODE
Also add the public part of this SSH key to the local authorized keys file, so that crm report run from other nodes also works:
cat /tmp/id_rsa.pub >> /root/.ssh/authorized_keys
After this it is possible for root to ssh without password from one server to another. This should be done for each and every member of the cluster.
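Whether passwordless SSH really works for every node can be verified before running crm report. The following is a minimal sketch; the function name check_ssh_nodes is hypothetical, and the node names in the usage line are placeholders. BatchMode=yes makes ssh fail instead of prompting for a password, so a prompt cannot hide a missing key.

```shell
# check_ssh_nodes NODE...
# Try a non-interactive SSH login to each node and report the result.
# Returns non-zero if at least one node could not be reached.
check_ssh_nodes() {
    rc=0
    for node in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true </dev/null; then
            echo "$node: ok"
        else
            echo "$node: FAILED"
            rc=1
        fi
    done
    return $rc
}

# Example usage, run as root on each cluster node in turn:
#   check_ssh_nodes node1 node2
```

Running the check from every node covers both directions, which matters because crm report may be started from any cluster member.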
If root SSH access is too permissive for your needs, either collect cluster reports without root (as described below) or see the sshd_config(5) man page for the 'Match' block, which can be used to restrict access for a particular user.
Configuration to collect cluster report without root user
General documentation for collecting cluster report without root user is available at 'Running cluster reports without root access'.
This option uses SSH agent forwarding and sudo. SSH agent forwarding forwards connections from an authentication agent (such as ssh-agent(1)), so that a sysadmin's local SSH keys can be used to log in to a final node via a jump host. In this case the jump host is the cluster node where the cluster report is collected, and the final node is any remaining cluster node.
An example of a sudoers(5) definition (in this case a user in the 'sysadmins' user group, who has access to all cluster nodes via SSH with their own SSH key, needs to collect the cluster report as a non-root user):
Host_Alias CLUSTER = node1, node2
Runas_Alias R = root
Defaults!HA_ALLOWED env_keep+=SSH_AUTH_SOCK
Cmnd_Alias HA_ALLOWED = /usr/sbin/crm_report *, /usr/sbin/crm report *
%sysadmins CLUSTER = (R) NOPASSWD: HA_ALLOWED
This sudo(8) definition needs to be present on all cluster nodes. It allows the user to preserve the SSH_AUTH_SOCK environment variable (which points to the UNIX socket used by SSH to obtain the keys from the SSH agent) while running crm report as root via sudo.
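Because a syntax error in a sudoers file can break sudo on a node, it is prudent to validate the definition before putting it in place. The following is a minimal sketch; the function name install_sudoers_fragment and the file names in the usage line are hypothetical. It relies on visudo -c (check only) together with -f (file to check).

```shell
# install_sudoers_fragment SRC DEST
# Copy SRC to DEST with mode 0440, but only if visudo accepts its syntax.
install_sudoers_fragment() {
    src=$1
    dest=$2
    if visudo -cf "$src" >/dev/null; then
        install -m 0440 "$src" "$dest"
    else
        echo "syntax check failed, not installing: $src" >&2
        return 1
    fi
}

# Example usage as root (both paths are assumptions for illustration):
#   install_sudoers_fragment ./ha-report.sudoers /etc/sudoers.d/ha-report
```

The same steps would have to be repeated on every cluster node, since the definition must be present on all of them.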
The user wanting to collect the cluster report without the root account must ensure that forwarding of connections from an authentication agent such as ssh-agent(1) is enabled when connecting to the node where the cluster report collection will be run, e.g. with 'ssh -A' for the OpenSSH client or 'Allow agent forwarding' for PuTTY.
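After logging in to the node, a quick check can show whether agent forwarding is actually in effect. The following is a minimal sketch; the function name agent_forwarding_ok is hypothetical. It tests the SSH_AUTH_SOCK variable described above and then asks the agent for its keys with ssh-add -l.

```shell
# agent_forwarding_ok
# Report whether a forwarded SSH agent with at least one key is reachable
# from the current session.
agent_forwarding_ok() {
    if [ -z "${SSH_AUTH_SOCK:-}" ]; then
        echo "SSH_AUTH_SOCK is not set"
        return 1
    fi
    if [ ! -S "$SSH_AUTH_SOCK" ]; then
        echo "SSH_AUTH_SOCK does not point to a socket"
        return 1
    fi
    # ssh-add -l fails if the agent is unreachable or holds no keys.
    if ! ssh-add -l >/dev/null 2>&1; then
        echo "agent is unreachable or has no keys loaded"
        return 1
    fi
    echo "agent forwarding looks usable"
}

# Example usage on the node where the report will be collected:
#   agent_forwarding_ok && sudo crm report -f "2024/10/14 00:00" -t "2024/10/14 23:59" -u sadmin1
```

If the check fails, reconnect with agent forwarding enabled before retrying the collection.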
- https://documentation.suse.com/sle-ha/15-SP4/html/SLE-HA-all/app-crmreport-nonroot.html
- https://documentation.suse.com/sle-ha/15-SP4/html/SLE-HA-all/app-ha-troubleshooting.html#sec-ha-troubleshooting-misc
- https://www.suse.com/support/kb/doc/?id=000020662
What does the "hb" in the legacy hb_report utility stand for?
The Heartbeat project, on which the original Pacemaker cluster stack was developed. Remnants of this legacy project name can still be found in Pacemaker with the "ocf::heartbeat" resource agents.
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7007262
- Creation Date: 25-Nov-2010
- Modified Date: 29-Oct-2024
- SUSE Linux Enterprise High Availability Extension
- SUSE Linux Enterprise Server
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com