Debugging why a non-sbd stonith device resource agent failed
This document (000020991) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise High Availability Extension 15 SP4
SUSE Linux Enterprise High Availability Extension 15 SP3
SUSE Linux Enterprise High Availability Extension 15 SP2
SUSE Linux Enterprise High Availability Extension 15 SP1
SUSE Linux Enterprise High Availability Extension 15
SUSE Linux Enterprise High Availability Extension 12 SP5
SUSE Linux Enterprise High Availability Extension 12 SP4
SUSE Linux Enterprise High Availability Extension 12 SP3
SUSE Linux Enterprise High Availability Extension 12 SP2
SUSE Linux Enterprise High Availability Extension 12 SP1
SUSE Linux Enterprise High Availability Extension 12
Situation
A stonith resource that is not based on sbd fails.
Example for the external/ipmi type:
heynode01:~ # crm_mon --include=failures --exclude=summary,nodes,resources -1
Failed Resource Actions:
  * STONITH-heynode01_start_0 on heynode05 'unknown error' (1): call=55, status=Error, exitreason='', last-rc-change='Sat Feb 11 08:57:57 2023', queued=0ms, exec=43200ms
The cause of the failure needs to be debugged.
Resolution
First, check the configuration of the stonith resource.
heynode01:~ # crm configure show STONITH-heynode01
primitive STONITH-heynode01 stonith:external/ipmi \
        op monitor interval=300s timeout=60s on-fail=restart \
        op start interval=0 timeout=60s \
        op_params onfail=restart \
        params pcmk_delay_base=6000 hostname=heynode01 ipaddr=172.16.81.44 userid=FENCEMEUP passwd="this!just!Fail2$" interface=lanplus \
        meta target-role=Started
Check which parameters the stonith command expects for this device type (-n lists the parameter names).
Example for external/ipmi:
heynode01:~ # stonith -t external/ipmi -n
hostname ipaddr userid passwd interface
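The parameter list above does not contain pcmk_delay_base, because pcmk_* options are interpreted by Pacemaker itself and are not handed to the device plugin. As a minimal sketch (the variable names params and agent_params are only illustrative, and the values are copied from the example configuration), the parameters for a manual test could be filtered like this:

```shell
# Parameters copied from the "params" line of the example resource.
# Single quotes keep the ! and $ in the password literal.
params='pcmk_delay_base=6000 hostname=heynode01 ipaddr=172.16.81.44 userid=FENCEMEUP passwd=this!just!Fail2$ interface=lanplus'

agent_params=""
for p in $params; do
  case "$p" in
    pcmk_*) ;;  # Pacemaker-level option, skip it for the manual test
    *) agent_params="${agent_params:+$agent_params }$p" ;;
  esac
done

# Show what remains for the manual stonith test.
echo "$agent_params"
```

The remaining name=value pairs correspond to the names reported by stonith -n. The simple word splitting used here assumes that no value contains spaces or wildcard characters.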
Now test the stonith device with the verbose option (-v) and the status flag (-S), which only queries the device status.
heynode01:~ # stonith -v -t external/ipmi hostname=heynode01 ipaddr=172.16.81.44 userid=FENCEMEUP passwd="this!just!Fail2$" interface=lanplus -S
Result from above command:
-bash: !just!Fail2$: event not found
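The failure above is caused by the complex password, which contains the characters ! and $ that bash treats specially. In an interactive bash session, ! triggers history expansion even inside double quotes, so bash rejects the command line before stonith is started. Enclosing the password in single quotes keeps both characters literal.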
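A minimal sketch of the quoting behaviour, assuming the example password from this document; the variable name passwd is chosen only for illustration:

```shell
# Single quotes keep every character literal, including ! and $.
passwd='this!just!Fail2$'

# Print the value to confirm that nothing was expanded or removed.
printf 'passwd=%s\n' "$passwd"

# On a cluster node, the status test could then be run as
# (not executed here, it needs the stonith command and the real device):
# stonith -v -t external/ipmi hostname=heynode01 ipaddr=172.16.81.44 \
#     userid=FENCEMEUP passwd="$passwd" interface=lanplus -S
```

Passing "$passwd" in double quotes is safe at that point, because the typed command line no longer contains a literal ! for bash to expand. In an interactive bash session, history expansion can alternatively be switched off with set +H.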
If the verbose output is not enough to reveal the cause of the failure, try the debug flag (-d).
heynode01:~ # stonith -d -t external/ipmi hostname=heynode01 ipaddr=172.16.81.44 userid=FENCEMEUP passwd='this!just!Fail2$' interface=lanplus -S
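Debug output can be long, so it may help to keep it in a file together with the exit code of the test. The helper below is a hypothetical sketch (run_and_log is not part of any SUSE package) and assumes that the tested command reports failure through a non-zero exit code:

```shell
# Hypothetical helper: run a command, save stdout and stderr to a log file,
# and report the exit code of the command.
run_and_log() {
  ral_log_file=$1
  shift
  ral_rc=0
  "$@" >"$ral_log_file" 2>&1 || ral_rc=$?
  echo "exit code: $ral_rc (output saved to $ral_log_file)"
  return "$ral_rc"
}

# Possible use on a cluster node (not executed here):
# run_and_log /tmp/stonith-debug.log stonith -d -t external/ipmi \
#     hostname=heynode01 ipaddr=172.16.81.44 userid=FENCEMEUP \
#     passwd='this!just!Fail2$' interface=lanplus -S
```

The log file then contains the password in clear text if the plugin prints its parameters, so it should be removed or protected after the analysis.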
Cause
Stonith resource agents cannot be debugged using the resource trace option, so they have to be tested manually with the stonith command.
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000020991
- Creation Date: 28-Feb-2023
- Modified Date: 28-Feb-2023
- SUSE Linux Enterprise High Availability Extension
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com