OSD gets restarted automatically after its ceph-osd process terminates with an assert failure
This document (000020266) is provided subject to the disclaimer at the end of this document.
Environment
Situation
Resolution
This behavior is intentional (see below for an explanation). To disable restarts altogether, create a drop-in directory for the ceph-osd service with "mkdir /etc/systemd/system/ceph-osd@.service.d/" and add a file named "10-no-restart.conf" to it with the following content:
[Service]
Restart=no
Next, reload the systemd configuration with "systemctl daemon-reload" to notify systemd about the changed unit settings.
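The steps above can be sketched as a small shell helper. This is a minimal sketch, not an official SUSE script: the function name "install_no_restart_dropin" is hypothetical, and the unit directory is passed as an argument so the helper can be tried against a scratch directory before touching a real node.

```shell
# install_no_restart_dropin is a hypothetical helper name used for illustration.
# $1: systemd unit directory (the article uses /etc/systemd/system).
install_no_restart_dropin() {
    dropin_dir="$1/ceph-osd@.service.d"

    # Create the drop-in directory for the ceph-osd@ template unit.
    mkdir -p "$dropin_dir" || return 1

    # Drop-in content as given in the article: disable automatic restarts.
    printf '[Service]\nRestart=no\n' > "$dropin_dir/10-no-restart.conf"
}
```

On an OSD node, this would be run as root with "install_no_restart_dropin /etc/systemd/system && systemctl daemon-reload". Afterwards, "systemctl show -p Restart ceph-osd@0.service" reports the effective Restart setting; the OSD ID 0 is only a placeholder.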
Cause
Additional Information
However, in some situations, such as a failing disk causing the ceph-osd process to terminate via an assert() failure, it may be beneficial not to restart the process, so that the downed OSD shows up in the cluster status.
Which behavior makes more sense depends on system administration preferences and the specific monitoring setup. The recommendation is to monitor disk health via its SMART status, but for some disk models, the SMART status does not reliably indicate a degrading disk. In that case, a slowly failing disk may go unnoticed for a long time before it finally breaks completely.
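As an illustration of the SMART check mentioned above, the following sketch wraps "smartctl -H" from smartmontools, which is an assumption here since the article does not name a tool. The helper names "smart_reports_failing" and "check_disk" are hypothetical. The check relies on the smartctl exit status, where bit 3 (value 8) is documented as "SMART status check returned DISK FAILING". As noted above, a clean result does not prove that the disk is healthy.

```shell
# Succeeds (returns 0) when the given smartctl exit status has bit 3
# (value 8) set, which smartctl documents as "DISK FAILING".
smart_reports_failing() {
    [ $(( $1 & 8 )) -ne 0 ]
}

# check_disk is a hypothetical helper; $1 is a device path such as /dev/sdX.
check_disk() {
    # Without this guard, "command not found" (status 127) would be
    # misread as a bitmask with bit 3 set.
    command -v smartctl >/dev/null 2>&1 || {
        echo "smartctl not found" >&2
        return 2
    }

    status=0
    smartctl -H "$1" >/dev/null 2>&1 || status=$?

    if smart_reports_failing "$status"; then
        echo "$1: SMART reports DISK FAILING"
    else
        echo "$1: no SMART failure reported (smartctl exit status $status)"
    fi
}
```

A call such as "check_disk /dev/sdX" (with a placeholder device name) would typically need root privileges.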
For more information on systemd services and their configuration parameters, please refer to https://www.freedesktop.org/software/systemd/man/systemd.service.html
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000020266
- Creation Date: 01-Jun-2021
- Modified Date: 01-Jun-2021
- SUSE Enterprise Storage
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com