Replacing an OSD disk fails validation during stage 3 because the DB/WAL partitions were not properly cleaned up
This document (000019749) is provided subject to the disclaimer at the end of this document.
Environment
Situation
The customer removed the disk from the OSD node instead of replacing it, but the remove procedure failed to properly clean up the DB/WAL partitions. A subsequent stage 3 run then fails as follows:
# salt-run state.orch ceph.stage.3
...
OSD-node:
          ID: deploy OSDs
    Function: module.run
        Name: osd.deploy
      Result: False
     Comment: Module function osd.deploy threw an exception. Exception: /usr/sbin/sgdisk -n 4:0:+500M -t 4:30CD0809-C2B2-499C-8879-2D6B78529876 /dev/sdk failed
     Started: 14:27:37.072998
    Duration: 5606.105 ms
...
Resolution
The redeployment fails because the DB/WAL device (/dev/sdk in the example above) is full: a stale partition left behind by the failed removal still occupies space on it.
To determine which of the partitions on that device is stale, run the following command on the OSD node:
# readlink -f /var/lib/ceph/osd/ceph-*/{block.db,block.wal} | sort
This lists the partitions on the DB/WAL disks that the active (currently running) OSDs are using.
Compare this output with the partitions that currently exist on the sdk device. Any partition on the sdk device that is NOT listed in the readlink output is a stale partition.
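The comparison described above can also be scripted. The following is a minimal sketch and not part of the original procedure: the function name stale_partitions and the file names in_use.txt and on_disk.txt are hypothetical, and /dev/sdk is the example device from this document. It assumes each file holds one device path per line. Review the result before deleting anything.

```shell
# Print every line of the second file (partitions on the disk) that does
# not appear in the first file (partitions in use by running OSDs).
#   -F  treat patterns as fixed strings
#   -x  match whole lines only, so /dev/sdk1 does not match /dev/sdk10
#   -v  invert the match
#   -f  read the patterns from a file
stale_partitions() {
    # grep exits with status 1 when nothing is stale; that is not an error here
    grep -vxF -f "$1" "$2" || true
}

# Example use on the OSD node (bash, as root); file names are illustrative:
#   readlink -f /var/lib/ceph/osd/ceph-*/{block.db,block.wal} | sort -u > in_use.txt
#   lsblk -lnpo NAME,TYPE /dev/sdk | awk '$2 == "part" {print $1}' > on_disk.txt
#   stale_partitions in_use.txt on_disk.txt
```

Whole-line matching is used deliberately so that a partition number that is a prefix of another is never mistaken for a match.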
The stale partition then needs to be deleted manually, for example with the 'parted' tool.
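As a hedged sketch of the deletion step, the helper below only prints the parted command for a given partition path so that it can be reviewed before it is run by hand. The function name parted_rm_command is hypothetical, /dev/sdk4 is an illustrative path rather than a value confirmed by this document, and the path splitting assumes sdX-style device names (it does not handle names such as nvme0n1p4).

```shell
# Print (do not execute) the parted command that would delete a partition.
# Assumes sdX-style names, where the trailing digits are the partition number.
parted_rm_command() {
    part=$1
    num=${part##*[!0-9]}          # strip everything up to the last non-digit
    if [ -z "$num" ]; then
        echo "no partition number in: $part" >&2
        return 1
    fi
    disk=${part%"$num"}           # strip the partition number from the end
    printf 'parted -s %s rm %s\n' "$disk" "$num"
}

# Example (illustrative path):
#   parted_rm_command /dev/sdk4
```

Running "parted /dev/sdk print" before and after the deletion shows the partition table, which helps confirm that only the intended partition was removed.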
Once the stale partition is removed, run stage 3 again (salt-run state.orch ceph.stage.3). The redeployment should now succeed.
Cause
Status
Additional Information
Differences between the replace.osd and remove.osd commands:
The replace.osd and remove.osd commands are identical, except that replace.osd leaves the OSD marked as 'destroyed' in the CRUSH Map, while remove.osd removes all traces of it from the CRUSH Map.
See also:
https://documentation.suse.com/ses/5.5/single-html/ses-admin/#ds-osd-replace
https://documentation.suse.com/ses/5.5/single-html/ses-admin/#salt-removing-osd
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000019749
- Creation Date: 22-Oct-2020
- Modified Date: 27-Oct-2020
- SUSE Enterprise Storage
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com