When deploying additional OSD hosts into an existing cluster, running DeepSea stage 3 fails at "cephprocesses.wait"
This document (000019911) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Enterprise Storage
Dell PowerEdge R740xd Servers
Situation
When running DeepSea stage 3 ("salt-run state.orch ceph.stage.3") to deploy additional OSD hosts, the stage fails with output similar to the following excerpt:
Total run time: 275.271 ms
new-storage-node.my.company.ex:
----------
ID: wait for osd processes
Function: module.run
Name: cephprocesses.wait
Result: False
Comment: Module function cephprocesses.wait executed
Started: 13:56:37.095587
Duration: 932504.842 ms
Changes:
----------
ret:
False
Summary for new-storage-node.my.company.ex
------------
Succeeded: 0 (changed=1)
Failed: 1
On the affected OSD host(s), the "/var/log/salt/minion" log contains errors similar to the following excerpt:
stderr: Error reading device /dev/sde at 0 length 4096.
...
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.123 --yes-i-really-mean-it
...
stderr: purged osd.123
--> RuntimeError: command returned non-zero exit status: 5
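To verify that this is a device-level read failure rather than a DeepSea or ceph-volume problem, the read that fails in the log above ("Error reading device /dev/sde at 0 length 4096") can be reproduced directly. The device name "/dev/sde" is taken from the log excerpt; adjust it for the affected drive:

```shell
# Attempt to read the first 4 KiB of the affected device.
# On a drive affected by this issue, dd fails with an I/O error,
# matching the "Error reading device /dev/sde at 0 length 4096"
# message in the salt minion log.
dd if=/dev/sde of=/dev/null bs=4096 count=1
```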
Resolution
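The kernel messages shown under Additional Information ("print_req_error: protection error", "Logical block reference tag check failed") suggest that the affected drives were formatted with T10 Protection Information (PI/DIF) enabled in a configuration the current controller/driver combination cannot handle. Assuming this is the case, one remediation is to low-level reformat the affected drives with Protection Information disabled, using "sg_format" from the sg3_utils package. The device name "/dev/sde" below is taken from the log excerpts; adjust it for each affected drive. Note that this format destroys all data on the drive and can take a long time to complete:

```shell
# Check whether Protection Information is enabled on the drive;
# "prot_en=1" in the READ CAPACITY (16) output indicates PI is active.
sg_readcap --long /dev/sde

# Low-level format the drive with Protection Information disabled
# (--fmtpinfo=0). WARNING: this destroys all data on /dev/sde.
sg_format --format --fmtpinfo=0 /dev/sde
```

Once the format has completed and a raw read of the device succeeds, DeepSea stage 3 can be re-run to deploy the new OSDs.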
Cause
The kernel messages logged for the device ("Logical block reference tag check failed", "print_req_error: protection error") indicate that reads from the drive fail T10 Protection Information (DIF) integrity checks. As a result, the operating system cannot read the device, ceph-volume cannot prepare the OSD and rolls back its changes, and the OSD process never starts, so "cephprocesses.wait" times out and DeepSea stage 3 fails. This typically occurs when a drive was formatted with Protection Information enabled in a configuration that the current controller and driver combination does not support.
Additional Information
The following errors may also be seen in "/var/log/messages" on the intended new OSD hosts:
2021-03-16T14:04:41.396125+01:00 new-storage-node kernel: [515288.323471] sd 0:0:1:0: [sde] tag#4 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_SENSE
2021-03-16T14:04:41.396127+01:00 new-storage-node kernel: [515288.323476] sd 0:0:1:0: [sde] tag#4 Sense Key : Illegal Request [current]
2021-03-16T14:04:41.396129+01:00 new-storage-node kernel: [515288.323481] sd 0:0:1:0: [sde] tag#4 Add. Sense: Logical block reference tag check failed
2021-03-16T14:04:41.396130+01:00 new-storage-node kernel: [515288.323485] sd 0:0:1:0: [sde] tag#4 CDB: Read(32)
2021-03-16T14:04:41.396132+01:00 new-storage-node kernel: [515288.323489] sd 0:0:1:0: [sde] tag#4 CDB[00]: 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 06
2021-03-16T14:04:41.396134+01:00 new-storage-node kernel: [515288.323492] sd 0:0:1:0: [sde] tag#4 CDB[10]: 3c bf ff 80 3c bf ff 80 00 00 00 00 00 00 00 08
2021-03-16T14:04:41.396173+01:00 new-storage-node kernel: [515288.323496] print_req_error: protection error, dev sde, sector 26789019520
...
2021-03-16T14:04:41.250174+01:00 new-storage-node kernel: [515755.159101] print_req_error: 400 callbacks suppressed
2021-03-16T14:04:41.250175+01:00 new-storage-node kernel: [515755.159106] print_req_error: protection error, dev sde, sector 0
...
2021-03-16T14:05:41.012024+01:00 new-storage-node kernel: [515756.919719] Buffer I/O error on dev sde, logical block 3348627440, async page read
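Assuming the smartmontools package is installed, the drive's protection formatting can also be checked with smartctl; SAS drives formatted with Protection Information report the protection type in their identification output:

```shell
# On a PI-formatted SAS drive, smartctl output includes a line such as
# "Formatted with type 2 protection".
smartctl -a /dev/sde | grep -i protection
```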
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000019911
- Creation Date: 16-Mar-2021
- Modified Date: 16-Mar-2021
- SUSE Enterprise Storage
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com