Ceph daemons will not start due to: Error: readlink /var/lib/containers/storage/overlay/l/CXMD7IEI4LUKBJKX5BPVGZLY3Y: no such file or directory
This document (000019888) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Enterprise Storage 7
Situation
Below are some of the symptoms observed when the cluster was started:
saltmaster:~ # ceph -s
  cluster:
    id:     c064a3f0-de87-4721-bf4d-f44d39cee754
    health: HEALTH_WARN
            failed to probe daemons or devices
            2 osds down
            2 hosts (12 osds) down
            Reduced data availability: 6 pgs inactive
            Degraded data redundancy: 1664/25452 objects degraded (6.538%), 78 pgs degraded, 139 pgs undersized

  services:
    mon: 2 daemons, quorum mon6,mon7 (age 4m)
    mgr: mon6(active, since 3h), standbys: mon7, mon8.zndnvk
    mds: cephfs:1 {0=cephfs.ceph9.ucrcbl=up:active} 4 up:standby
    osd: 36 osds: 24 up (since 3h), 26 in (since 3h)

  data:
    pools:   12 pools, 674 pgs
    objects: 8.48k objects, 10 GiB
    usage:   215 GiB used, 785 GiB / 1000 GiB avail
    pgs:     0.890% pgs not active
             1664/25452 objects degraded (6.538%)
             535 active+clean
             76 active+undersized+degraded
             57 active+undersized
             4 undersized+peered
             2 undersized+degraded+peered

  io:
    client: 1.7 KiB/s rd, 1 op/s rd, 0 op/s wr
saltmaster:~ # ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 1.46393 root default
-9 0.24399 host osd10
3 hdd 0.04880 osd.3 down 0 1.00000
9 hdd 0.04880 osd.9 down 0 1.00000
15 hdd 0.04880 osd.15 down 0 1.00000
21 hdd 0.04880 osd.21 down 0 1.00000
28 ssd 0.02440 osd.28 down 0 1.00000
33 ssd 0.02440 osd.33 down 1.00000 1.00000
---[cut here]---
-7 0.24399 host osd15
2 hdd 0.04880 osd.2 down 0 1.00000
8 hdd 0.04880 osd.8 down 0 1.00000
14 hdd 0.04880 osd.14 down 0 1.00000
19 hdd 0.04880 osd.19 down 0 1.00000
26 ssd 0.02440 osd.26 down 0 1.00000
34 ssd 0.02440 osd.34 down 1.00000 1.00000
saltmaster:~ # ceph orch ps | grep error
NAME HOST STATUS REFRESHED AGE VERSION IMAGE NAME IMAGE ID CONTAINER ID
crash.osd10 osd10 error 16s ago 19h <unknown> registry.suse.com/ses/7/ceph/ceph:latest <unknown> <unknown>
crash.osd15 osd15 error 16s ago 19h <unknown> registry.suse.com/ses/7/ceph/ceph:latest <unknown> <unknown>
crash.mon5 mon5 error 0s ago 19h <unknown> registry.suse.com/ses/7/ceph/ceph:latest <unknown> <unknown>
mgr.mon5 mon5 error 0s ago 18h <unknown> registry.suse.com/ses/7/ceph/ceph <unknown> <unknown>
node-exporter.osd10 osd10 error 16s ago 19h <unknown> registry.suse.com/caasp/v4.5/prometheus-node-exporter:0.18.1 <unknown> <unknown>
node-exporter.osd15 osd15 error 16s ago 19h <unknown> registry.suse.com/caasp/v4.5/prometheus-node-exporter:0.18.1 <unknown> <unknown>
node-exporter.mon5 mon5 error 0s ago 19h <unknown> registry.suse.com/caasp/v4.5/prometheus-node-exporter:0.18.1 <unknown> <unknown>
osd.14 osd15 error 16s ago 18h <unknown> registry.suse.com/ses/7/ceph/ceph <unknown> <unknown>
osd.15 osd10 error 16s ago 18h <unknown> registry.suse.com/ses/7/ceph/ceph <unknown> <unknown>
osd.19 osd15 error 16s ago 18h <unknown> registry.suse.com/ses/7/ceph/ceph <unknown> <unknown>
osd.2 osd15 error 16s ago 18h <unknown> registry.suse.com/ses/7/ceph/ceph <unknown> <unknown>
osd.21 osd10 error 16s ago 18h <unknown> registry.suse.com/ses/7/ceph/ceph <unknown> <unknown>
osd.26 osd15 error 16s ago 18h <unknown> registry.suse.com/ses/7/ceph/ceph <unknown> <unknown>
osd.28 osd10 error 16s ago 18h <unknown> registry.suse.com/ses/7/ceph/ceph <unknown> <unknown>
osd.3 osd10 error 16s ago 18h <unknown> registry.suse.com/ses/7/ceph/ceph <unknown> <unknown>
osd.33 osd10 error 16s ago 18h <unknown> registry.suse.com/ses/7/ceph/ceph <unknown> <unknown>
osd.34 osd15 error 16s ago 18h <unknown> registry.suse.com/ses/7/ceph/ceph <unknown> <unknown>
osd.8 osd15 error 16s ago 18h <unknown> registry.suse.com/ses/7/ceph/ceph <unknown> <unknown>
osd.9 osd10 error 16s ago 18h <unknown> registry.suse.com/ses/7/ceph/ceph <unknown> <unknown>
rgw.default.default.ceph9.mxytwy ceph9 error 13s ago 18h <unknown> registry.suse.com/ses/7/ceph/ceph:latest <unknown> <unknown>
rgw.default.default.ceph9.pubxcy ceph9 error 13s ago 4h <unknown> registry.suse.com/ses/7/ceph/ceph:latest <unknown> <unknown>
mon5:~ # ceph health detail
HEALTH_WARN failed to probe daemons or devices; 1 stray daemons(s) not managed by cephadm
[WRN] CEPHADM_REFRESH_FAILED: failed to probe daemons or devices
host mon8 ceph-volume inventory failed: cephadm exited with an error code: 1, stderr:Non-zero exit code 125 from /usr/bin/podman run --rm --ipc=host --net=host --entrypoint stat -e CONTAINER_IMAGE=registry.suse.com/ses/7/ceph/ceph:latest -e NODE_NAME=mon8 registry.suse.com/ses/7/ceph/ceph:latest -c %u %g /var/lib/ceph
stat:stderr Error: readlink /var/lib/containers/storage/overlay/l/2X52XHV2MZM4L33XEWGHQJ7XNZ: no such file or directory
Traceback (most recent call last):
File "<stdin>", line 6115, in <module>
File "<stdin>", line 1299, in _infer_fsid
File "<stdin>", line 1382, in _infer_image
File "<stdin>", line 3583, in command_ceph_volume
File "<stdin>", line 1477, in make_log_dir
File "<stdin>", line 2086, in extract_uid_gid
RuntimeError: uid/gid not found
[WRN] CEPHADM_STRAY_DAEMON: 1 stray daemons(s) not managed by cephadm
stray daemon mgr.mon8.zndnvk on host mon8 not managed by cephadm
mon5:~ # cephadm shell
Inferring fsid c064a3f0-de87-4721-bf4d-f44d39cee754
Using recent ceph image registry.suse.com/ses/7/ceph/ceph:latest
Non-zero exit code 125 from /usr/bin/podman run --rm --ipc=host --net=host --entrypoint stat -e CONTAINER_IMAGE=registry.suse.com/ses/7/ceph/ceph:latest -e NODE_NAME=mon5 registry.suse.com/ses/7/ceph/ceph:latest -c %u %g /var/lib/ceph
stat:stderr Error: readlink /var/lib/containers/storage/overlay/l/CXMD7IEI4LUKBJKX5BPVGZLY3Y: no such file or directory
Traceback (most recent call last):
File "/usr/sbin/cephadm", line 6114, in <module>
r = args.func()
File "/usr/sbin/cephadm", line 1322, in _infer_fsid
return func()
File "/usr/sbin/cephadm", line 1353, in _infer_config
return func()
File "/usr/sbin/cephadm", line 1381, in _infer_image
return func()
File "/usr/sbin/cephadm", line 3474, in command_shell
make_log_dir(args.fsid)
File "/usr/sbin/cephadm", line 1476, in make_log_dir
uid, gid = extract_uid_gid()
File "/usr/sbin/cephadm", line 2085, in extract_uid_gid
raise RuntimeError('uid/gid not found')
RuntimeError: uid/gid not found
In all cases the common denominator was this message:
stat:stderr Error: readlink /var/lib/containers/storage/overlay/l/CXMD7IEI4LUKBJKX5BPVGZLY3Y: no such file or directory
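The same check that cephadm performs can be run by hand to confirm whether the image on a given node is affected. The command below is a simplified version of the podman invocation shown in the error output above (the --ipc/--net options and environment variables are omitted, as they do not change the result); on a healthy node it prints the uid and gid that own /var/lib/ceph inside the container, while on an affected node it fails with the readlink error:
mon5:~ # podman run --rm --entrypoint stat registry.suse.com/ses/7/ceph/ceph:latest -c '%u %g' /var/lib/ceph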
Resolution
On each node whose daemons report the readlink error, remove the stale Ceph container image and pull it again:
mon5:~ # podman image rm registry.suse.com/ses/7/ceph/ceph
Untagged: registry.suse.com/ses/7/ceph/ceph:latest
Deleted: f1a7d8e63a7eb956904027325e1924fc6d187994fce646a040f2ea8c7b2cec7d
mon5:~ # podman pull registry.suse.com/ses/7/ceph/ceph
Trying to pull registry.suse.com/ses/7/ceph/ceph...
Getting image source signatures
Copying blob 20dcc9d2116b done
Copying blob 19daf7f5570e done
Copying config f1a7d8e63a done
Writing manifest to image destination
Storing signatures
f1a7d8e63a7eb956904027325e1924fc6d187994fce646a040f2ea8c7b2cec7d
mon5:~ # podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e191ce4a16bf registry.suse.com/ses/7/ceph/ceph:latest -n mon.mon5 -f... 13 minutes ago Up 13 minutes ago ceph-c064a3f0-de87-4721-bf4d-f44d39cee754-mon.mon5
mon5:~ # cephadm shell
Inferring fsid c064a3f0-de87-4721-bf4d-f44d39cee754
Inferring config /var/lib/ceph/c064a3f0-de87-4721-bf4d-f44d39cee754/mon.mon5/config
Using recent ceph image registry.suse.com/ses/7/ceph/ceph:latest
After the container images have been pulled and validated, restart the appropriate services:
saltmaster:~ # ceph orch restart osd
saltmaster:~ # ceph orch restart mds
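Depending on which daemons were reported in error state, other services may need to be restarted as well. For example, based on the "ceph orch ps" output above (the service names below are taken from that output and will differ in other clusters):
saltmaster:~ # ceph orch restart crash
saltmaster:~ # ceph orch restart node-exporter
saltmaster:~ # ceph orch restart rgw.default.default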
Use "ceph orch ps | grep error" to look for process that could be affected.
saltmaster:~ # ceph -s
  cluster:
    id:     c064a3f0-de87-4721-bf4d-f44d39cee754
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum mon6,mon7,mon5 (age 17m)
    mgr: mon6(active, since 4h), standbys: mon7, mon8.zndnvk
    mds: cephfs:1 {0=cephfs.ceph9.szvwmo=up:active} 3 up:standby
    osd: 36 osds: 36 up (since 13m), 36 in (since 13m)

  data:
    pools:   12 pools, 674 pgs
    objects: 8.48k objects, 10 GiB
    usage:   310 GiB used, 1.2 TiB / 1.5 TiB avail
    pgs:     674 active+clean

  io:
    client: 1.7 KiB/s rd, 1 op/s rd, 0 op/s wr
saltmaster:~ # ceph osd tree down
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
saltmaster:~ #
Cause
The podman overlay storage on the affected nodes was missing a layer link for the Ceph container image (the "readlink ... no such file or directory" error), so podman could no longer start containers from that image.
Status
Additional Information
The container images in use on the cluster nodes ("podman images" output):
REPOSITORY TAG IMAGE ID CREATED SIZE
registry.suse.com/ses/7/ceph/ceph latest f1a7d8e63a7e 2 months ago 835 MB
registry.suse.com/caasp/v4.5/prometheus-server 2.18.0 848b38cc04c2 2 months ago 297 MB
registry.suse.com/caasp/v4.5/prometheus-alertmanager 0.16.2 4683615b36cb 2 months ago 193 MB
registry.suse.com/ses/7/ceph/grafana 7.0.3 8807a216c843 3 months ago 298 MB
registry.suse.com/caasp/v4.5/prometheus-node-exporter 0.18.1 a149a78bcd37 6 months ago 189 MB
It may be necessary to remove and re-pull other container images on each node as well; see the example below.
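For example, if the node-exporter image on a node shows the same readlink error, it can be refreshed in the same way (image name and tag are taken from the list above; substitute the node and image that actually report the error):
osd10:~ # podman image rm registry.suse.com/caasp/v4.5/prometheus-node-exporter:0.18.1
osd10:~ # podman pull registry.suse.com/caasp/v4.5/prometheus-node-exporter:0.18.1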
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000019888
- Creation Date: 25-Feb-2021
- Modified Date: 07-Apr-2021
- SUSE Enterprise Storage
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com