ceph -s reports: 1 clients failing to respond to capability release, 1 clients failing to advance oldest client/flush tid, 1 MDSs report slow requests
This document (000019628) is provided subject to the disclaimer at the end of this document.
Environment
Situation
1 clients failing to respond to capability release
1 clients failing to advance oldest client/flush tid
1 MDSs report slow requests
Resolution
1 clients failing to respond to capability release
1 clients failing to advance oldest client/flush tid
1 MDSs report slow requests
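The client IDs listed by ceph health detail (see the output in the Additional Information section below) can usually be matched to a session on the active MDS. The following is a minimal sketch only, assuming the active MDS name ses-mds-1 and the client_id 15004271 reported in this case; evicting a session blocklists the client and should only be considered a last resort when the client cannot be remounted or rebooted cleanly:

ses-master:~ # ceph tell mds.ses-mds-1 session ls
ses-master:~ # ceph tell mds.ses-mds-1 client evict id=15004271

In this particular case the blocked du process on the client was killed and the process released fine (see Additional Information), so no eviction was required.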
Cause
Additional Information
ses-master:~ # ceph -s
  cluster:
    id:     7c9dc5a7-373d-4203-ad19-1a8d24c208d0
    health: HEALTH_WARN
            1 clients failing to respond to capability release
            1 clients failing to advance oldest client/flush tid
            1 MDSs report slow requests
            54 pgs not deep-scrubbed in time

  services:
    mon: 3 daemons, quorum ses-mon-1,ses-mon-2,ses-mon-3 (age 13d)
    mgr: ses-mon-3(active, since 13d), standbys: ses-mon-1, ses-mon-2
    mds: cephfs:1 {0=ses-mds-1=up:active} 1 up:standby
    osd: 206 osds: 206 up (since 3d), 206 in (since 6d)

  data:
    pools:   10 pools, 3016 pgs
    objects: 129.47M objects, 410 TiB
    usage:   618 TiB used, 1.2 PiB / 1.9 PiB avail
    pgs:     3008 active+clean
             8    active+clean+scrubbing+deep

  io:
    client: 0 B/s rd, 44 MiB/s wr, 2 op/s rd, 50 op/s wr

#==[ Command ]======================================#
# /usr/bin/ceph --connect-timeout=5 health detail
HEALTH_WARN 1 clients failing to respond to capability release; 1 clients failing to advance oldest client/flush tid; 1 MDSs report slow requests; 74 pgs not deep-scrubbed in time
MDS_CLIENT_LATE_RELEASE 1 clients failing to respond to capability release
    mdsses-mds-1(mds.0): Client cephfs-client1 failing to respond to capability release client_id: 15004271
MDS_CLIENT_OLDEST_TID 1 clients failing to advance oldest client/flush tid
    mdsses-mds-1(mds.0): Client cephfs-client2 failing to advance its oldest client/flush tid. client_id: 13400693
MDS_SLOW_REQUEST 1 MDSs report slow requests
    mdsses-mds-1(mds.0): 4 slow requests are blocked > 30 secs
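The slow requests reported by the MDS can typically be inspected from the MDS admin socket on the MDS host. A minimal sketch, assuming the active MDS daemon mds.ses-mds-1 from the output above:

ses-mds-1:~ # ceph daemon mds.ses-mds-1 dump_ops_in_flight

The output lists each pending request together with the client session it originated from, which helps confirm which client is holding up the MDS.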
The client machines were otherwise responding fine, except that cephfs-client2 was stuck on that one directory. A du command against the directory was still running; once the du command was killed, the process released fine.
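On the client side it is usually possible to check which processes are holding the mount and which MDS requests are still outstanding. A minimal sketch, assuming the kernel CephFS client, a hypothetical mount point /mnt/cephfs, and that debugfs is mounted under /sys/kernel/debug:

cephfs-client2:~ # fuser -vm /mnt/cephfs
cephfs-client2:~ # cat /sys/kernel/debug/ceph/*/mdsc

fuser lists the processes with files open on the mount, and the mdsc file lists the kernel client's outstanding requests to the MDS; a request that never completes corresponds to the stuck operation.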
Socket errors were observed days before the event.
cephfs-client1 shows the following in dmesg:
[Thu May 7 09:53:42 2020] libceph: osd205 up
[Thu May 7 15:54:18 2020] libceph: mon2 172.21.99.206:6789 session established
[Thu May 7 15:54:18 2020] libceph: client15004271 fsid 7c9dc5a7-373d-4203-ad19-1a8d24c208d0
[Fri May 8 09:35:02 2020] libceph: osd17 weight 0xd999
[Fri May 8 09:35:02 2020] libceph: osd17 weight 0x10000 (in)
[Sat May 9 09:30:42 2020] libceph: osd172 down
[Sat May 9 09:30:50 2020] libceph: osd172 up
[Sat May 9 14:35:21 2020] libceph: osd71 172.20.09.214:6832 socket error on write
[Sat May 9 14:35:21 2020] libceph: osd169 172.20.09.209:6876 socket closed (con state OPEN)
[Sat May 9 14:35:21 2020] libceph: osd42 172.20.09.211:6848 socket error on write
[Sat May 9 14:35:21 2020] libceph: osd25 172.20.09.208:6812 socket error on write
cephfs-client2 shows the following in dmesg:
May 7 00:06:56 cephfs-client2 kernel: libceph: osd33 172.20.09.208:6816 socket error on write
May 7 09:57:52 cephfs-client2 kernel: libceph: osd166 172.20.09.209:6864 socket closed (con state OPEN)
May 7 10:01:51 cephfs-client2 kernel: libceph: osd35 172.20.09.207:6820 socket closed (con state OPEN)
May 7 10:01:52 cephfs-client2 kernel: libceph: osd24 172.20.09.207:6812 socket closed (con state OPEN)
May 7 10:01:53 cephfs-client2 kernel: libceph: osd0 172.20.09.207:6800 socket error on write
May 7 10:01:55 cephfs-client2 kernel: libceph: osd157 172.20.09.207:6864 socket closed (con state OPEN)
May 7 16:14:13 cephfs-client2 kernel: libceph: osd17 weight 0xd999
May 7 17:02:50 cephfs-client2 kernel: libceph: osd17 weight 0x10000 (in)
May 9 09:31:57 cephfs-client2 kernel: libceph: osd172 down
May 9 09:32:02 cephfs-client2 kernel: libceph: osd172 up
May 9 14:36:35 cephfs-client2 kernel: libceph: osd157 172.20.09.207:6864 socket error on write
May 9 14:36:50 cephfs-client2 kernel: libceph: osd27 172.20.09.215:6814 socket error on write
May 9 17:50:29 cephfs-client2 kernel: libceph: osd79 172.20.09.214:6836 socket error on write
May 9 19:41:56 cephfs-client2 kernel: libceph: osd160 172.20.09.207:6872 socket closed (con state OPEN)
May 9 21:16:17 cephfs-client2 kernel: libceph: osd110 172.20.09.209:6828 socket closed (con state OPEN)
May 10 05:49:36 cephfs-client2 kernel: libceph: osd174 172.20.09.210:6872 socket error on write
May 10 06:52:36 cephfs-client2 kernel: libceph: osd160 172.20.09.207:6872 socket closed (con state OPEN)
May 10 07:42:18 cephfs-client2 kernel: libceph: osd174 172.20.09.210:6872 socket closed (con state OPEN)
May 10 08:51:48 cephfs-client2 kernel: libceph: osd18 172.20.09.215:6842 socket closed (con state OPEN)
May 10 13:54:32 cephfs-client2 kernel: libceph: osd91 172.20.09.215:6808 socket closed (con state OPEN)
May 10 17:05:47 cephfs-client2 kernel: libceph: osd33 172.20.09.208:6816 socket closed (con state OPEN)
May 10 18:32:18 cephfs-client2 kernel: libceph: osd174 172.20.09.210:6872 socket closed (con state OPEN)
May 11 06:06:41 cephfs-client2 kernel: libceph: osd114 172.20.09.209:6840 socket closed (con state OPEN)
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000019628
- Creation Date: 15-May-2020
- Modified Date: 08-Jun-2022
- SUSE Enterprise Storage
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com