CephFS: deleting files fails with "rm: cannot remove 'file-name': No space left on device"
This document (000020569) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Enterprise Storage
Situation
ses-master:~ # ceph health detail
HEALTH_WARN 1 MDSs report oversized cache
MDS_CACHE_OVERSIZED 1 MDSs report oversized cache
mds.ses-mds-2(mds.0): MDS cache is too large (79GB/31GB); 31162038 inodes in use by clients, 996594 stray files
/cases/00327226/scc_SR00327226_ses-master_220113_1249_ea9ccd74-0155-4587-b2a6-bd6a1717dc1f/ceph> cat ceph-status
#==[ Command ]======================================#
# /usr/bin/ceph --connect-timeout=5 -s
  cluster:
    id:     7c9dc5a7-373d-4203-ad19-1a8d24c208d0
    health: HEALTH_WARN
            1 MDSs report oversized cache

  services:
    mon: 3 daemons, quorum ses-mon-1,ses-mon-2,ses-mon-3 (age 9d)
    mgr: ses-mon-2(active, since 9d), standbys: ses-mon-3, ses-mon-1
    mds: cephfs:1 {0=ses-mds-2=up:active} 1 up:standby
    osd: 247 osds: 247 up (since 23m), 246 in (since 9d)

  task status:

  data:
    pools:   14 pools, 3688 pgs
    objects: 538.54M objects, 744 TiB
    usage:   1.1 PiB used, 1.1 PiB / 2.3 PiB avail
    pgs:     2081 active+clean+snaptrim_wait
             1173 active+clean
             427  active+clean+snaptrim
             3    active+clean+scrubbing+deep+snaptrim_wait
             2    active+clean+scrubbing+deep
             2    active+clean+scrubbing+snaptrim_wait

  io:
    client: 69 MiB/s rd, 6.1 MiB/s wr, 4.65k op/s rd, 457 op/s wr
After about 30 minutes the cluster health returned to "HEALTH_OK", but the "No space left on device" errors when deleting some files/directories continued.
Resolution
The error is observed when "num_strays" is near its effective limit of 1000000 (1 million), which is 10x the value of the "mds_bal_fragment_size_max" setting: the MDS keeps unlinked files in ten internal stray directories, each limited to "mds_bal_fragment_size_max" entries. To view "num_strays", run the following command on the active MDS node: "ceph daemon mds.`hostname -s` perf dump | grep strays"
"mds_bal_fragment_size_max" is configured with default value "100000" (100 thousand). The value of "mds_bal_fragment_size_max" can be viewed by running the following command on the mds node:
"ceph daemon mds.`hostname -s` config get mds_bal_fragment_size_max"
Example:
ses-mds-2:~ # ceph daemon mds.`hostname -s` config get mds_bal_fragment_size_max
{
"mds_bal_fragment_size_max": "100000"
}
ses-mds-2:~ # ceph daemon mds.`hostname -s` perf dump | grep strays
"num_strays": 996646,
"num_strays_delayed": 0,
"num_strays_enqueuing": 0,
"strays_created": 10101709,
"strays_enqueued": 9532957,
"strays_reintegrated": 555,
"strays_migrated": 0,
Increase "mds_bal_fragment_size_max = 200000" on the mds nodes:
ceph daemon mds.$HOSTNAME config set mds_bal_fragment_size_max 200000
Then observe "ceph daemon mds.`hostname -s` perf dump | grep num_stray" to ensure the value remains under 2000000. If the value is near 2000000, increasing "mds_bal_fragment_size_max 400000"
Also add the desired value to ceph.conf, as the setting applied with "ceph daemon ... config set" above is not persistent across MDS restarts. See the sketch below.
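A sketch of a persistent configuration, assuming the value 200000 chosen above; on releases that use the centralized configuration database, "ceph config set" can be used instead of editing ceph.conf:
# ceph.conf snippet on the MDS nodes
[mds]
mds_bal_fragment_size_max = 200000

# alternatively, depending on the Ceph release:
ceph config set mds mds_bal_fragment_size_max 200000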
Cause
There is a correlation between the "mds_cache_memory_limit" and "mds_bal_fragment_size_max" settings. When "mds_cache_memory_limit" is increased, "mds_bal_fragment_size_max" should also be increased if the "num_strays" value is approaching its limit.
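For reference, the current cache limit can be checked the same way as the other settings above (run on the active MDS node):
ceph daemon mds.$(hostname -s) config get mds_cache_memory_limit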
Status
Additional Information
https://docs.ceph.com/en/latest/cephfs/dirfrags/#size-thresholds
https://docs.ceph.com/en/latest/cephfs/mds-config-ref/
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000020569
- Creation Date: 28-Jan-2022
- Modified Date: 31-Jan-2022
- SUSE Enterprise Storage
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com