1 large objects found in pool 'default.rgw.meta'
This document (000019698) is provided subject to the disclaimer at the end of this document.
Environment
Situation
A cluster reports the HEALTH_WARN status "1 large omap objects", as shown in the "ceph -s" output:
  cluster:
    health: HEALTH_WARN
            1 large omap objects

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3
    mgr: mon2(active), standbys: mon1
    osd: 120 osds: 120 up, 120 in
    rgw: 1 daemon active

  data:
    pools:   6 pools, 6400 pgs
    objects: 13.10M objects, 99.3GiB
    usage:   447GiB used, 152GiB / 599GiB avail
    pgs:     6399 active+clean
             1    active+clean+scrubbing+deep

  io:
    client: 336KiB/s rd, 391op/s rd, 0op/s wr
The "ceph health detail" command includes the pool information "1 large objects found in pool 'default.rgw.meta'":
HEALTH_WARN 1 large omap objects
    1 large objects found in pool 'default.rgw.meta'
    Search the cluster log for 'Large omap object found' for more details.
Searching for this string in the "/var/log/ceph/ceph.log" file lists warnings such as the following:
2020-08-26 00:15:14.726736 osd.10 osd.10 10.1.1.8:6800/354953 9 : cluster [WRN] Large omap object found. Object: 3:caf94831:users.uid::USER.NAME.buckets:head PG: 3.8c129f53 (3.3) Key count: 510003 Size (bytes): 92776321
2020-08-26 01:29:08.129042 osd.10 osd.10 10.1.1.8:6800/354953 20 : cluster [WRN] Large omap object found. Object: 3:caf94831:users.uid::USER.NAME.buckets:head PG: 3.8c129f53 (3.3) Key count: 510003 Size (bytes): 92776321
Resolution
- Delete unused buckets.
- Create multiple users and spread the buckets evenly across all users (see the example commands below).
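As an illustration of spreading buckets across multiple users, a new RGW user can be created and an existing bucket re-linked to it with "radosgw-admin". The user ID "USER.NAME-2" and the bucket name "EXAMPLE-BUCKET" below are only placeholders:
# radosgw-admin user create --uid=USER.NAME-2 --display-name="Second bucket owner"
# radosgw-admin bucket link --bucket=EXAMPLE-BUCKET --uid=USER.NAME-2
Depending on the Ceph version, re-linking a bucket may not adjust object ownership or ACLs; verify the result with "radosgw-admin bucket stats --bucket=EXAMPLE-BUCKET".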
In Ceph version 12.2.13, the default value of the "osd_deep_scrub_large_omap_object_key_threshold" configuration parameter, which defines the omap key count at which Ceph starts warning about large omap objects, was lowered from 2000000 to 200000 (a tenfold reduction).
In the log messages shown above, the object holds 510003 keys:
2020-08-26 01:29:08.129042 osd.10 osd.10 192.168.1.58:6800/354953 20 : cluster [WRN] Large omap object found. Object: 3:caf94831:users.uid::USER.NAME.buckets:head PG: 3.8c129f53 (3.3) Key count: 510003 Size (bytes): 92776321
This is why the warning is not observed with versions prior to 12.2.13 (510003 < 2000000), but does appear in an upgraded environment (510003 > 200000).
For reference, see the Ceph Luminous release notes, which mention the change (search for "Lower the default value of osd_deep_scrub_large_omap_object_key_threshold"), and the corresponding PR (Pull Request).
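To check which threshold value is currently active on a running OSD, query the daemon's admin socket on the node hosting it (osd.10 below is only an example):
# ceph daemon osd.10 config get osd_deep_scrub_large_omap_object_key_threshold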
The setting can be adjusted in the "/etc/ceph/ceph.conf" file as described in the section "Adjusting ceph.conf with Custom Settings" of the SES online documentation. For example:
Create a "/srv/salt/ceph/configuration/files/ceph.conf.d/osd.conf" file adding:
osd_deep_scrub_large_omap_object_key_threshold = 2000000
A lower value can also be acceptable:
osd_deep_scrub_large_omap_object_key_threshold = 1000000
or
osd_deep_scrub_large_omap_object_key_threshold = 750000
Apply the configuration change with the following commands:
# salt 'Admin*' state.apply ceph.configuration.create
# salt '*' state.apply ceph.configuration
Where "Admin*" is the name of the Admin/Salt Master node for the cluster. Alternatively run DeepSea stages 2, 3 and 4.
To change the value for the running OSD daemons:
# ceph tell 'osd.*' injectargs --osd_deep_scrub_large_omap_object_key_threshold=2000000
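Note that "injectargs" only changes the value in the currently running daemons and does not persist across an OSD restart, which is why the ceph.conf change above is needed as well. The active value can be verified on an OSD node, for example:
# ceph daemon osd.0 config show | grep osd_deep_scrub_large_omap_object_key_threshold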
Cause
The log messages show that a single user owns roughly 500K buckets, which causes the large omap object. Sharding does not occur at the user level: bucket indexes are sharded, but the per-user list of buckets is not (and the default "max_buckets" limit is usually 1000).
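For reference, the per-user bucket limit ("max_buckets") of the affected user can be displayed with the following command (replace USER.NAME with an existing user name):
# radosgw-admin user info --uid=USER.NAME | grep max_buckets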
Status
Additional Information
One can find bucket stats using the following example commands:
# radosgw-admin bucket stats > bucket-stats.txt
The number of buckets:
# grep '"bucket":' bucket-stats.txt | wc -l
510003
The unique bucket owners:
# grep '"owner":' bucket-stats.txt | sort | uniq
"owner": "USER.NAME",
Number of buckets owned by "USER.NAME" (replace USER.NAME with an existing user name):
# grep '"owner": "USER.NAME"' bucket-stats.txt | wc -l
510003
Number of buckets containing data:
# grep '"size_actual":' bucket-stats.txt | wc -l
211159
Number of empty buckets:
# grep '"usage": {}' bucket-stats.txt | wc -l
298844
The single owner, called USER.NAME in the examples above, owns all buckets in the cluster. There are roughly 500K (510003) buckets owned by this user, of which only about 200K (211159) actually contain objects.
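If the "jq" tool is installed, similar counts can be derived directly from the JSON output, assuming "radosgw-admin bucket stats" returned a single JSON array of bucket entries as in the examples above:
# jq 'length' bucket-stats.txt
# jq -r '.[].owner' bucket-stats.txt | sort | uniq -c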
Additionally note that the "Large omap object found" warning (and a message logged to "/var/log/ceph/ceph.log") is generated when a PG (Placement Group) is deep-scrubbed and a large object is found.
If there are no messages in the current ceph.log file, search the ceph.log archive files, for example:
# xzegrep -i 'Large omap object found' /var/log/ceph/ceph*.xz
Otherwise ensure the following: if the ceph.conf file was not updated via "Adjusting ceph.conf with Custom Settings", it is best to make the modification in the "/srv/salt/ceph/configuration/files/ceph.conf.d/global.conf" file. Make sure that the osd.conf file does not contain the "osd_deep_scrub_large_omap_object_key_threshold" parameter, then run stage.2 and stage.3, or alternatively run the following commands:
Create the new configuration in the global.conf file on the Salt master node then run:
# salt 'SALT_MASTER_NODE.*' state.apply ceph.configuration.create
or
# salt-call state.apply ceph.configuration.create
Apply the new configuration to the targeted OSD minions:
# salt 'OSD_MINIONS' state.apply ceph.configuration
It is then necessary to change the value for the running OSD daemons by running the following command on the admin node:
# ceph tell 'osd.*' injectargs --osd_deep_scrub_large_omap_object_key_threshold=2000000
Since the "1 large objects found in pool 'default.rgw.meta'" are found during deep-scrubing, the following command can be run to instruct PGs in the 'default.rgw.meta' pool to be deep-scrubbed:
# for i in $(ceph pg ls-by-pool default.rgw.meta | cut -d " " -f 1 | grep "\."); do echo $i; ceph pg deep-scrub $i; done
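To check whether deep scrubs on the pool are still in progress, the PG states can be inspected, for example:
# ceph pg ls-by-pool default.rgw.meta | grep scrubbing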
After ALL PGs have been deep-scrubbed, repeat the grep command to locate more information about the large object:
# egrep -i 'Large omap object found' /var/log/ceph/ceph.log
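The key count of the reported object can also be verified directly with the "rados" tool, assuming the object name "USER.NAME.buckets" in the "users.uid" namespace from the log messages above:
# rados -p default.rgw.meta -N users.uid listomapkeys USER.NAME.buckets | wc -l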
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000019698
- Creation Date: 10-Sep-2020
- Modified Date: 11-Aug-2021
- SUSE Enterprise Storage
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com