How to assign existing replicated pools to a device class.
This document (000019699) is provided subject to the disclaimer at the end of this document.
Environment
Situation
Resolution
List current assignments:
# for i in $(ceph osd pool ls); do echo $i; ceph osd pool get $i crush_rule; done
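The output lists each pool name followed by its current rule. In the example below (pool names and assignments will differ per cluster), all pools are still on the default replicated_rule:
iscsi-images
crush_rule: replicated_rule
cephfs_data
crush_rule: replicated_rule
cephfs_metadata
crush_rule: replicated_rule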
Ceph Luminous introduced device classes. Common device classes are hdd, ssd and nvme, but it is also possible to create a custom device class if needed, for example "xyz".
Documentation links:
https://documentation.suse.com/ses/5.5/single-html/ses-admin/#crush-devclasses
or
https://documentation.suse.com/en-us/ses/6/single-html/ses-admin/#crush-devclasses
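To check which device classes are currently in use, list the classes and look at the CLASS column of the OSD tree:
# ceph osd crush class ls
# ceph osd tree
A custom class (the class "xyz" and osd.12 below are only examples) can be assigned after removing the automatically detected class from the OSD:
# ceph osd crush rm-device-class osd.12
# ceph osd crush set-device-class xyz osd.12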
Special crush rules can be created and assigned to pools so that the data in those pools is written only to a specific device class, such as hdd or ssd.
The default rule provided with ceph is the replicated_rule:
# rules
rule replicated_rule {
        id 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
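The existing rules can be inspected in JSON form, or by extracting and decompiling the CRUSH map (the file names below are only examples):
# ceph osd crush rule dump replicated_rule
# ceph osd getcrushmap -o crushmap.bin
# crushtool -d crushmap.bin -o crushmap.txt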
If the ceph cluster contains these types of storage devices, create the new crush rules with:
# ceph osd crush rule create-replicated replicated_hdd default host hdd
# ceph osd crush rule create-replicated replicated_ssd default host ssd
# ceph osd crush rule create-replicated replicated_nvme default host nvme
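Verify that the rules were created. In a cluster that contains all three device classes, the output would look like:
# ceph osd crush rule ls
replicated_rule
replicated_hdd
replicated_ssd
replicated_nvme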
The newly created rules look nearly the same as the default rule; the only difference is the device class restriction in the "step take" line. This is the hdd rule:
rule replicated_hdd {
        id 1
        type replicated
        min_size 1
        max_size 10
        step take default class hdd
        step chooseleaf firstn 0 type host
        step emit
}
If the cluster does not contain any devices of a given class, creating the rule for that class will fail.
After the rules are created, the existing pools can be assigned to the new rules:
# ceph osd pool set $POOL_NAME crush_rule replicated_hdd
or
# ceph osd pool set $POOL_NAME crush_rule replicated_ssd
or
# ceph osd pool set $POOL_NAME crush_rule replicated_nvme
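To confirm the new assignment of a single pool, query it directly; the example output assumes the pool was moved to replicated_ssd:
# ceph osd pool get $POOL_NAME crush_rule
crush_rule: replicated_ssd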
Pools that may be considered for device class "ssd" or "nvme" are any pools that need reduced latency. However, it may only be practical to assign metadata pools to the faster device class, such as the following default pools:
cephfs_metadata, .rgw.root, default.rgw.control, default.rgw.meta, default.rgw.log, default.rgw.buckets.index
Pools that should be considered for device class hdd:
iscsi-images, cephfs_data, default.rgw.buckets.data
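The assignments can also be scripted. The example below uses the default pool names listed above and assumes all of them exist; adjust the lists to match the cluster:
# for p in cephfs_metadata .rgw.root default.rgw.control default.rgw.meta default.rgw.log default.rgw.buckets.index; do ceph osd pool set $p crush_rule replicated_ssd; done
# for p in iscsi-images cephfs_data default.rgw.buckets.data; do ceph osd pool set $p crush_rule replicated_hdd; done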
The cluster will enter HEALTH_WARN while it moves the objects to their new locations on the ssd's (or other assigned device class) and will return to a healthy state once the data migration is complete.
Monitor with "ceph osd df tree", as osd's of device class "ssd" or "nvme" could fill up even though there is free space on osd's with device class "hdd". Any osd above 70% full is considered full and may not be able to handle the backfilling needed if there is a failure in the failure domain (the default is host). Customers will need to add more osd's of the desired device class to prevent osd's from filling up and the cluster from becoming read-only.
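Useful commands for following the data movement and the fill level of the osd's (the ratios shown are the Ceph defaults and may have been changed on the cluster):
# ceph -s
# ceph osd df tree
# ceph osd dump | grep ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85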
Cause
Status
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000019699
- Creation Date: 10-Sep-2020
- Modified Date: 10-Sep-2020
- SUSE Enterprise Storage
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com