How to assign existing replicated pools to a device class.
This document (000019699) is provided subject to the disclaimer at the end of this document.
Environment
Situation
Resolution
List current assignments:
# for i in $(ceph osd pool ls); do echo $i; ceph osd pool get $i crush_rule; done
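The output lists each pool name followed by its current rule. In the example below (pool names and assignments will differ per cluster), all pools are still on the default replicated_rule:
iscsi-images
crush_rule: replicated_rule
cephfs_data
crush_rule: replicated_rule
cephfs_metadata
crush_rule: replicated_rule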
Ceph Luminous introduced device classes. Common device classes are hdd, ssd and nvme, but it is also possible to create a custom device class if needed, for example "xyz".
Documentation links:
https://documentation.suse.com/ses/5.5/single-html/ses-admin/#crush-devclasses
or
https://documentation.suse.com/en-us/ses/6/single-html/ses-admin/#crush-devclasses
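To check which device classes are currently in use, list the classes and look at the CLASS column of the OSD tree:
# ceph osd crush class ls
# ceph osd tree
A custom class (the class "xyz" and osd.12 below are only examples) can be assigned after removing the automatically detected class from the OSD:
# ceph osd crush rm-device-class osd.12
# ceph osd crush set-device-class xyz osd.12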
Special crush rules can be created and assigned to pools so that the data in those pools is written only to a specific device class, such as hdd or ssd.
The default rule provided with ceph is the replicated_rule:
# rules
rule replicated_rule {
        id 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
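The existing rules can be inspected in JSON form, or by extracting and decompiling the CRUSH map (the file names below are only examples):
# ceph osd crush rule dump replicated_rule
# ceph osd getcrushmap -o crushmap.bin
# crushtool -d crushmap.bin -o crushmap.txt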
If the ceph cluster contains these types of storage devices, create the new crush rules with:
# ceph osd crush rule create-replicated replicated_hdd default host hdd
# ceph osd crush rule create-replicated replicated_ssd default host ssd
# ceph osd crush rule create-replicated replicated_nvme default host nvme
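Verify that the rules were created. In a cluster that contains all three device classes, the output would look like:
# ceph osd crush rule ls
replicated_rule
replicated_hdd
replicated_ssd
replicated_nvme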
The newly created rules look nearly the same as the default rule; the only difference is the device class restriction in the "step take" line. This is the hdd rule:
rule replicated_hdd {
        id 1
        type replicated
        min_size 1
        max_size 10
        step take default class hdd
        step chooseleaf firstn 0 type host
        step emit
}
If the cluster does not contain any devices of a given class, creating the rule for that class will fail.
After the rules are created, the existing pools can be assigned to the new rules:
# ceph osd pool set $POOL_NAME crush_rule replicated_hdd
or
# ceph osd pool set $POOL_NAME crush_rule replicated_ssd
or
# ceph osd pool set $POOL_NAME crush_rule replicated_nvme
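To confirm the new assignment of a single pool, query it directly; the example output assumes the pool was moved to replicated_ssd:
# ceph osd pool get $POOL_NAME crush_rule
crush_rule: replicated_ssd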
Pools that may be considered for device class "ssd" or "nvme" are any pools that need reduced latency. However, it may only be practical to assign metadata pools to the faster device class, such as the following default pools:
cephfs_metadata, .rgw.root, default.rgw.control, default.rgw.meta, default.rgw.log, default.rgw.buckets.index
Pools that should be considered for device class hdd:
iscsi-images, cephfs_data, default.rgw.buckets.data
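The assignments can also be scripted. The example below uses the default pool names listed above and assumes all of them exist; adjust the lists to match the cluster:
# for p in cephfs_metadata .rgw.root default.rgw.control default.rgw.meta default.rgw.log default.rgw.buckets.index; do ceph osd pool set $p crush_rule replicated_ssd; done
# for p in iscsi-images cephfs_data default.rgw.buckets.data; do ceph osd pool set $p crush_rule replicated_hdd; done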
The cluster will enter HEALTH_WARN while it moves the objects to their new locations on the ssd's (or other assigned device class) and will return to a healthy state once the data migration is complete.
Monitor with "ceph osd df tree", as osd's of device class "ssd" or "nvme" could fill up even though there is free space on osd's with device class "hdd". Any osd above 70% full is considered full and may not be able to handle the backfilling needed if there is a failure in the failure domain (the default is host). Customers will need to add more osd's of the desired device class to prevent osd's from filling up and the cluster from becoming read-only.
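Useful commands for following the data movement and the fill level of the osd's (the ratios shown are the Ceph defaults and may have been changed on the cluster):
# ceph -s
# ceph osd df tree
# ceph osd dump | grep ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85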
Cause
Status
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000019699
- Creation Date: 10-Sep-2020
- Modified Date: 10-Sep-2020
- SUSE Enterprise Storage
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com