How to migrate non-LVM OSD DB volume to another disk or partition
This document (000020276) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Enterprise Storage
Situation
The goal is to move an OSD's RocksDB data from its underlying BlueFS volume to another location, for example to gain more space, while continuing to use the OSD.
Resolution
1) Create a new partition of the desired size using the 'parted' tool. Create a GPT table with parted's 'mktable gpt' command first.
Please note the resulting partition name under /dev, e.g. /dev/vdg1. Please also note the partition UUID, e.g. by looking for the partition name under /dev/disk/by-partuuid:
$ ls -l /dev/disk/by-partuuid
lrwxrwxrwx 1 root root 10 Jun 4 16:45 4f02c107-73a2-42f4-9be6-e609ba2a9f45 -> ../../vdg1
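Putting this step together, a minimal parted sequence could look like the following. This is only a sketch: it assumes the new disk is /dev/vdg and that a 60 GiB DB partition is wanted; adjust the device name, partition name and size to your environment.
# Assumption: /dev/vdg is the new, empty disk; 'db' and 60GiB are example values only.
$ parted -s /dev/vdg mktable gpt
$ parted -s /dev/vdg mkpart db 1MiB 60GiB
$ ls -l /dev/disk/by-partuuid | grep vdg1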
2) Set the noout flag to avoid data rebalance when OSDs go down
For example, to set noout for a specific OSD (osd.12):
$ ceph osd set-group noout osd.12
To set noout for the whole OSD class named 'hdd':
$ ceph osd set-group noout hdd
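As an optional, hedged sanity check, the cluster health output normally reports an OSD_FLAGS warning while per-OSD or per-class noout flags are set, for example:
# Sketch only: the exact warning text may differ between Ceph releases.
$ ceph health detail | grep -i noout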
3) Stop the OSD in question
$ systemctl stop ceph-osd@12
4) Migrate bluefs data using ceph-bluestore-tool
$ ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-12 --devs-source /var/lib/ceph/osd/ceph-12/block --devs-source /var/lib/ceph/osd/ceph-12/block.db --command bluefs-bdev-migrate --dev-target /dev/vdg1
inferring bluefs devices from bluestore path
device removed:1 /var/lib/ceph/osd/ceph-12/block.db
device added: 1 /dev/vdg1
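To double-check the result before reconfiguring the OSD, the new partition can be inspected with ceph-bluestore-tool's show-label command; a minimal sketch, using the example device from above:
# Sketch: the new partition should now carry a bluestore/BlueFS label describing the DB device.
$ ceph-bluestore-tool show-label --dev /dev/vdg1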
5) Update OSD's config under /etc/ceph/osd/<osd_id>_<osd_fsid>.json
E.g. the original one could look like this:
$ cat /etc/ceph/osd/12-cd805c98-41ff-400a-b3af-c5783cc1ae4d.json
{
  "active": "ok",
  "block": {
    "path": "/dev/disk/by-partuuid/974e9dec-3121-4079-b2e9-3c929fc20e45",
    "uuid": "974e9dec-3121-4079-b2e9-3c929fc20e45"
  },
  "block.db": {
    "path": "/dev/disk/by-partuuid/17ffc366-ef7a-4577-9f1e-d541e18db8f3",
    "uuid": "17ffc366-ef7a-4577-9f1e-d541e18db8f3"
  },
  "block_uuid": "974e9dec-3121-4079-b2e9-3c929fc20e45",
  "bluefs": 1,
  "ceph_fsid": "4b9be45b-e09b-3c47-a00b-2b16d8151d75",
  "cluster_name": "ceph",
  "data": {
    "path": "/dev/vdb1",
    "uuid": "cd805c98-41ff-400a-b3af-c5783cc1ae4d"
  },
  "fsid": "cd805c98-41ff-400a-b3af-c5783cc1ae4d",
  "keyring": "AQAdPrdg0oQvExAAmvGRDoRzsVCH6QA0hIWHIQ==",
  "kv_backend": "rocksdb",
  "magic": "ceph osd volume v026",
  "mkfs_done": "yes",
  "ready": "ready",
  "require_osd_release": "",
  "systemd": "",
  "type": "bluestore",
  "whoami": 12
}
Replace the original partition UUID in the "block.db" section with the new one noted in step 1), i.e. replace the following lines
"block.db": {
"path": "/dev/disk/by-partuuid/17ffc366-ef7a-4577-9f1e-d541e18db8f3",
"uuid": "17ffc366-ef7a-4577-9f1e-d541e18db8f3"
},
with
"block.db": {
"path": "/dev/disk/by-partuuid/4f02c107-73a2-42f4-9be6-e609ba2a9f45",
"uuid": "4f02c107-73a2-42f4-9be6-e609ba2a9f45"
},
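The substitution can also be scripted; the following is just a sketch using the example file name and UUIDs from this document, so replace them with your own values:
# Sketch: swap the old block.db partition UUID for the new one in the OSD's JSON file.
$ OLD_UUID=17ffc366-ef7a-4577-9f1e-d541e18db8f3
$ NEW_UUID=4f02c107-73a2-42f4-9be6-e609ba2a9f45
$ sed -i "s/${OLD_UUID}/${NEW_UUID}/g" /etc/ceph/osd/12-cd805c98-41ff-400a-b3af-c5783cc1ae4d.json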
6) Activate OSD with a new partition
The OSD's ID and FSID are provided as parameters:
$ ceph-volume simple activate 12 cd805c98-41ff-400a-b3af-c5783cc1ae4d
Running command: /usr/bin/ln -snf /dev/vdb2 /var/lib/ceph/osd/ceph-12/block
Running command: /usr/bin/chown -R ceph:ceph /dev/vdb2
Running command: /usr/bin/ln -snf /dev/vdg1 /var/lib/ceph/osd/ceph-12/block.db
Running command: /usr/bin/chown -R ceph:ceph /dev/vdg1
Running command: /usr/bin/systemctl enable ceph-volume@simple-12-cd805c98-41ff-400a-b3af-c5783cc1ae4d
Running command: /usr/bin/ln -sf /dev/null /etc/systemd/system/ceph-disk@.service
--> All ceph-disk systemd units have been disabled to prevent OSDs getting triggered by UDEV events
Running command: /usr/bin/systemctl enable --runtime ceph-osd@12
Running command: /usr/bin/systemctl start ceph-osd@12
--> Successfully activated OSD 12 with FSID cd805c98-41ff-400a-b3af-c5783cc1ae4d
7) Restart OSD and make sure it's running
$ systemctl start ceph-osd@12
$ sleep 10
$ systemctl status ceph-osd@12
ceph-osd@12.service - Ceph object storage daemon osd.12
Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: disabled)
Active: active (running) since Fri 2021-06-04 19:18:33 CEST; 2min 40s ago
Process: 4606 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 12 (code=exited, status=0/SUCCESS)
Main PID: 4610 (ceph-osd)
Tasks: 61
CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@12.service
└─4610 /usr/bin/ceph-osd -f --cluster ceph --id 12 --setuser ceph --setgroup ceph
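In addition to the systemd status, the OSD's own metadata can be checked to confirm it now references the new DB partition; a hedged example for osd.12 (field names can vary between Ceph releases):
# Sketch: look for the bluefs DB device fields, which should point at /dev/vdg1 in this example.
$ ceph osd metadata 12 | grep -i bluefs_db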
Note: We recommend rebooting the relevant host for the very first migrated OSD right after this OSD has received its new DB volume. This confirms that the provided procedure works correctly in the customer's cluster. The next host reboot is recommended once every OSD on that host has been migrated. See step 8) below.
8) Reboot the host once every OSD on it has been migrated (optional)
Make sure all the OSDs are back online after the reboot and that they are using their new DB volumes. This step is somewhat redundant and is intended primarily to provide additional data safety.
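A simple, hedged way to confirm this after the reboot is to check overall cluster health and that the affected OSDs are reported up and in, for example:
# Sketch: verify cluster health and that the migrated OSDs (osd.12 in this example) are up and in.
$ ceph -s
$ ceph osd tree | grep -w 'osd.12'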
9) Clear noout flags once migration is complete
To clear noout for a specific OSD (osd.12):
$ ceph osd unset-group noout osd.12
To clear noout for the whole OSD class named 'hdd':
$ ceph osd unset-group noout hdd
Important Note: If any of the commands above reports something unexpected or suspicious during the migration process, stop immediately and consult SUSE support engineers. Please DO NOT try to restart or recover the OSD in question, and do not proceed with migrating other OSDs. Please collect the failing command and all prior commands together with their output. If the OSD fails to restart (i.e. you run into trouble at step 7), please also share the OSD log from /var/log/ceph/ceph-osd.OSD_ID.log.
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000020276
- Creation Date: 09-Jun-2021
- Modified Date: 11-Jun-2021
- SUSE Enterprise Storage
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com