Enable or re-enable Cephx authentication on a SUSE Enterprise Storage Cluster
This document (7018435) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Enterprise Storage 4
Situation
Resolution
1. Verify the current authentication information known to the cluster using "ceph auth list" and confirm the output shows the proper key information and capabilities (caps). Below are example excerpts of the default key entries for the metadata server(s) (mds), the Object Storage Daemons (osds) and the client.* entries that should normally be present:
mds.ses_server
key: AQBag6xXFRgTEhAA1LEgmlq52lMNfcOf7uL4vg==
caps: [mds] allow
caps: [mon] allow profile mds
caps: [osd] allow rwx...
osd.0
key: AQDVXSdXDJBKDRAA1Ij5Zyh+fR2TnaGZPg/CYQ==
caps: [mon] allow profile osd
caps: [osd] allow *
...
client.bootstrap-mds
key: AQAmUidXTe05EhAAE91HR//3LanzUFZypUkU8w==
caps: [mon] allow profile bootstrap-mds
client.bootstrap-osd
key: AQAlUidX0vBUNBAAI/39hYlFgz4Usj7sWYJlyA==
caps: [mon] allow profile bootstrap-osd
client.bootstrap-rgw
key: AQAmUidXUNCtBRAALMYWGmB499amzwh7WaIjLg==
caps: [mon] allow profile bootstrap-rgw
2. If required, update the capabilities (caps) with a command like the one below, replacing the key name and values as appropriate:
:~> ceph auth caps client.admin mon 'allow *' mds 'allow *' osd 'allow *'
NOTE: When updating caps it is important to always specify any already existing capabilities again, because the command overwrites the existing caps rather than updating them. For example, running the above command with only "mon 'allow *'" specified would remove the existing caps for mds and osd.
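Because "ceph auth caps" replaces the entire capability set, it can help to build the full command from one explicit list so that nothing is silently dropped. The bash helper below is a hypothetical sketch (neither "update_caps" nor the CEPH override exist in Ceph); every daemon whose caps must survive has to be listed again:

```shell
# Hypothetical sketch: run "ceph auth caps" with an explicit, complete list of
# capabilities. CEPH can be overridden (e.g. CEPH=echo) for a dry run.
CEPH="${CEPH:-ceph}"

update_caps() {
  # usage: update_caps <entity> <daemon>=<caps> [<daemon>=<caps> ...]
  local entity="$1"; shift
  local pair
  local -a args=()
  for pair in "$@"; do
    args+=("${pair%%=*}" "${pair#*=}")  # split "osd=allow *" into "osd" "allow *"
  done
  $CEPH auth caps "$entity" "${args[@]}"
}
```

For example, running `CEPH=echo update_caps client.admin "mon=allow *" "mds=allow *" "osd=allow *"` prints the arguments that would be passed to ceph without changing anything.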
3. Verify the changes with:
:~> ceph auth get client.admin
exported keyring for client.admin
[client.admin]
key = AQAu9VtYX4brIhAAUHnOG6vx6rujHWC7hQjZXQ==
caps mds = "allow *"
caps mon = "allow *"
caps osd = "allow *"
4. If the output shows the expected information, edit the "ceph.conf" file in the local directory from which ceph-deploy was originally executed on the admin node when the cluster was installed / configured, and uncomment / re-add the authentication entries:
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
5. If present, comment out / remove:
auth cluster required = none
auth service required = none
auth client required = none
auth supported = none
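After these edits, the authentication settings (typically kept in the [global] section of ceph.conf) should look similar to the following excerpt:

```ini
[global]
# cephx re-enabled; the "= none" entries have been removed
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
```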
NOTE: If there is a ceph.conf in the local directory from which the command in step 6 below is run, ceph-deploy will push this file to the other nodes.
6. Push the new change to all nodes (adjust node entries for the current environment):
:~> sudo ceph-deploy --overwrite-conf admin ceph_node-1 ceph_node-2 ceph_node-3 ...
7. Confirm the changes by verifying the "/etc/ceph/ceph.conf" files on some (or all) of the nodes, for example from the admin node:
:~> ssh ceph_node-1 cat /etc/ceph/ceph.conf && ssh ceph_node-2 cat /etc/ceph/ceph.conf ...
NOTE: The above assumes that the ssh key (for the user being used) was copied over to the cluster nodes; no authentication prompt should then appear and only the output of the commands should be displayed.
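When checking many nodes, the verification can also be scripted. The helper below is a hypothetical sketch (the function name is an example, not part of Ceph or SUSE tooling); it succeeds only if all three cephx settings are present in a given ceph.conf:

```shell
# Hypothetical helper: succeed only if the given ceph.conf enables cephx for
# all three auth settings. Both the "auth_cluster_required" and the
# "auth cluster required" spellings are accepted.
check_cephx_enabled() {
  conf="$1"
  for key in cluster service client; do
    grep -Eq "^[[:space:]]*auth[ _]${key}[ _]required[[:space:]]*=[[:space:]]*cephx" "$conf" || return 1
  done
}
```

It could then be combined with the ssh loop above, e.g. `ssh ceph_node-1 cat /etc/ceph/ceph.conf > /tmp/node.conf && check_cephx_enabled /tmp/node.conf && echo OK` (node name is an example).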
8. To prevent re-balancing while re-starting the ceph services execute from one of the nodes:
:~> sudo ceph osd set noout
9. Restart the ceph cluster one node at a time by restarting the ceph related services on each node with:
:~> sudo systemctl restart ceph.target
10. Verify the services started, for example the OSD services, with:
:~> sudo systemctl status --all 'ceph-osd@*' | grep 'Active:'
NOTE: Once all services are running proceed and restart the services for the next node.
NOTE: If the "--all" option is not used with the command from step 10 above, inactive services will not be listed.
11. If, for example, the OSD services did not start (or not all of them started) on a node, check the status of each OSD service that is not running (replace XX with the relevant OSD number):
:~> systemctl status ceph-osd@XX.service
...
systemd[1]: ceph-osd@XX.service: Failed with result 'start-limit'.
...
12. If the above "start-limit" reason is listed for the failed services, run the following for each failed OSD service:
:~> systemctl reset-failed ceph-osd@XX.service
:~> systemctl start ceph-osd@XX.service
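Steps 11 and 12 can also be scripted so that every ceph-osd unit that ended up in the failed state is reset and started again. The bash sketch below is illustrative only (the function and the SYSTEMCTL override are not part of Ceph or SUSE tooling); overriding SYSTEMCTL with a wrapper that only prints allows a dry run:

```shell
# Hypothetical sketch: reset and restart every failed ceph-osd unit.
# SYSTEMCTL defaults to the real systemctl but can be overridden for a dry run.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

recover_failed_osds() {
  for unit in $($SYSTEMCTL --failed --plain --no-legend list-units 'ceph-osd@*' | awk '{print $1}'); do
    $SYSTEMCTL reset-failed "$unit"   # clear the start-limit state
    $SYSTEMCTL start "$unit"          # try to start the OSD again
  done
}
```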
13. The service should now start. If it still fails, first restart only the MON services on all nodes; if after this step some OSD services on a specific node still fail, the only remaining option is likely to reboot the affected node.
14. Once the services on all the nodes have been restarted, unset noout:
:~> sudo ceph osd unset noout
15. Finally verify cluster health with:
:~> sudo ceph -s
Cause
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7018435
- Creation Date: 04-Jan-2017
- Modified Date: 03-Mar-2020
- SUSE Enterprise Storage
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com