Performing `ceph orch restart mgr` results in an endless restart loop
This document (000020530) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Enterprise Storage 7
Situation
After running the following command, the SES mgr daemons restart continually, in what appears to be an endless loop:

# ceph orch restart mgr

Running the following command shows output similar to the example below:

# ceph config-key dump

... "mgr/cephadm/host.node1": "{\"... \"scheduled_daemon_actions\": {\"mgr.node1.puuiwd\": \"restart\"}}" ...
(and so on for other mgr instances on other nodes)
Resolution
This was reported on the ceph-users mailing list, with the subject '"ceph orch restart mgr" creates manager daemon restart loop'.
Adam King's suggestion was to move the mgr instance to another host, then re-apply the configuration to the original host to get it redeployed, as sketched below.
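A sketch of that approach using `ceph orch apply`. The host names node1, node2 and node3 are placeholders; adjust the placement to match the existing mgr service specification:

# ceph orch apply mgr --placement="node2,node3"

Once the mgr daemon has been removed from node1 and a standby mgr is active on another host, re-apply the original placement so it is redeployed:

# ceph orch apply mgr --placement="node1,node2,node3"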
Another workaround, confirmed in testing, is to run `ceph orch daemon rm` on each mgr instance one after another; they will automatically be redeployed, but with different random IDs. The scheduled restart action never goes away, but it no longer matters because the daemon IDs have changed.
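For example, using the daemon name shown in the output above: list the current mgr daemons first, then remove them one at a time, waiting for cephadm to redeploy each one before removing the next, and never remove the last remaining mgr:

# ceph orch ps --daemon-type mgr
# ceph orch daemon rm mgr.node1.puuiwd --force

The --force flag may be required because the daemon is managed by the mgr service specification; cephadm will redeploy a replacement with a new random ID.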
This issue is resolved with SES 7.1 (Ceph Pacific 16.2.7-650). Upgrade the cluster to SES 7.1 to resolve it. See:
https://documentation.suse.com/ses/7.1/single-html/ses-deployment/#book-storage-deployment
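After the upgrade, the Ceph release running on the cluster can be verified; all daemons should report version 16.2.7 or later:

# ceph versions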
Cause
The restart request is stored persistently as a "scheduled_daemon_actions" entry in the cephadm configuration, as shown in the `ceph config-key dump` output above. The entry is not cleared after the mgr daemons have been restarted, so cephadm keeps re-applying the scheduled restart each time a mgr daemon comes back up, producing the restart loop.
Status
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000020530
- Creation Date: 23-Dec-2021
- Modified Date: 30-Mar-2022
- SUSE Enterprise Storage
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com