Failed ETCD snapshot restoration leads the cluster into stuck "paused" state
This document (000021399) is provided subject to the disclaimer at the end of this document.
Environment
Situation
During an etcd snapshot restore, the disaster recovery (DR) process can fail to complete and hang indefinitely, which leaves the cluster in what is called a "paused" state.
This symptom can be seen by checking the clusters.cluster.x-k8s.io object in the fleet-default namespace from the local (upstream) cluster.
kubectl get clusters.cluster.x-k8s.io <CLUSTER_NAME> -n fleet-default -o yaml
In the YAML output, you should see the .spec.paused field set to true.
Resolution
- Edit the clusters.cluster.x-k8s.io object in the fleet-default namespace from the local (upstream) cluster:
kubectl edit clusters.cluster.x-k8s.io <CLUSTER_NAME> -n fleet-default -o yaml
- Set the .spec.paused field to false.
- Save the file and exit.
The above steps instruct Rancher to unpause the cluster, which unblocks it so that the restore process can continue.
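The manual check and edit described above could also be done non-interactively. The following is a minimal sketch, not an official procedure: it assumes kubectl is pointed at the local (upstream) cluster and that a merge patch on .spec.paused has the same effect as the interactive edit. The function names get_paused and unpause_cluster are hypothetical helpers introduced only for this illustration.

```shell
#!/bin/sh
# Sketch only: non-interactive version of the manual check and edit.
# Assumes kubectl targets the local (upstream) Rancher cluster.

NAMESPACE="fleet-default"

# Print the current value of .spec.paused for the given cluster
# (empty output means the field is not set).
get_paused() {
  kubectl get clusters.cluster.x-k8s.io "$1" -n "$NAMESPACE" \
    -o jsonpath='{.spec.paused}'
}

# Set .spec.paused to false with a merge patch instead of an interactive edit.
unpause_cluster() {
  kubectl patch clusters.cluster.x-k8s.io "$1" -n "$NAMESPACE" \
    --type merge -p '{"spec":{"paused":false}}'
}
```

After sourcing the snippet, calling get_paused with the cluster name shows the current value, and calling unpause_cluster with the same name applies the change. Running get_paused again afterwards is a simple way to confirm that the field now reads false.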
The recommended approach is to perform the DR process again after the edit is made.
To do so, refer to the Rancher Manager backup and restore documentation and continue the DR process according to the distribution in use (RKE/RKE2/K3s).
Cause
An outage that made all control plane nodes completely unavailable.
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021399
- Creation Date: 12-Mar-2024
- Modified Date: 25-Jun-2024
- SUSE Rancher
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com