Downstream clusters in unavailable state after upgrade from Rancher v2.5 at v2.5.16 or above to Rancher v2.6 below v2.6.7
This document (000020910) is provided subject to the disclaimer at the end of this document.
Environment
Situation
Rancher Pod logs contain error messages of the following format:
2022/12/17 08:47:18 [ERROR] error syncing 'c-ayhjd': handler cluster-deploy: cluster context c-ayhjd is unavaiblable, requeuing
2022/12/17 08:47:25 [ERROR] error syncing '_all_': handler user-controllers-controller: failed to start user controllers for cluster c-ayhjd: ClusterUnavailable 503: cluster not found
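To find which downstream clusters are affected, the Rancher Pod logs can be filtered for the two error messages above. The following is a minimal sketch, assuming a bash shell with grep and sort; the function name find_unavailable_clusters is hypothetical. The pattern accepts any word starting with "unava", so it does not depend on the exact spelling shown in the log excerpt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Rancher Pod log lines on stdin and print each
# affected downstream cluster ID once.
find_unavailable_clusters() {
  # Keep only the two error fragments shown in the Situation section,
  # then extract the cluster ID (c-...) from each and de-duplicate.
  grep -oE 'cluster context c-[a-z0-9-]+ is unava[a-z]+|user controllers for cluster c-[a-z0-9-]+: ClusterUnavailable 503' \
    | grep -oE 'c-[a-z0-9-]+' \
    | sort -u
}
```

Usage, assuming Rancher runs in the cattle-system namespace with the pod label app=rancher (the defaults for a Helm install): `kubectl -n cattle-system logs -l app=rancher --tail=-1 | find_unavailable_clusters`. The `--tail=-1` flag is used because kubectl only returns the last 10 lines per pod by default when a label selector is given.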
Resolution
First, with a kubeconfig for the Rancher local cluster sourced, restore the status.serviceAccountToken field of each downstream cluster's cluster.management.cattle.io resource from the credential stored in the secret owned by that cluster in the cattle-global-data namespace:
for cluster in $(kubectl get clusters.management.cattle.io --field-selector metadata.name!=local -o custom-columns=NAME:.metadata.name --no-headers); do echo $cluster; kubectl patch -v=9 cluster.management.cattle.io $cluster --type=merge -p "{\"status\":{\"serviceAccountToken\":\"`kubectl -n cattle-global-data get secret -o jsonpath=\"{.items[?(@.metadata.ownerReferences[0].name==\\"$cluster\\")].data.credential}\"|base64 -d`\"}}"; done
Next, for each downstream cluster, edit the cluster.management.cattle.io resource in the Rancher local cluster to set the status of the ServiceAccountMigrated condition from True to Unknown. This ensures that, on upgrade to Rancher v2.6.7+, the serviceAccountToken field is again removed and migrated to a secret. With a kubeconfig for the Rancher local cluster sourced, get the cluster IDs for all downstream clusters:
kubectl get clusters.management.cattle.io --field-selector metadata.name!=local -o custom-columns=NAME:.metadata.name --no-headers
For each cluster ID listed, one at a time, run `kubectl edit cluster.management.cattle.io <cluster-id>`, locate the condition of type ServiceAccountMigrated in the status.conditions array, and change its status from "True" to "Unknown", as in the following example:
[...]
  - lastUpdateTime: "2023-01-04T12:11:57Z"
    status: "True"
    type: Updated
  - lastUpdateTime: "2023-01-04T12:11:51Z"
    status: "Unknown"
    type: ServiceAccountMigrated
  - lastUpdateTime: "2023-01-04T12:11:57Z"
    status: "True"
    type: GlobalAdminsSynced
  - lastUpdateTime: "2023-01-04T12:17:40Z"
[...]
Finally, take a copy of the service account token secrets and then delete them, as they are no longer used and fresh secrets are created on upgrade to Rancher v2.6.7+.
With a kubeconfig for the Rancher local cluster sourced, first take a copy of the service account token secret manifests with the following bash one-liner:
for secret in `kubectl -n cattle-global-data get secrets -o name | grep "cluster-serviceaccounttoken-"`; do kubectl -n cattle-global-data get $secret -o yaml >> cluster-serviceaccounttoken-secrets.yaml; echo "---" >> cluster-serviceaccounttoken-secrets.yaml; done
Then, with the Rancher local cluster kubeconfig still sourced, delete the secrets:
for secret in `kubectl -n cattle-global-data get secrets -o name | grep "cluster-serviceaccounttoken-"`; do kubectl -n cattle-global-data delete $secret; done
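The first two steps of this Resolution can also be run non-interactively. The following is a minimal sketch, not part of the original procedure: it assumes bash, base64 from coreutils and a kubeconfig for the Rancher local cluster, and all function names are hypothetical. Like the original one-liner, it patches status through the main cluster.management.cattle.io resource, but it omits the `-v=9` flag (which logs full request bodies, including the token). It also swaps the interactive `kubectl edit` for a JSON patch whose `test` operation makes the patch fail if the computed index does not point at the ServiceAccountMigrated condition.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching the token restore and condition reset steps.

# Print the JSON merge patch that sets status.serviceAccountToken.
# $1 is the base64-encoded "credential" value from the cluster's secret.
build_sa_token_patch() {
  local token
  token=$(printf '%s' "$1" | base64 -d) || return 1
  printf '{"status":{"serviceAccountToken":"%s"}}\n' "$token"
}

# Print the zero-based position of condition type $1 among the remaining
# arguments (the condition types in array order); fail if it is absent.
condition_index() {
  local want=$1 i=0 t
  shift
  for t in "$@"; do
    if [ "$t" = "$want" ]; then
      echo "$i"
      return 0
    fi
    i=$((i + 1))
  done
  return 1
}

# Print a JSON patch setting conditions[$1].status to "Unknown", guarded by a
# "test" op so the whole patch is rejected if that index is another condition.
build_condition_patch() {
  printf '[{"op":"test","path":"/status/conditions/%s/type","value":"ServiceAccountMigrated"},{"op":"replace","path":"/status/conditions/%s/status","value":"Unknown"}]\n' "$1" "$1"
}

# Apply both steps to every downstream cluster (requires kubectl access).
restore_all_clusters() {
  local cluster cred types idx
  for cluster in $(kubectl get clusters.management.cattle.io \
      --field-selector metadata.name!=local \
      -o custom-columns=NAME:.metadata.name --no-headers); do
    echo "$cluster"
    cred=$(kubectl -n cattle-global-data get secret \
      -o jsonpath="{.items[?(@.metadata.ownerReferences[0].name==\"$cluster\")].data.credential}")
    if [ -z "$cred" ]; then
      echo "  no credential secret found, skipping" >&2
      continue
    fi
    kubectl patch cluster.management.cattle.io "$cluster" --type=merge \
      -p "$(build_sa_token_patch "$cred")"
    types=$(kubectl get cluster.management.cattle.io "$cluster" \
      -o jsonpath='{.status.conditions[*].type}')
    # $types is unquoted on purpose: it splits into one argument per type.
    if ! idx=$(condition_index ServiceAccountMigrated $types); then
      echo "  no ServiceAccountMigrated condition, skipping" >&2
      continue
    fi
    kubectl patch cluster.management.cattle.io "$cluster" --type=json \
      -p "$(build_condition_patch "$idx")"
  done
}
```

Usage: source the file in a shell with the local cluster kubeconfig and run `restore_all_clusters`; `kubectl get cluster.management.cattle.io <cluster-id> -o yaml` should then show the ServiceAccountMigrated condition with status "Unknown". Backing up and deleting the service account token secrets is still done with the one-liners above.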
Cause
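Rancher v2.5.16 and v2.6.7 contain a fix that removes the service account token from the status.serviceAccountToken field of the cluster.management.cattle.io resource and migrates it to a secret. Rancher v2.6 releases below v2.6.7 do not contain this fix and still rely on the status.serviceAccountToken field. As a result, where a Rancher environment is upgraded from Rancher v2.5.16 or above (containing the fix) to a Rancher v2.6 release below v2.6.7 (which does not contain the fix), the status.serviceAccountToken field is missing from the cluster.management.cattle.io resource and Rancher is unable to connect to existing downstream clusters.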
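To check which downstream clusters are in this state, the status.serviceAccountToken field can be listed for every cluster.management.cattle.io resource. A minimal sketch, assuming bash, awk and kubectl access to the Rancher local cluster; the function names are hypothetical. An empty field only indicates this problem on Rancher v2.6 below v2.6.7: on releases containing the fix the field is expected to be empty, because the token has been migrated to a secret. The raw kubectl output contains the tokens, so it is piped straight into the filter, which prints only cluster names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "<name><TAB><serviceAccountToken>" lines on stdin
# and print the names of downstream clusters whose token field is empty.
clusters_missing_token() {
  awk -F'\t' '$1 != "local" && $2 == "" { print $1 }'
}

# Feed the helper from the Rancher local cluster (requires kubectl access).
check_clusters() {
  kubectl get clusters.management.cattle.io \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.serviceAccountToken}{"\n"}{end}' \
    | clusters_missing_token
}
```

Running `check_clusters` before applying the Resolution should list the affected cluster IDs; after the first Resolution step it should print nothing.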
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000020910
- Creation Date: 04-Jan-2023
- Modified Date: 05-Jan-2023
- SUSE Rancher
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com