How to perform a rollback of Kubernetes version in an RKE2 upstream/standalone cluster
This document (000021618) is provided subject to the disclaimer at the end of this document.
Environment
RKE2 upstream (Rancher local) or standalone cluster.
Situation
After a Kubernetes upgrade on the cluster, you need, for whatever reason, to restore an etcd snapshot, which involves rolling back to the previous Kubernetes version.
You need the following:
- A snapshot of the cluster at a working time.
- The RKE2 version that was running at that time.
Resolution
The rollback process for a local/standalone cluster includes all the steps that Rancher normally handles for Rancher-provisioned clusters, as explained below.
Drain the pods and stop the rke2 service on all the nodes of the cluster.
This can be done using the "rke2-killall.sh" script, included in all RKE2 nodes during first setup.
Manually roll back the RKE2 binary to the previous version.
This can be done using the RKE2 installation script, providing it with the RKE2 version you want to roll back to.
E.g.: curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION="v1.24.6+rke2r1" sh -
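After reinstalling, the binary version can be checked on each node before proceeding:
# Confirm the node now runs the rolled-back RKE2 version
rke2 --version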
Once the cluster is back on the previous Kubernetes version, i.e. the version of the snapshot you will apply, it is possible to manually run an etcd snapshot restore operation on the first server node:
rke2 server \
--cluster-reset \
--cluster-reset-restore-path=<PATH-TO-SNAPSHOT>
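The snapshot path to pass to --cluster-reset-restore-path can be identified beforehand; by default, RKE2 stores snapshots under /var/lib/rancher/rke2/server/db/snapshots/ on the server nodes:
# List the available etcd snapshots and their timestamps
ls -lh /var/lib/rancher/rke2/server/db/snapshots/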
Once the restore process is completed, start the rke2-server service on the first server node as follows:
systemctl start rke2-server
To re-join the other master nodes, you will also need to clean up the old etcd database on each of them, which involves the following steps (a sketch is shown after this list):
- Take a backup of the "config.yaml" file (by default under /etc/rancher/rke2/) so it can be restored easily afterwards.
- Run the "rke2-uninstall.sh" script.
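A minimal sketch of this cleanup on one master node, assuming the default config location and an example backup path of /root/config.yaml.bak:
# Preserve the node configuration before uninstalling (example backup path)
cp /etc/rancher/rke2/config.yaml /root/config.yaml.bak
# Remove RKE2 from the node, including the stale etcd database
/usr/local/bin/rke2-uninstall.sh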
Then add these master nodes back as in a fresh installation.
To do so, start the rke2-server service on the other server nodes with the following command:
systemctl start rke2-server
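Note that, since "rke2-uninstall.sh" also removes the RKE2 binary, each rejoining node needs the rolled-back version reinstalled and its "config.yaml" restored before the service is started. A minimal sketch for one node, reusing the example version from above and the assumed backup path /root/config.yaml.bak:
# Reinstall the rolled-back RKE2 version (example version from above)
curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION="v1.24.6+rke2r1" sh -
# Restore the configuration saved before the uninstall (assumed backup path)
cp /root/config.yaml.bak /etc/rancher/rke2/config.yaml
# Start the server service to re-join the node to the cluster
systemctl start rke2-server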
For the agent nodes (those with the worker role), you will need to manually enable and start the rke2-agent service on each of them.
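For example, on each agent node:
# Enable and start the agent service so the node re-joins as a worker
systemctl enable rke2-agent
systemctl start rke2-agent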
Cause
Unlike RKE, RKE2 is currently unable to automatically return to a previous working state once the Kubernetes version has been upgraded.
Consequently, an etcd snapshot restore operation alone cannot roll back the Kubernetes version: if the snapshot in use was taken on an earlier Kubernetes version than the one currently running, the restore will not complete successfully, leaving the cluster in a down state.
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021618
- Creation Date: 13-Nov-2024
- Modified Date: 06-Feb-2025
- SUSE Rancher
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com