How to perform a rollback of Kubernetes version in an RKE2 upstream/standalone cluster
This document (000021618) is provided subject to the disclaimer at the end of this document.
Environment
RKE2 upstream (Rancher local) or standalone cluster.
Situation
After a Kubernetes upgrade on the cluster, you need, for whatever reason, to restore an etcd snapshot, which involves rolling back to the previous Kubernetes version.
You need the following:
- A snapshot of the cluster at a working time.
- The RKE2 version that was running at that time.
Resolution
The rollback process for a local/standalone cluster includes all the steps that Rancher normally handles for Rancher-provisioned clusters, as explained below.
Drain the pods and stop the rke2 service on all the nodes of the cluster.
This can be done using the "rke2-killall.sh" script, included in all RKE2 nodes during first setup.
Manually roll back the RKE2 binary to the previous version.
This can be done using the RKE2 installation script, providing it with the RKE2 version you want to roll back to.
E.g.: curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION="v1.24.6+rke2r1" sh -
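After reinstalling, the binary version can be checked on each node before proceeding:
# Confirm the node now runs the rolled-back RKE2 version
rke2 --version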
Once the cluster is back on the previous Kubernetes version, i.e. the version of the snapshot you will apply, it is possible to manually run an etcd snapshot restore operation on the first server node:
rke2 server \
--cluster-reset \
--cluster-reset-restore-path=<PATH-TO-SNAPSHOT>
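The snapshot path to pass to --cluster-reset-restore-path can be identified beforehand; by default, RKE2 stores snapshots under /var/lib/rancher/rke2/server/db/snapshots/ on the server nodes:
# List the available etcd snapshots and their timestamps
ls -lh /var/lib/rancher/rke2/server/db/snapshots/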
Once the restore process is completed, start the rke2-server service on the first server node as follows:
systemctl start rke2-server
To re-join the other master nodes, you will also need to clean up the old etcd database on each of them, which involves the following steps (a sketch is shown after this list):
- Take a backup of the "config.yaml" file (by default under /etc/rancher/rke2/) so it can be restored easily afterwards.
- Run the "rke2-uninstall.sh" script.
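A minimal sketch of this cleanup on one master node, assuming the default config location and an example backup path of /root/config.yaml.bak:
# Preserve the node configuration before uninstalling (example backup path)
cp /etc/rancher/rke2/config.yaml /root/config.yaml.bak
# Remove RKE2 from the node, including the stale etcd database
/usr/local/bin/rke2-uninstall.sh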
Then add these master nodes back as in a fresh installation.
To do so, start the rke2-server service on the other server nodes with the following command:
systemctl start rke2-server
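Note that, since "rke2-uninstall.sh" also removes the RKE2 binary, each rejoining node needs the rolled-back version reinstalled and its "config.yaml" restored before the service is started. A minimal sketch for one node, reusing the example version from above and the assumed backup path /root/config.yaml.bak:
# Reinstall the rolled-back RKE2 version (example version from above)
curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION="v1.24.6+rke2r1" sh -
# Restore the configuration saved before the uninstall (assumed backup path)
cp /root/config.yaml.bak /etc/rancher/rke2/config.yaml
# Start the server service to re-join the node to the cluster
systemctl start rke2-server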
For the agent nodes (those with the worker role), you will need to manually enable and start the rke2-agent service on each of them.
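For example, on each agent node:
# Enable and start the agent service so the node re-joins as a worker
systemctl enable rke2-agent
systemctl start rke2-agent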
Cause
Unlike RKE, RKE2 is currently unable to automatically return to a previous working state once the Kubernetes version has been upgraded.
Consequently, an etcd snapshot restore operation alone cannot roll back the Kubernetes version: if the snapshot in use was taken on an earlier Kubernetes version than the one currently running, the restore will not complete successfully, leaving the cluster in a down state.
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021618
- Creation Date: 13-Nov-2024
- Modified Date: 06-Feb-2025
- SUSE Rancher
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com