RKE2 upgrades causing data loss in applications deployed using a Helm chart
This document (000020726) is provided subject to the disclaimer at the end of this document.
Environment
- On RKE2 versions below 1.21.12
Situation
Resolution
Recent releases of RKE2 allow customization of the Helm job behavior to reduce the probability of data loss when deploying stateful applications. Users may:
- Set failurePolicy: abort on the HelmChart spec to tell Helm to leave the release in a failed state if the upgrade does not succeed.
- Set the helmcharts.helm.cattle.io/unmanaged annotation on the HelmChart resource to prevent the Helm controller from acting on the chart at all, so that the HelmChart resource may be removed from the cluster without triggering uninstallation of the Helm Release.
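As a minimal sketch of the two options above, the following manifest fragments show where each setting goes. The chart name, repository URL, and namespaces are illustrative only (they follow the Longhorn example used in this article) and should be replaced with the values from your existing manifest. The annotation value "true" is an assumption: this article only states that the annotation must be present.

```yaml
# Option 1: on a failed upgrade, leave the release in a failed state
# instead of uninstalling and reinstalling it.
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: longhorn                      # illustrative name
  namespace: kube-system              # illustrative namespace
spec:
  chart: longhorn                     # illustrative chart
  repo: https://charts.longhorn.io    # illustrative repository
  targetNamespace: longhorn-system    # illustrative target namespace
  failurePolicy: abort
---
# Option 2 (alternative to option 1, not in addition to it): tell the Helm
# controller to ignore this HelmChart entirely, so the resource can later be
# removed without uninstalling the release.
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: longhorn
  namespace: kube-system
  annotations:
    helmcharts.helm.cattle.io/unmanaged: "true"   # value is an assumption; presence is what matters
spec:
  chart: longhorn
  repo: https://charts.longhorn.io
  targetNamespace: longhorn-system
```

Option 1 suits charts that RKE2 should keep managing. Option 2 suits applications that will be handed over to another tool, such as the Helm CLI.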
If you are currently experiencing data loss during upgrades, it may be necessary to perform a manual upgrade of the RKE2 cluster and coordinate the upgrade with changes to the HelmChart manifests to take advantage of the new features. The manual upgrade consists of the steps listed below.
NOTE: If you are not confident in following these steps, please open a ticket with the Rancher Support team to involve the engineering team for further assistance.
- Stop the rke2-server service on all server nodes.
- Upgrade the RKE2 binary or package to the latest patch release available for your current Kubernetes minor version.
- Update the affected manifests to add the new fields as necessary to obtain the desired behavior, on any nodes where the manifests are present. If no nodes contain the manifests, pick one node to deploy the manifests and place them on disk so that they are applied immediately during system startup. Details of the fields are explained below.
- Start the rke2-server service on all server nodes.
New Fields:
- helmcharts.helm.cattle.io/unmanaged annotation on the HelmChart Custom Resource: HelmChart resources with this annotation present will not be processed by the Helm controller. Add this annotation if you plan to remove the HelmChart resources and begin managing the application via another method.
- spec.failurePolicy on the HelmChart and HelmChartConfig Custom Resources: HelmCharts where the HelmChart or corresponding HelmChartConfig sets the failurePolicy field to abort will leave the Helm release in a failed state. The administrator is expected to manually assess the failure and restore the release to a functional state, using commonly available Helm CLI tools.
- spec.repoCA on the HelmChart Custom Resource: This new field allows for use of a private CA on the Helm repository. Use this when hosting charts on a server that does not have a public CA certificate, in order to avoid certificate errors when installing or upgrading the chart.
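The following fragments are a hedged sketch of the remaining two cases: setting failurePolicy through a HelmChartConfig, and supplying a private CA through repoCA. The resource names, the repository host charts.example.internal, the chart name, and the certificate body are all hypothetical placeholders. A HelmChartConfig is assumed to share the name and namespace of the HelmChart it configures.

```yaml
# failurePolicy set on a HelmChartConfig rather than on the HelmChart itself.
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-ingress-nginx     # illustrative; must match the HelmChart being configured
  namespace: kube-system
spec:
  failurePolicy: abort
---
# HelmChart pulled from a repository whose certificate is signed by a private CA.
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: my-app                               # hypothetical name
  namespace: kube-system
spec:
  chart: my-app                              # hypothetical chart
  repo: https://charts.example.internal      # hypothetical private repository
  targetNamespace: my-app                    # hypothetical target namespace
  repoCA: |
    -----BEGIN CERTIFICATE-----
    <PEM-encoded CA certificate data goes here>
    -----END CERTIFICATE-----
```

The certificate placeholder must be replaced with the actual PEM content of the CA that signed the repository's certificate.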
Cause
RKE2 upgrades packaged components using bundled HelmChart manifests. These resources trigger Jobs that wrap the Helm CLI tool. As all packaged components must be upgraded to ensure a functional system, if any Helm Releases are stuck in an invalid state (Failed, Pending, etc) at the time of the upgrade, those releases are uninstalled and reinstalled to reset the system to a known-good state.
If user-provided HelmChart manifests are used to deploy stateful applications where uninstallation of the Helm chart may cause data loss, this behavior may not be desired. For example, when Longhorn is deployed using a HelmChart manifest, an uninstall of the release will also delete all the Longhorn Custom Resources, potentially causing data loss. The actual volume content is not deleted, but Longhorn will lose the data mapping the content to Persistent Volumes.
Status
Additional Information
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000020726
- Creation Date: 16-Aug-2022
- Modified Date: 06-Sep-2022
- SUSE Rancher Longhorn
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com