RKE2 Snapshots failing due to large ConfigMap
This document (000021272) is provided subject to the disclaimer at the end of this document.
Environment
RKE2 versions earlier than v1.25.15, v1.26.10, v1.27.7, and v1.28.3
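As a rough aid for checking whether a node's version falls in the affected range, the sketch below compares a version string against the thresholds listed in this section. The function names `fixed_patch` and `is_affected` are hypothetical helpers, and the version string is assumed to look like the `VERSION` column of `kubectl get nodes` (for example `v1.27.5+rke2r1`). Minor lines not listed in this document are reported as unknown rather than guessed.

```shell
# First fixed patch release for each minor line, taken from this document.
fixed_patch() {
  case "$1" in
    1.25) echo 15 ;;
    1.26) echo 10 ;;
    1.27) echo 7 ;;
    1.28) echo 3 ;;
    *)    echo "" ;;
  esac
}

# Usage: is_affected v1.27.5+rke2r1
# Prints "affected", "fixed", or "unknown" (minor line not covered here).
is_affected() {
  v="${1#v}"        # drop the leading "v"
  v="${v%%+*}"      # drop a "+rke2rN" style suffix, if present
  minor="${v%.*}"   # e.g. 1.27
  patch="${v##*.}"  # e.g. 5
  fixed="$(fixed_patch "$minor")"
  if [ -z "$fixed" ]; then
    echo "unknown"
  elif [ "$patch" -lt "$fixed" ]; then
    echo "affected"
  else
    echo "fixed"
  fi
}
```

For example, `is_affected v1.27.5+rke2r1` prints `affected`, while `is_affected v1.28.3+rke2r1` prints `fixed`.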
Situation
Snapshots may begin failing to complete. The rke2-server.service logs
show an error similar to:
level=error msg="failed to save local snapshot data to configmap: ConfigMap \"rke2-etcd-snapshots\" is invalid: []: Too long: must have at most 1048576 bytes"
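One way to confirm the symptom is to search the service journal for the message and to estimate how large the ConfigMap has grown. The following is a minimal sketch, not an official diagnostic: the function names are hypothetical, and the size measured from the JSON output is only an approximation of the size the API server validates.

```shell
# Limit quoted in the error message (bytes).
LIMIT_BYTES=1048576

# Print matching error lines from the rke2-server journal, if any.
# Run on a server node.
find_snapshot_errors() {
  journalctl -u rke2-server.service --no-pager \
    | grep 'failed to save local snapshot data to configmap'
}

# Compare a size in bytes against the limit.
# Usage: check_size <bytes>
check_size() {
  if [ "$1" -gt "$LIMIT_BYTES" ]; then
    echo "over limit"
  else
    echo "within limit"
  fi
}

# Example (requires kubectl access to the affected cluster):
#   find_snapshot_errors
#   check_size "$(kubectl get configmap rke2-etcd-snapshots -n kube-system -o json | wc -c)"
```

A result close to the limit suggests the ConfigMap is the likely cause even if the error has not yet appeared in the logs.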
Resolution
This issue is fixed in v1.28.3, and the fix has been backported to v1.25.15, v1.26.10, and v1.27.7.
If an upgrade is not possible, the following steps can be taken to manually clean the config map:
- Save copies of local etcd snapshots to another folder as a precaution.
- Reduce the etcd snapshots retention on the downstream cluster configuration and disable S3 backups temporarily.
- Edit the 'rke2-etcd-snapshots' ConfigMap in the 'kube-system' namespace on the downstream cluster and remove the values beneath the data field:
kubectl edit ConfigMap -n kube-system rke2-etcd-snapshots
- After saving the edits above, Fleet should trigger all of the snapshots it missed.
- Temporarily change the snapshot schedule to every 5 minutes so that the retention settings are applied and old snapshots are cleaned up. The cleanup takes place once the next 5-minute interval has elapsed.
- Clean the on-demand snapshots since they are not automatically cleaned by the retention settings. To do this, delete them on each node's local filesystem. After a few minutes, Rancher will reconcile the changes, and the old on-demand snapshots will be removed from the UI.
- Re-enable S3 snapshots and verify that new snapshots are being saved there.
- Update the cluster configuration to re-enable the original cron schedule and retention settings.
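Parts of the manual cleanup above can be scripted. The sketch below covers the precautionary backup, a non-interactive alternative to `kubectl edit`, and a listing of on-demand snapshot files for review before deletion. It rests on several assumptions that should be checked against your environment: the snapshot directory is the RKE2 default (`/var/lib/rancher/rke2/server/db/snapshots`), on-demand snapshot files use the `on-demand-` name prefix, and removing the whole `data` field with a JSON patch is acceptable in place of deleting its values by hand. The function names are hypothetical.

```shell
# Assumed default RKE2 snapshot directory; override if a custom directory is configured.
SNAPSHOT_DIR="${SNAPSHOT_DIR:-/var/lib/rancher/rke2/server/db/snapshots}"

# Copy every file in the snapshot directory to a backup folder, preserving attributes.
# Usage: backup_snapshots <source-dir> <backup-dir>
backup_snapshots() {
  mkdir -p "$2" && cp -a "$1"/. "$2"/
}

# Remove the data field from the ConfigMap without opening an editor.
# Assumption: this is equivalent to deleting the values beneath "data" manually.
clear_snapshot_configmap() {
  kubectl patch configmap rke2-etcd-snapshots -n kube-system \
    --type=json -p='[{"op":"remove","path":"/data"}]'
}

# List on-demand snapshot files so they can be reviewed before deleting them.
# Assumption: on-demand snapshots are named with the "on-demand-" prefix.
# Usage: list_on_demand_snapshots <snapshot-dir>
list_on_demand_snapshots() {
  find "$1" -maxdepth 1 -type f -name 'on-demand-*'
}

# Example (run on each server node; the backup path is illustrative):
#   backup_snapshots "$SNAPSHOT_DIR" /root/etcd-snapshot-backup
#   clear_snapshot_configmap
#   list_on_demand_snapshots "$SNAPSHOT_DIR"
```

The listing function deliberately does not delete anything; remove the files it reports only after confirming that the backup copy is complete.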
Cause
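The rke2-etcd-snapshots ConfigMap grows beyond the 1048576-byte limit reported in the error message, so snapshot data can no longer be saved to it. The upstream issue is tracked at:
https://github.com/rancher/rke2/issues/4495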
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021272
- Creation Date: 15-Nov-2023
- Modified Date: 18-Apr-2024
- SUSE Rancher
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com