How to Recover Longhorn Volume Data from a Single Replica in RKE2
This document (000021786) is provided subject to the disclaimer at the end of this document.
Environment
Longhorn running on RKE2 nodes
Situation
When a Longhorn volume is faulted or the Longhorn control plane is experiencing issues, it might seem like the data stored on the volume is lost or inaccessible. In reality, the underlying replica data often still exists on disk and can be recovered manually. This is especially important in cases where the main priority is restoring service quickly. If the application doesn't require the original volume to be brought back online, accessing the data directly can be a faster path to recovery.
This article explains how to recover data from a Longhorn volume replica using RKE2’s static pod mechanism. This method works even when Longhorn itself is unavailable, as long as the Kubernetes cluster is still functioning. By mounting the replica data into a pod, it's possible to access the files and copy them out. This can help bring services back online by restoring data into a new volume or alternative storage solution, giving teams a reliable recovery option when Longhorn is not responding.
Resolution
Step 1: Locate the Replica Data on Disk
First, identify where Longhorn stores its replica data. The following command searches the entire filesystem for the Longhorn disk configuration file, which marks the directory where replicas are stored:
find / -name longhorn-disk.cfg
You might see:
/var/lib/longhorn/longhorn-disk.cfg
Then list the replicas in that directory:
ls /var/lib/longhorn/replicas/
Each replica directory is named after its volume, followed by an 8-character suffix:
<volume-name>-<8charUUID>
Example:
pvc-27c076f8-5710-416f-9729-83194cad4aac-7fb2c32d
In this example the volume name is pvc-27c076f8-5710-416f-9729-83194cad4aac. The full path to the replica directory, used as <host-path-to-replica> in Step 3, is:
/var/lib/longhorn/replicas/<volume-name>-<8charUUID>
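The volume name needed in later steps can be derived from the replica directory name by dropping the trailing 8-character suffix. The following is a minimal sketch, not part of Longhorn: the function name replica_volume_name is hypothetical, and it assumes the suffix consists of exactly eight lowercase hexadecimal characters, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper: print the volume name for a replica directory.
# Assumption: the directory name ends in "-" plus 8 lowercase hex characters.
replica_volume_name() {
  # basename drops the parent path (and any trailing slash);
  # sed removes the final "-xxxxxxxx" suffix if it is present.
  basename "$1" | sed -E 's/-[0-9a-f]{8}$//'
}
```

For example, replica_volume_name /var/lib/longhorn/replicas/pvc-27c076f8-5710-416f-9729-83194cad4aac-7fb2c32d prints pvc-27c076f8-5710-416f-9729-83194cad4aac.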
Step 2: Determine the Volume Size from Metadata
To launch the volume correctly, you need its exact size in bytes. Examine the volume metadata file in the replica directory:
cat /var/lib/longhorn/replicas/<volume-name>-<uuid>/volume.meta
Look for the Size field:
{"Size":10737418240, ...}
This value is the <volume-size-in-bytes> placeholder used in the manifest template in Step 3.
Example:
cat /var/lib/longhorn/replicas/pvc-27c076f8-5710-416f-9729-83194cad4aac-7fb2c32d/volume.meta
Yields:
{"Size":10737418240, "Head":"volume-head-000.img", ...}
The Size field contains the volume's size in bytes, which you'll need in the next step. The JSON output also includes other useful metadata about the volume structure.
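Copying an 11-digit number by hand is error-prone, so the value can also be extracted with standard tools. The sketch below is illustrative only: the function names volume_size_bytes and bytes_to_gib are hypothetical, and it assumes volume.meta is a single JSON object whose Size value is an unquoted integer, as in the example output above. It uses sed rather than jq because jq may not be installed on the node.

```shell
#!/bin/sh
# Hypothetical helper: print the "Size" value (bytes) from a volume.meta file.
# The leading double quote in the pattern keeps it from matching other keys
# that merely end in "Size".
volume_size_bytes() {
  sed -n 's/.*"Size":[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Hypothetical helper: convert bytes to whole GiB for a quick sanity check.
bytes_to_gib() {
  echo $(( $1 / 1073741824 ))
}
```

With the example metadata, volume_size_bytes prints 10737418240 and bytes_to_gib 10737418240 prints 10, which should match the size requested in the original PVC.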
Step 3: Create a Static Pod Manifest to Launch the Longhorn Engine
Next, create a static pod manifest. RKE2 automatically starts any pod defined in its pod-manifests directory, so no API call is needed. The pod runs the Longhorn engine and exposes your volume as a block device. On the node that holds the replica, save the manifest as:
/var/lib/rancher/rke2/agent/pod-manifests/longhorn-recovery.yaml
Template:
apiVersion: v1
kind: Pod
metadata:
  name: longhorn-launch
spec:
  hostPID: true
  containers:
  - name: engine
    image: longhornio/longhorn-engine:v<version>
    securityContext:
      privileged: true
    command: ["launch-simple-longhorn"]
    args: ["<volume-name>", "<volume-size-in-bytes>"]
    volumeMounts:
    - name: dev
      mountPath: /host/dev
    - name: proc
      mountPath: /host/proc
    - name: data
      mountPath: /volume
  volumes:
  - name: dev
    hostPath:
      path: /dev
  - name: proc
    hostPath:
      path: /proc
  - name: data
    hostPath:
      path: <host-path-to-replica>
  restartPolicy: Never
Example:
apiVersion: v1
kind: Pod
metadata:
  name: longhorn-launch
spec:
  hostPID: true
  containers:
  - name: engine
    image: longhornio/longhorn-engine:v1.8.0
    securityContext:
      privileged: true
    command: ["launch-simple-longhorn"]
    args: ["pvc-27c076f8-5710-416f-9729-83194cad4aac", "10737418240"]
    volumeMounts:
    - name: dev
      mountPath: /host/dev
    - name: proc
      mountPath: /host/proc
    - name: data
      mountPath: /volume
  volumes:
  - name: dev
    hostPath:
      path: /dev
  - name: proc
    hostPath:
      path: /proc
  - name: data
    hostPath:
      path: /var/lib/longhorn/replicas/pvc-27c076f8-5710-416f-9729-83194cad4aac-7fb2c32d
  restartPolicy: Never
This manifest creates a privileged pod that mounts your replica data and exposes it as a standard block device. Be sure to replace the placeholders with your actual values, including:
- The correct Longhorn engine version
- Your volume name (from the replica path)
- The exact volume size in bytes (from volume.meta)
- The full path to your replica directory
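Filling in the four placeholders by hand invites typos and indentation mistakes. One way to avoid that is to render the manifest from variables. The following is a sketch under stated assumptions: the function name render_recovery_manifest is hypothetical, the YAML mirrors the template above, and the caller is expected to supply the image tag that matches the Longhorn version in use.

```shell
#!/bin/sh
# Hypothetical helper: print a recovery static pod manifest to stdout.
# Arguments: <volume-name> <size-in-bytes> <replica-dir> <engine-image-tag>
render_recovery_manifest() {
  if [ "$#" -ne 4 ]; then
    echo "usage: render_recovery_manifest VOLUME SIZE REPLICA_DIR TAG" >&2
    return 1
  fi
  vol=$1
  size=$2
  replica=$3
  tag=$4
  # The size must be a plain integer number of bytes.
  case $size in
    ''|*[!0-9]*)
      echo "size must be an integer number of bytes" >&2
      return 1
      ;;
  esac
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: longhorn-launch
spec:
  hostPID: true
  containers:
  - name: engine
    image: longhornio/longhorn-engine:${tag}
    securityContext:
      privileged: true
    command: ["launch-simple-longhorn"]
    args: ["${vol}", "${size}"]
    volumeMounts:
    - name: dev
      mountPath: /host/dev
    - name: proc
      mountPath: /host/proc
    - name: data
      mountPath: /volume
  volumes:
  - name: dev
    hostPath:
      path: /dev
  - name: proc
    hostPath:
      path: /proc
  - name: data
    hostPath:
      path: ${replica}
  restartPolicy: Never
EOF
}
```

A typical invocation redirects the output into the pod-manifests directory, for example: render_recovery_manifest pvc-27c076f8-5710-416f-9729-83194cad4aac 10737418240 /var/lib/longhorn/replicas/pvc-27c076f8-5710-416f-9729-83194cad4aac-7fb2c32d v1.8.0 > /var/lib/rancher/rke2/agent/pod-manifests/longhorn-recovery.yaml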
Step 4: Monitor the Recovery Process Through Pod Logs
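To verify that the recovery pod started correctly, check its logs with crictl. First, point crictl at the RKE2 configuration:
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
Find the pod and note its pod ID:
/var/lib/rancher/rke2/bin/crictl pods | grep longhorn-launch
List the containers in that pod to get the container ID:
/var/lib/rancher/rke2/bin/crictl ps -a --pod <pod-id>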
Then tail the logs:
/var/lib/rancher/rke2/bin/crictl logs <container-id>
Once the pod is running, you should see log messages indicating that the Longhorn engine has started and the volume is available.
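When the recovery is scripted, polling for the block device is often easier than reading logs. The sketch below waits until a path exists or a timeout expires. The function name wait_for_path and the 60-second default are assumptions made for illustration; the path to wait for is the /dev/longhorn/<volume-name> device described in Step 5.

```shell
#!/bin/sh
# Hypothetical helper: wait until a path exists.
# Arguments: <path> [timeout-in-seconds, default 60]
# Returns 0 once the path exists, 1 if the timeout expires first.
wait_for_path() {
  path=$1
  timeout=${2:-60}
  elapsed=0
  while [ ! -e "$path" ]; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}
```

For example, wait_for_path /dev/longhorn/pvc-27c076f8-5710-416f-9729-83194cad4aac 120 returns successfully as soon as the device node appears. If it times out, the pod logs are the place to look for the reason.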
Step 5: Mount and Access the Recovered Volume Data
After the Longhorn engine initializes successfully, a new block device will appear on your system:
/dev/longhorn/<volume-name>
Mount this device in read-only mode to prevent any accidental data corruption:
mkdir -p /mnt/longhorn
mount -o ro /dev/longhorn/pvc-27c076f8-5710-416f-9729-83194cad4aac /mnt/longhorn
At this point, all your volume data is accessible under /mnt/longhorn. You can use standard file operations to copy data to a safe location:
# Create the destination directory first; choose a location with enough free space
mkdir -p /tmp/backup
# Example: Create a backup archive
tar -czf /tmp/volume-backup.tar.gz -C /mnt/longhorn .
# Or copy specific files
cp -rp /mnt/longhorn/important-data /tmp/backup/
# Or use rsync for large data sets
rsync -av /mnt/longhorn/ /tmp/backup/
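Before the recovery pod is removed, it can be worth confirming that the copy is complete. The following sketch compares SHA-256 checksums of every regular file in two directory trees. The function name verify_copy is hypothetical, and the sketch assumes GNU findutils and coreutils (sort -z, xargs -0 -r, sha256sum), which are present on most Linux distributions. It checks file contents only, not ownership or permissions.

```shell
#!/bin/sh
# Hypothetical helper: compare file contents of two directory trees.
# Arguments: <source-dir> <destination-dir>
# Returns 0 if both trees hold the same files with the same checksums,
# 1 if they differ, 2 if either tree could not be read.
verify_copy() {
  # Relative paths are used so that the two listings are comparable.
  src_sums=$(cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum) || return 2
  dst_sums=$(cd "$2" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum) || return 2
  [ "$src_sums" = "$dst_sums" ]
}
```

For example, verify_copy /mnt/longhorn /tmp/backup && echo "copy verified". For very large volumes the checksum pass reads all data twice, so it may take a while.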
You can also create a volume in Longhorn and mount it in maintenance mode to this same node to copy the data directly to the new volume.
Step 6: Clean Up After Recovery
Once you've recovered your data, unmount the filesystem first, while the block device still exists:
umount /mnt/longhorn
Then remove the static pod manifest:
rm /var/lib/rancher/rke2/agent/pod-manifests/longhorn-recovery.yaml
After the manifest file is removed, RKE2 automatically stops the static pod, and the block device disappears.
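Because the filesystem should be unmounted before the static pod goes away, the two cleanup actions can be wrapped so that they always run in that order. This is a minimal sketch with hypothetical function names (is_mounted, cleanup_recovery). It assumes the mount table is readable at /proc/mounts and that the mount point path contains no spaces; the optional last argument exists only so that the logic can be exercised against a sample file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if <dir> is listed as a mount point.
# Arguments: <dir> [mounts-file, default /proc/mounts]
is_mounted() {
  awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Hypothetical helper: unmount first, then remove the static pod manifest.
# Arguments: <mount-point> <manifest-path> [mounts-file]
cleanup_recovery() {
  mnt=$1
  manifest=$2
  mounts=${3:-/proc/mounts}
  if is_mounted "$mnt" "$mounts"; then
    # Stop here if the unmount fails, so the device is not pulled
    # out from under a filesystem that is still mounted.
    umount "$mnt" || return 1
  fi
  rm -f "$manifest"
}
```

For example: cleanup_recovery /mnt/longhorn /var/lib/rancher/rke2/agent/pod-manifests/longhorn-recovery.yaml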
Best Practice: Always mount Longhorn recovery volumes as read-only (-o ro) to prevent accidental data corruption. Any writes to an isolated replica could cause data inconsistencies if you later restore the Longhorn system.
Cause
This issue occurs when a Longhorn volume cannot be attached to a pod or node using the usual methods. This often happens because the volume is in a faulted state or because Longhorn's control components are unavailable. When this happens, users are unable to access critical data through the Kubernetes PVC system or the Longhorn UI, even though the actual replica data may still be present on disk.
Several conditions can lead to this scenario. A volume might become faulted due to failed rebuilds or persistent disk errors. The Longhorn manager or engine components may be unresponsive, making it impossible to manage or recover the volume through the UI. Nodes holding key replicas may fail or go offline, and when they return, Longhorn may be unable to reestablish quorum or restore the volume's state. In some cases, even a successful reboot or maintenance task can leave the volume unattachable if the cluster cannot reconcile its metadata.
When the data is urgently needed to restore service, waiting for Longhorn to recover is not always an option. In such cases, manually recovering the replica data and using it to bring systems back online becomes the fastest and most reliable way to restore functionality.
Additional Information
Longhorn recovery documentation: https://longhorn.io/docs/1.8.1/advanced-resources/data-recovery/
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021786
- Creation Date: 11-Apr-2025
- Modified Date: 17-Apr-2025
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com