How to Recover Longhorn Volume Data from a Single Replica in RKE2

This document (000021786) is provided subject to the disclaimer at the end of this document.

Environment

Longhorn running on RKE2 nodes

Situation

When a Longhorn volume is faulted or the Longhorn control plane is experiencing issues, it might seem like the data stored on the volume is lost or inaccessible. In reality, the underlying replica data often still exists on disk and can be recovered manually. This is especially important in cases where the main priority is restoring service quickly. If the application doesn't require the original volume to be brought back online, accessing the data directly can be a faster path to recovery.

This article explains how to recover data from a Longhorn volume replica using RKE2’s static pod mechanism. This method works even when Longhorn itself is unavailable, as long as the Kubernetes cluster is still functioning. By mounting the replica data into a pod, it's possible to access the files and copy them out. This can help bring services back online by restoring data into a new volume or alternative storage solution, giving teams a reliable recovery option when Longhorn is not responding.

Resolution

Step 1: Locate the Replica Data on Disk

First, identify where Longhorn stores its replica data. Run this command to find the replica storage path:

find / -name longhorn-disk.cfg

You might see:

/var/lib/longhorn/longhorn-disk.cfg

Then list the replicas:

ls /var/lib/longhorn/replicas/

The directory names follow this pattern:

pvc-<volume-name>-<8charUUID>

Example:

pvc-27c076f8-5710-416f-9729-83194cad4aac-7fb2c32d

Placeholder: /var/lib/longhorn/replicas/pvc-<your-volume-name>-<uuid>

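The find command searches the entire filesystem for the Longhorn disk configuration file, longhorn-disk.cfg. The directory that contains this file is the Longhorn disk root, and the replicas are stored in its replicas/ subdirectory.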
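
The replica directory name also carries the volume name that the later steps need. As a minimal sketch, assuming the naming shown in the example above (volume name followed by a hyphen and a short suffix), the volume name can be derived by stripping the last hyphen-separated component. The helper name replica_volume_name and the REPLICA_ROOT variable are hypothetical and only used for illustration:

```shell
#!/bin/sh
# Hypothetical helper: derive the Longhorn volume name from a replica
# directory name by removing the trailing "-<suffix>" component.
replica_volume_name() {
    dir_name=$(basename "$1")
    printf '%s\n' "${dir_name%-*}"
}

# Assumed default location; adjust to the directory that holds longhorn-disk.cfg.
REPLICA_ROOT=${REPLICA_ROOT:-/var/lib/longhorn/replicas}

# List each replica directory next to the volume name it belongs to.
# The guard makes this a no-op on hosts without Longhorn replicas.
if [ -d "$REPLICA_ROOT" ]; then
    for dir in "$REPLICA_ROOT"/*/; do
        [ -d "$dir" ] || continue
        printf '%s -> %s\n' "$dir" "$(replica_volume_name "$dir")"
    done
fi
```

For the example directory above, the helper prints pvc-27c076f8-5710-416f-9729-83194cad4aac.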

Step 2: Determine the Volume Size from Metadata

To correctly mount the volume, you need its exact size. Examine the volume metadata file:

cat /var/lib/longhorn/replicas/pvc-<volume-name>-<uuid>/volume.meta

Look for the Size field:

{"Size":10737418240, ...}

Placeholder: Size: <volume-size-in-bytes>

Example:

cat /var/lib/longhorn/replicas/pvc-27c076f8-5710-416f-9729-83194cad4aac-7fb2c32d/volume.meta

Yields:

{"Size":10737418240, "Head":"volume-head-000.img", ...}

The Size field contains the volume's size in bytes, which you'll need in the next step. The JSON output also includes other useful metadata about the volume structure.
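
To avoid copying the number by hand, the Size value can be extracted with standard tools. The following is a minimal sketch, assuming volume.meta is a single JSON object with a numeric Size field as shown above. The helper name volume_size_bytes is hypothetical, and grep is used instead of a JSON parser so that the sketch also works on minimal hosts:

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "Size" field from a volume.meta file.
volume_size_bytes() {
    grep -oE '"Size"[[:space:]]*:[[:space:]]*[0-9]+' "$1" | head -n 1 | grep -oE '[0-9]+$'
}

# Example usage with the path from the example above.
# The guard makes this a no-op on hosts where the file does not exist.
META=/var/lib/longhorn/replicas/pvc-27c076f8-5710-416f-9729-83194cad4aac-7fb2c32d/volume.meta
if [ -f "$META" ]; then
    size=$(volume_size_bytes "$META")
    echo "Size in bytes: $size"
    echo "Size in GiB:   $((size / 1024 / 1024 / 1024))"
fi
```

For the example value, 10737418240 bytes corresponds to 10 GiB. The engine argument in the next step still expects the value in bytes.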

Step 3: Create a Static Pod Manifest to Launch the Longhorn Engine

Now create a static pod definition that RKE2 deploys automatically. This pod runs the Longhorn engine and exposes your volume as a block device. Save the manifest in the RKE2 static pod directory on the node that holds the replica:

/var/lib/rancher/rke2/agent/pod-manifests/longhorn-recovery.yaml

Template:

apiVersion: v1
kind: Pod
metadata:
  name: longhorn-launch
spec:
  hostPID: true
  containers:
  - name: engine
    image: longhornio/longhorn-engine:v<version>
    securityContext:
      privileged: true
    command: ["launch-simple-longhorn"]
    args: ["<volume-name>", "<volume-size-in-bytes>"]
    volumeMounts:
    - name: dev
      mountPath: /host/dev
    - name: proc
      mountPath: /host/proc
    - name: data
      mountPath: /volume
  volumes:
  - name: dev
    hostPath:
      path: /dev
  - name: proc
    hostPath:
      path: /proc
  - name: data
    hostPath:
      path: <host-path-to-replica>
  restartPolicy: Never

Example:

apiVersion: v1
kind: Pod
metadata:
  name: longhorn-launch
spec:
  hostPID: true
  containers:
  - name: engine
    image: longhornio/longhorn-engine:v1.8.0
    securityContext:
      privileged: true
    command: ["launch-simple-longhorn"]
    args: ["pvc-27c076f8-5710-416f-9729-83194cad4aac", "10737418240"]
    volumeMounts:
    - name: dev
      mountPath: /host/dev
    - name: proc
      mountPath: /host/proc
    - name: data
      mountPath: /volume
  volumes:
  - name: dev
    hostPath:
      path: /dev
  - name: proc
    hostPath:
      path: /proc
  - name: data
    hostPath:
      path: /var/lib/longhorn/replicas/pvc-27c076f8-5710-416f-9729-83194cad4aac-7fb2c32d
  restartPolicy: Never

This manifest creates a privileged pod that mounts your replica data and exposes it as a standard block device. Be sure to replace the placeholders with your actual values, including:

The correct Longhorn engine version
Your volume name (the replica directory name without the trailing 8-character suffix, for example pvc-27c076f8-5710-416f-9729-83194cad4aac)
The exact volume size in bytes (from volume.meta)
The full path to your replica directory
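
Filling in four placeholders by hand is easy to get wrong, so the manifest can also be generated from the values gathered in the previous steps. This is only a sketch: the function name render_recovery_manifest is hypothetical, and the YAML it prints is the template above with the placeholders substituted.

```shell
#!/bin/sh
# Hypothetical helper: print a recovery static pod manifest to stdout.
# Arguments: <engine-version> <volume-name> <size-in-bytes> <replica-dir>
render_recovery_manifest() {
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: longhorn-launch
spec:
  hostPID: true
  containers:
  - name: engine
    image: longhornio/longhorn-engine:$1
    securityContext:
      privileged: true
    command: ["launch-simple-longhorn"]
    args: ["$2", "$3"]
    volumeMounts:
    - name: dev
      mountPath: /host/dev
    - name: proc
      mountPath: /host/proc
    - name: data
      mountPath: /volume
  volumes:
  - name: dev
    hostPath:
      path: /dev
  - name: proc
    hostPath:
      path: /proc
  - name: data
    hostPath:
      path: $4
  restartPolicy: Never
EOF
}

# Example usage (values from the example above), redirecting into the
# RKE2 static pod directory:
#   render_recovery_manifest v1.8.0 \
#       pvc-27c076f8-5710-416f-9729-83194cad4aac 10737418240 \
#       /var/lib/longhorn/replicas/pvc-27c076f8-5710-416f-9729-83194cad4aac-7fb2c32d \
#       > /var/lib/rancher/rke2/agent/pod-manifests/longhorn-recovery.yaml
```

Reviewing the printed manifest before redirecting it into the static pod directory is a sensible precaution, because RKE2 starts the pod as soon as the file appears there.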

Step 4: Monitor the Recovery Process Through Pod Logs

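To verify that the recovery process is working, check the pod logs. First, point crictl at the RKE2 container runtime configuration:

export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml

Find the pod and note its pod ID:

/var/lib/rancher/rke2/bin/crictl pods | grep longhorn-launch

List the containers in that pod to get the container ID (crictl logs expects a container ID, not a pod ID):

/var/lib/rancher/rke2/bin/crictl ps -a --pod <pod-id>

Then tail the logs:

/var/lib/rancher/rke2/bin/crictl logs -f <container-id>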

Once the pod is running, you should see log messages indicating that the Longhorn engine has started and the volume is available.
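
Because crictl logs needs a container ID rather than a pod ID, the lookup can be wrapped in a small helper. The sketch below assumes the crictl location and configuration path used in this article. The function name engine_container_id and the CRICTL variable are hypothetical:

```shell
#!/bin/sh
# Path to crictl as used in this article; overridable for other layouts.
CRICTL=${CRICTL:-/var/lib/rancher/rke2/bin/crictl}
export CRI_CONFIG_FILE=${CRI_CONFIG_FILE:-/var/lib/rancher/rke2/agent/etc/crictl.yaml}

# Hypothetical helper: print the ID of the first container in the first pod
# whose name matches the given pattern. Returns 1 if no pod matches.
engine_container_id() {
    pod_id=$("$CRICTL" pods --name "$1" -q | head -n 1)
    [ -n "$pod_id" ] || return 1
    "$CRICTL" ps -a --pod "$pod_id" -q | head -n 1
}

# Example usage; the guard makes this a no-op where crictl is not installed.
if [ -x "$CRICTL" ]; then
    cid=$(engine_container_id longhorn-launch) && "$CRICTL" logs "$cid"
fi
```

The -a flag is included so that the container is still found if the engine has already exited, which is when its logs are most useful.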

Step 5: Mount and Access the Recovered Volume Data

After the Longhorn engine initializes successfully, a new block device will appear on your system:

/dev/longhorn/<volume-name>
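
The device can take a moment to appear after the pod starts. As a minimal sketch, a polling loop can wait for it before any mount is attempted. The function name wait_for_path is hypothetical, and the timeout in the usage comment is an arbitrary choice:

```shell
#!/bin/sh
# Hypothetical helper: wait until a path exists, checking once per second.
# Arguments: <path> <timeout-in-seconds>
# Returns 0 once the path exists, 1 if the timeout expires first.
wait_for_path() {
    remaining=$2
    while [ ! -e "$1" ]; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Example usage:
#   wait_for_path /dev/longhorn/<volume-name> 60 && echo "device is ready"
```

If the wait times out, the pod logs from Step 4 are the first place to look.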

Mount this device in read-only mode to prevent any accidental data corruption:

mkdir -p /mnt/longhorn
mount -o ro /dev/longhorn/pvc-27c076f8-5710-416f-9729-83194cad4aac /mnt/longhorn

At this point, all your volume data is accessible under /mnt/longhorn. You can use standard file operations to copy data to a safe location:

# Example: Create a backup archive
tar -czf /tmp/volume-backup.tar.gz -C /mnt/longhorn .

# Or copy specific files
cp -rp /mnt/longhorn/important-data /tmp/backup/

# Or use rsync for large data sets
rsync -av /mnt/longhorn/ /tmp/backup/
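
Before cleaning up, it can be worth confirming that the copy matches the source. A minimal sketch, assuming the data was copied to a directory as in the cp or rsync examples above. The function name verify_copy is hypothetical, and diff compares file contents only, not ownership or permissions:

```shell
#!/bin/sh
# Hypothetical helper: compare two directory trees recursively.
# Prints the files that differ or are missing; returns 0 only if the
# contents are identical.
verify_copy() {
    diff -rq "$1" "$2"
}

# Example usage:
#   verify_copy /mnt/longhorn /tmp/backup && echo "copy verified"
```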

You can also create a volume in Longhorn and mount it in maintenance mode to this same node to copy the data directly to the new volume.

Step 6: Clean Up After Recovery

Once you've recovered your data, clean up the resources. Unmount the filesystem first, while the block device still exists:

umount /mnt/longhorn

Then remove the static pod manifest:

rm /var/lib/rancher/rke2/agent/pod-manifests/longhorn-recovery.yaml

After the manifest file is removed, RKE2 automatically stops the static pod, and the block device disappears.
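
The order of the cleanup matters: the filesystem should be unmounted before the pod that provides the block device is stopped. The sketch below enforces that order. The function name cleanup_recovery is hypothetical, and the mount check reads /proc/mounts, which assumes a Linux host:

```shell
#!/bin/sh
# Hypothetical helper: unmount the recovery mount point if it is mounted,
# then remove the static pod manifest so that RKE2 stops the pod.
# Arguments: <mount-point> <manifest-path>
cleanup_recovery() {
    if grep -qsF " $1 " /proc/mounts; then
        # Stop here if the unmount fails, so the block device is not
        # removed from under a mounted filesystem.
        umount "$1" || return 1
    fi
    rm -f "$2"
}

# Example usage:
#   cleanup_recovery /mnt/longhorn /var/lib/rancher/rke2/agent/pod-manifests/longhorn-recovery.yaml
```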

Best Practice: Always mount Longhorn recovery volumes as read-only (-o ro) to prevent accidental data corruption. Any writes to an isolated replica could cause data inconsistencies if you later restore the Longhorn system.

Cause

This issue occurs when a Longhorn volume cannot be attached to a pod or node using the usual methods. This often happens because the volume is in a faulted state or because Longhorn's control components are unavailable. When this happens, users are unable to access critical data through the Kubernetes PVC system or the Longhorn UI, even though the actual replica data may still be present on disk.

Several conditions can lead to this scenario. A volume might become faulted due to failed rebuilds or persistent disk errors. The Longhorn manager or engine components may be unresponsive, making it impossible to manage or recover the volume through the UI. Nodes holding key replicas may fail or go offline, and when they return, Longhorn may be unable to reestablish quorum or restore the volume's state. In some cases, even a successful reboot or maintenance task can leave the volume unattachable if the cluster cannot reconcile its metadata.

When the data is urgently needed to restore service, waiting for Longhorn to recover is not always an option. In such cases, manually recovering the replica data and using it to bring systems back online becomes the fastest and most reliable way to restore functionality.

Additional Information

Longhorn recovery documentation: https://longhorn.io/docs/1.8.1/advanced-resources/data-recovery/

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.