Fixing Longhorn Volumes That Refuse to Attach
This document (000021788) is provided subject to the disclaimer at the end of this document.
Environment
Longhorn v1.2.0+
Situation
Longhorn volumes in Kubernetes clusters can sometimes fail to attach to pods, causing persistent issues where pods enter restart loops or remain in a pending state. These issues often occur in environments with node disruptions, incorrect scheduling, or replica faults. This article outlines common attachment issues and provides troubleshooting steps and resolutions to restore volume functionality.
Resolution
Scenario 1: Volumes Detach Unexpectedly and Won’t Reattach
- Restart the Deployment or StatefulSet to recreate the pods (see the example after this list).
- Ensure your Longhorn version is 1.2.0 or later, which includes automatic recreation of pods after unexpected detachments.
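For example, assuming a Deployment named web-app in the default namespace (both names are placeholders), the pods can be recreated with a rolling restart:

# Recreate all pods managed by the Deployment
kubectl -n default rollout restart deployment/web-app

# The equivalent for a StatefulSet
kubectl -n default rollout restart statefulset/web-app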
Scenario 2: Volumes Can't Attach Even After Pod Recreation
- Identify the affected PVC and its bound PersistentVolume (PV).
- Scale down the pods and ensure the volume is detached.
- Use kubectl -n longhorn-system edit volumes.longhorn.io <volume-name> to clear these fields:
  - spec.nodeID
  - status.currentNodeID
  - status.ownerID
  - status.pendingNodeID
- Reapply the changes and scale the pods back up (a sample command sequence follows this list).
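A minimal command sequence for this scenario, assuming a Deployment named my-app and a PVC named my-data-pvc in the my-namespace namespace (all placeholder names); for dynamically provisioned volumes, the Longhorn volume name normally matches the bound PV name:

# Find the PV bound to the affected PVC
kubectl -n my-namespace get pvc my-data-pvc -o jsonpath='{.spec.volumeName}'

# Scale the workload down so the volume can detach
kubectl -n my-namespace scale deployment/my-app --replicas=0

# Confirm the Longhorn volume reports a detached state before editing it
kubectl -n longhorn-system get volumes.longhorn.io <volume-name>

# Clear spec.nodeID, status.currentNodeID, status.ownerID and status.pendingNodeID, then save
kubectl -n longhorn-system edit volumes.longhorn.io <volume-name>

# Scale the workload back up so the volume attaches again
kubectl -n my-namespace scale deployment/my-app --replicas=1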
Scenario 3: Volumes Can’t Attach Due to Prior Attachments (RWO Limitation)
- Use spec.nodeName in the pod template to pin all pods accessing the volume to the same node (a sketch follows this list).
- Alternatively, use Pod Affinity to co-locate the pods.
- Consider migrating to RWX (ReadWriteMany) volumes via the Longhorn Share Manager (NFS-backed).
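A minimal sketch of the node-pinning approach, assuming a Deployment whose pods mount the volume, a node named worker-1, and a PVC named my-data-pvc (all hypothetical names):

# Pod template excerpt: every replica lands on the same node,
# so the RWO volume only ever needs to attach there
spec:
  template:
    spec:
      nodeName: worker-1            # hypothetical node name
      containers:
        - name: app
          image: registry.example.com/app:latest   # placeholder image
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: my-data-pvc  # placeholder PVC name

Pod affinity achieves the same co-location without hard-coding a node name and is usually preferable when the node itself may change.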
Scenario 4: Faulted Replicas Blocking Attachments
- Try the Salvage option from the Longhorn UI.
- If that fails and the fault is acceptable (a command sketch follows this list):
  - Use kubectl -n longhorn-system edit replicas.longhorn.io <replica-name>
  - Clear the spec.failedAt field.
  - Force-reattach the volume (only if data integrity is not critical).
- For production workloads, consider using multiple replicas for fault tolerance.
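The affected replicas can be located and edited with kubectl; the volume and replica names below are placeholders, and replica names contain the name of the volume they belong to:

# List the replicas belonging to the affected volume
kubectl -n longhorn-system get replicas.longhorn.io | grep <volume-name>

# Clear the spec.failedAt field (set it to an empty string), then save
kubectl -n longhorn-system edit replicas.longhorn.io <replica-name>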
Cause
Longhorn volumes can become unresponsive or fail to attach due to inconsistencies in the control plane state or underlying replica issues. These problems typically arise when the system is interrupted or doesn't clean up volume metadata properly after node transitions, pod restarts, or other cluster events.
One common cause is that Longhorn retains outdated information about a volume's attachment status. For example, fields like nodeID, ownerID, or currentNodeID may still be set, causing Longhorn to incorrectly assume the volume is already in use even when it is not. This stale state can block new attachment attempts and leave pods stuck in a Pending state or a crash loop.
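Whether such stale state is present can be verified directly on the Longhorn volume resource; the volume name is a placeholder:

# Show the attachment-related fields of the Longhorn volume
kubectl -n longhorn-system get volumes.longhorn.io <volume-name> -o yaml | grep -E 'nodeID|ownerID'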
Another issue occurs when replicas are marked as faulted due to incomplete writes, running out of space, or disruptions during I/O. Even if the data is still usable, Longhorn may refuse to attach the volume for safety reasons unless it's manually salvaged or reset.
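Faulted replicas can also be identified from the command line; a replica whose spec.failedAt field is non-empty is considered failed:

# List replicas together with their failedAt timestamps
kubectl -n longhorn-system get replicas.longhorn.io -o custom-columns=NAME:.metadata.name,FAILEDAT:.spec.failedAt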
In environments using ReadWriteOnce (RWO) volumes, attachment failures can also happen when multiple pods or workloads try to access the volume from different nodes. If Longhorn believes the volume is still attached elsewhere, it will block new attachments to maintain data integrity.
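The access mode is declared on the PVC. A minimal sketch of an RWX claim served by Longhorn's Share Manager (the claim name and size are placeholders):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data            # placeholder name
spec:
  accessModes:
    - ReadWriteMany            # RWX: may be attached from multiple nodes
  storageClassName: longhorn   # Longhorn's default StorageClass name
  resources:
    requests:
      storage: 10Gi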
Finally, during normal node operations like draining, cordoning, or rescheduling pods, volume metadata may become desynchronized. Longhorn might think the volume is still attached or being operated on, leading to attachment errors until the control plane state is corrected.
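One way to see this mismatch is to compare what Kubernetes and Longhorn each report about the attachment; both checks are read-only, and the volume name is a placeholder:

# Kubernetes-side view: one VolumeAttachment object per attached PV
kubectl get volumeattachments.storage.k8s.io

# Longhorn-side view: volume state and the node it believes it is attached to
kubectl -n longhorn-system get volumes.longhorn.io <volume-name>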
Additional Information
Longhorn Documentation: https://longhorn.io/docs/
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021788
- Creation Date: 11-Apr-2025
- Modified Date: 16-Apr-2025
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com