Delayed outgoing packets causing NFS timeouts
This document (000019943) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server 12 SP5
Situation
nfs: server *HOSTNAME* not responding, still trying
and after several minutes
nfs: server *HOSTNAME* OK
Because of an in-kernel retransmit timer (or another packet being queued), the stuck packet will eventually be sent out, after a delay.
In tcpdump packet capture analysis, this problem can be identified by spurious resend attempts of the same packet (with equal TSVal) a long time apart.
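To look for this pattern in an existing capture, the text output of tcpdump can be filtered for data segments that reappear with the same sequence range and the same TSval after a long gap. The following is only a minimal sketch under stated assumptions: the function name find_stalled_resends, the file name capture.pcap and the 5 second default threshold are hypothetical, and the input is expected to be one-line `tcpdump -nn -tt` output (epoch timestamps) of TCP traffic.

```shell
# Hypothetical helper: reads "tcpdump -nn -tt" text output on stdin and
# reports segments seen again with identical seq range and TSval after
# at least $1 seconds (default: 5).
find_stalled_resends() {
    awk -v min_gap="${1:-5}" '
    {
        # Skip lines without a TCP timestamp option.
        if (!match($0, /TS val [0-9]+/)) next
        tsval = substr($0, RSTART + 7, RLENGTH - 7)

        # Skip pure ACKs and other lines without a sequence number.
        if (!match($0, /seq [0-9]+(:[0-9]+)?/)) next
        seq = substr($0, RSTART + 4, RLENGTH - 4)

        src = $3
        dst = $5
        sub(/:$/, "", dst)              # drop the trailing colon after the destination

        key = src " " dst " " seq " " tsval
        if (key in first_seen) {
            gap = $1 - first_seen[key]
            if (gap >= min_gap)
                printf "%s > %s resent after %.1fs (seq %s, TSval %s)\n", src, dst, gap, seq, tsval
        } else {
            first_seen[key] = $1        # remember when this segment was first seen
        }
    }'
}

# Example usage (capture.pcap is a placeholder file name):
#   tcpdump -nn -tt -r capture.pcap 'tcp port 2049' | find_stalled_resends 5
```

Any line printed by this filter is only a candidate. It still has to be checked against the time of the NFS messages in the kernel log.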
Resolution
SUSE has released a kernel maintenance update that mitigates this problem by disabling the lockless optimization on the pfifo_fast qdisc (which is currently the only qdisc making use of this optimization) [4].
The issue is solved in the following kernel versions:
- SLES15 SP2: 5.3.18-24.61
- SLES12 SP5: 4.12.14-122.66
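Whether a running system already has one of the kernels listed above can be checked by comparing the output of `uname -r` with the fixed version. The sketch below is an illustration only: the helper names strip_flavor and kernel_at_least are hypothetical, and the comparison relies on `sort -V` from GNU coreutils, which approximates but is not identical to RPM version comparison.

```shell
# Hypothetical helper: remove a trailing kernel flavor such as "-default"
# from a release string like "4.12.14-122.66-default".
strip_flavor() {
    printf '%s\n' "$1" | sed 's/-[a-z][a-z0-9_]*$//'
}

# Hypothetical helper: succeeds if version $1 is equal to or newer than $2.
kernel_at_least() {
    # The older of the two versions sorts first; $1 is new enough
    # when that oldest entry is the required version $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example usage on SLES12 SP5:
#   running=$(strip_flavor "$(uname -r)")
#   if kernel_at_least "$running" 4.12.14-122.66; then
#       echo "kernel $running already contains the mitigation"
#   else
#       echo "kernel $running is older than 4.12.14-122.66"
#   fi
```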
Alternatively, the default qdisc can be switched from pfifo_fast to fq_codel:
echo 'net.core.default_qdisc = fq_codel' >>/etc/sysctl.conf
sysctl -w net.core.default_qdisc=fq_codel
tc qdisc add dev $devname root handle 1: mq
tc qdisc del dev $devname root
In case the $devname above is not a multiqueue-capable device, the following commands have to be used instead:
echo 'net.core.default_qdisc = fq_codel' >>/etc/sysctl.conf
sysctl -w net.core.default_qdisc=fq_codel
tc qdisc add dev $devname root handle 1: fq_codel
tc qdisc del dev $devname root
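Which of the two command sets above applies depends on whether the device is multiqueue-capable. One way to estimate this is to count the transmit queues the kernel exposes under /sys/class/net/$devname/queues. The following is a rough sketch, not an official procedure: the helper names count_tx_queues and root_qdisc_for are hypothetical, and treating "more than one tx queue" as "multiqueue-capable" is an assumption.

```shell
# Hypothetical helper: print the number of transmit queues of device $1.
# $2 optionally overrides the sysfs root (default: /sys/class/net).
count_tx_queues() {
    dev=$1
    root=${2:-/sys/class/net}
    n=0
    for q in "$root/$dev"/queues/tx-*; do
        # An unmatched glob stays literal and is not a directory, so it is skipped.
        [ -d "$q" ] && n=$((n + 1))
    done
    echo "$n"
}

# Hypothetical helper: print the root qdisc to use in the "tc qdisc add" step,
# "mq" for devices with several tx queues and "fq_codel" otherwise.
root_qdisc_for() {
    if [ "$(count_tx_queues "$@")" -gt 1 ]; then
        echo mq
    else
        echo fq_codel
    fi
}

# Example usage:
#   root_qdisc_for "$devname"
```

After the commands have been run, `tc qdisc show dev $devname` should no longer list pfifo_fast for that device.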
Cause
The lockless optimization has a design flaw which, under certain very specific circumstances, opens a window for a race condition. When the race is hit, the "last" packet in the queue gets stuck (and is not sent out to the wire) for a potentially unbounded amount of time, causing network stalls.
Additional Information
[2] http://git.kernel.org/linus/c5ad119fb6c0
[3] https://lore.kernel.org/netdev/d102074f-7489-e35a-98cf-e2cad7efd8a2@netrounds.com/t/
[4] https://github.com/openSUSE/kernel-source/commit/1c59b584ef0cc166f6f5c9f8ed6f47e2e811e1c0
[5] https://github.com/openSUSE/kernel-source/commit/3aa0c01fad38360cc9cd840d49bdfdc565e2e718
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000019943
- Creation Date: 14-Apr-2021
- Modified Date: 20-Apr-2021
- SUSE Linux Enterprise Server
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com