Race between xfs_zero_eof and direct write can cause data corruption
This document (7017183) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server 11 Service Pack 4 (SLES 11 SP4)
SUSE Linux Enterprise Server 12
SUSE Linux Enterprise Server 12 Service Pack 1 (SLES 12 SP1)
Situation
Data corruption can occur on an XFS file system when a file is written via direct I/O.
No errors are reported; the affected writes even return success.
Data corruption has also been seen on KVM guests with qcow2 virtual disks stored on XFS. The virtual disks were configured with aio=native and had not been extended to their maximum file size.
Resolution
The problem is fixed in the following kernel updates:
SLES 11 SP3: kernel 3.0.101-0.47.71.1, released November 2015
SLES 11 SP4: kernel 3.0.101-68.1, released December 2015
SLES 12: kernel 3.12.51-52.31.1, released December 2015
SLES 12 SP1: kernel 3.12.51-60.20.2, released December 2015
It is recommended to update the kernel to avoid the risk of data corruption.
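As a rough aid for checking a system against the kernel versions listed in this section, the following Python sketch compares a kernel package version-release string with the fixed version for a product. The function names and the dictionary keys are assumptions made for this illustration, and the comparison handles only purely numeric, dot-separated segments; it is a simplification and not the full RPM version comparison algorithm.

```python
# Fixed kernel package versions, copied from the list in this section.
# The dictionary keys are labels chosen for this sketch only.
FIXED_KERNELS = {
    "SLES11 SP3": "3.0.101-0.47.71.1",
    "SLES11 SP4": "3.0.101-68.1",
    "SLES12": "3.12.51-52.31.1",
    "SLES12 SP1": "3.12.51-60.20.2",
}


def parse_version_release(text):
    """Split 'VERSION-RELEASE' into two tuples of integers.

    Assumes every dot-separated segment is numeric; anything else
    raises ValueError.
    """
    version, release = text.split("-", 1)
    return (
        tuple(int(part) for part in version.split(".")),
        tuple(int(part) for part in release.split(".")),
    )


def has_fix(product, installed):
    """Return True if 'installed' is at least the fixed version for 'product'."""
    return parse_version_release(installed) >= parse_version_release(
        FIXED_KERNELS[product]
    )


if __name__ == "__main__":
    print(has_fix("SLES12 SP1", "3.12.51-60.20.2"))  # True
    print(has_fix("SLES11 SP4", "3.0.101-63.1"))     # False
```

The version-release string passed in would be that of the kernel package belonging to the running kernel, as reported by the package manager. If several kernel packages are installed, only the one that is actually booted matters.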
Cause
The problem occurs when a direct I/O write smaller than the file system block size is issued into the last (partial) block of the file and, at the same time, another direct I/O write starting beyond EOF (end of file) is issued.
The zeroing of the last partial block, performed by xfs_zero_eof on behalf of the write beyond EOF, can race with the direct I/O write into that partial block and thus result in a lost direct I/O write.
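The condition described above can be written down as simple offset arithmetic. The following Python sketch is one reading of that description, not kernel code: it only tests whether a pair of concurrent direct writes matches the pattern (a partial last block, a sub-block write into it, and a second write starting beyond EOF). The function and parameter names are hypothetical.

```python
def last_block_start(file_size, block_size):
    """Byte offset at which the last (possibly partial) file block begins."""
    return (file_size // block_size) * block_size


def matches_race_pattern(file_size, block_size, sub_block_write, extending_offset):
    """Return True if two concurrent direct writes match the described pattern.

    sub_block_write  -- (offset, length) of the write into the last block
    extending_offset -- start offset of the second, concurrent write
    """
    if file_size % block_size == 0:
        # EOF is block aligned, so there is no partial last block to zero.
        return False

    start = last_block_start(file_size, block_size)
    end = start + block_size
    offset, length = sub_block_write

    is_sub_block = 0 < length < block_size
    in_last_block = start <= offset and offset + length <= end
    starts_beyond_eof = extending_offset > file_size

    return is_sub_block and in_last_block and starts_beyond_eof


if __name__ == "__main__":
    # 4096-byte blocks, file ends 1024 bytes into its third block.
    print(matches_race_pattern(9216, 4096, (8192, 512), 12288))  # True
    # Same writes, but EOF is block aligned: no partial block involved.
    print(matches_race_pattern(8192, 4096, (8192, 512), 12288))  # False
```

This matches the KVM case mentioned in the Situation section in outline: a qcow2 image that has not reached its maximum size keeps growing, so writes beyond the current EOF are common, while aio=native means direct I/O requests can be in flight concurrently.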
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7017183
- Creation Date: 22-Jan-2016
- Modified Date: 03-Mar-2020
- SUSE Linux Enterprise Server
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com