ext4: allow concurrent unaligned dio overwrites

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2151952
Upstream Status: linux.git
Tested: via fstests, fio.

commit 310ee0902b8d9d0a13a5a13e94688a5863fa29c2
Author: Brian Foster <bfoster@redhat.com>
Date: Tue Mar 14 09:07:59 2023 -0400

ext4: allow concurrent unaligned dio overwrites  

We've had reports of significant performance regression of sub-block  
(unaligned) direct writes due to the added exclusivity restrictions  
in ext4. The purpose of the exclusivity requirement for unaligned  
direct writes is to avoid data corruption caused by unserialized  
partial block zeroing in the iomap dio layer across overlapping  
writes.  
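
As a rough illustration of what "unaligned" means here, the sketch below checks whether a write starts or ends off a filesystem block boundary, and whether two writes touch a common block (the case where unserialized partial block zeroing could clobber data). The helper names are hypothetical and are not the ext4 functions; the block size is assumed to be a power of two and lengths are assumed to be nonzero.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if [offset, offset + len) does not both start
 * and end on a block boundary. Assumes blocksize is a power of two.
 */
static bool dio_write_is_unaligned(uint64_t offset, uint64_t len,
                                   uint32_t blocksize)
{
    return ((offset | len) & ((uint64_t)blocksize - 1)) != 0;
}

/*
 * Hypothetical helper: true if two writes touch at least one common block,
 * even when their byte ranges do not overlap. Assumes both lengths are > 0.
 */
static bool writes_share_block(uint64_t off_a, uint64_t len_a,
                               uint64_t off_b, uint64_t len_b,
                               uint32_t blocksize)
{
    uint64_t first_a = off_a / blocksize;
    uint64_t last_a = (off_a + len_a - 1) / blocksize;
    uint64_t first_b = off_b / blocksize;
    uint64_t last_b = (off_b + len_b - 1) / blocksize;

    return first_a <= last_b && first_b <= last_a;
}
```

With a 4k block size, two 2k writes at offsets 0 and 2048 are both unaligned and share block 0, so each would need the other half of the block left intact.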

XFS has similar requirements for the same underlying reasons, yet  
doesn't suffer the extreme performance regression that ext4 does.  
The reason for this is that XFS utilizes IOMAP_DIO_OVERWRITE_ONLY  
mode, which allows for optimistic submission of concurrent unaligned  
I/O and kicks back writes that require partial block zeroing such  
that they can be submitted in a safe, exclusive context. Since ext4  
already performs most of these checks pre-submission, it can support  
something similar without necessarily relying on the iomap flag and  
associated retry mechanism.  

Update the dio write submission path to allow concurrent submission  
of unaligned direct writes that are purely overwrite and so will not  
require block zeroing. To improve readability of the various related  
checks, move the unaligned I/O handling down into  
ext4_dio_write_checks(), where the dio draining and force wait logic  
can immediately follow the locking requirement checks. Finally, the  
IOMAP_DIO_OVERWRITE_ONLY flag is set to enable a warning check as a  
precaution should the ext4 overwrite logic ever become inconsistent  
with the zeroing expectations of iomap dio.  
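
The decision described above can be modeled in userspace roughly as follows. This is a simplified sketch under stated assumptions, not the code in fs/ext4/file.c: the block map, the `dio_plan` struct, and both function names are hypothetical, it only covers unaligned writes, and the real checks include conditions that are not modeled here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block state standing in for the extent mapping. */
enum blk_state { BLK_HOLE, BLK_UNWRITTEN, BLK_WRITTEN };
enum lock_mode { LOCK_SHARED, LOCK_EXCLUSIVE };

/* Hypothetical result of the pre-submission checks. */
struct dio_plan {
    enum lock_mode lock;
    bool overwrite_only;  /* stands in for setting IOMAP_DIO_OVERWRITE_ONLY */
    bool drain_and_wait;  /* stands in for dio draining plus force wait */
};

/*
 * A write is treated as a pure overwrite if it ends within the file size
 * and every block it touches is already allocated and written, so no
 * partial block zeroing would be needed.
 */
static bool is_pure_overwrite(const enum blk_state *map, size_t nblocks,
                              uint32_t bs, uint64_t file_size,
                              uint64_t off, uint64_t len)
{
    if (len == 0 || off + len > file_size)
        return false;

    uint64_t first = off / bs;
    uint64_t last = (off + len - 1) / bs;

    if (last >= nblocks)
        return false;
    for (uint64_t b = first; b <= last; b++)
        if (map[b] != BLK_WRITTEN)
            return false;
    return true;
}

/* Plan an unaligned direct write: concurrent only if purely overwrite. */
static struct dio_plan plan_unaligned_write(const enum blk_state *map,
                                            size_t nblocks, uint32_t bs,
                                            uint64_t file_size,
                                            uint64_t off, uint64_t len)
{
    /* Default to the safe, exclusive path. */
    struct dio_plan p = { LOCK_EXCLUSIVE, false, true };

    if (is_pure_overwrite(map, nblocks, bs, file_size, off, len)) {
        p.lock = LOCK_SHARED;
        p.overwrite_only = true;
        p.drain_and_wait = false;
    }
    return p;
}
```

In this model, a 2k write into a fully written block is planned under the shared lock with the overwrite-only flag set, while the same write into an unwritten or unallocated block falls back to the exclusive path.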

The performance improvement of sub-block direct write I/O is shown  
in the following fio test on a 64-CPU guest VM:  

Test: fio --name=test --ioengine=libaio --direct=1 --group_reporting  
--overwrite=1 --thread --size=10G --filename=/mnt/fio  
--readwrite=write --ramp_time=10s --runtime=60s --numjobs=8  
--blocksize=2k --iodepth=256 --allow_file_create=0  

v6.2:           write: IOPS=4328, BW=8724KiB/s  
v6.2 (patched): write: IOPS=801k, BW=1565MiB/s  

Signed-off-by: Brian Foster <bfoster@redhat.com>  
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>  
Reviewed-by: Jan Kara <jack@suse.cz>  
Link: https://lore.kernel.org/r/20230314130759.642710-1-bfoster@redhat.com  
Signed-off-by: Theodore Ts'o <tytso@mit.edu>  

Signed-off-by: Brian Foster <bfoster@redhat.com>