IO alignment probing delivers incorrect results on Linux when used with, e.g., dm-crypt
In file-posix.c:
```c
/* Check if read is allowed with given memory buffer and length.
 *
 * This function is used to check O_DIRECT memory buffer and request alignment.
 */
static bool raw_is_io_aligned(int fd, void *buf, size_t len)
{
    ssize_t ret = pread(fd, buf, len, 0);

    if (ret >= 0) {
        return true;
    }

#ifdef __linux__
    /* The Linux kernel returns EINVAL for misaligned O_DIRECT reads. Ignore
     * other errors (e.g. real I/O error), which could happen on a failed
     * drive, since we only care about probing alignment.
     */
    if (errno != EINVAL) {
        return true;
    }
#endif

    return false;
}
```
The comment claims that Linux always returns `EINVAL` for misaligned `O_DIRECT` reads. However, for block devices built on top of the Linux kernel's device-mapper infrastructure, this is demonstrably false. A simple example that violates it is dm-crypt.
In dm-crypt.c, insufficient alignment causes `DM_MAPIO_KILL`:
```c
/*
 * Ensure that bio is a multiple of internal sector encryption size
 * and is aligned to this size as defined in IO hints.
 */
if (unlikely((bio->bi_iter.bi_sector & ((cc->sector_size >> SECTOR_SHIFT) - 1)) != 0))
    return DM_MAPIO_KILL;

if (unlikely(bio->bi_iter.bi_size & (cc->sector_size - 1)))
    return DM_MAPIO_KILL;
```
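To make the two conditions concrete, here is a small userspace mirror of the checks quoted above. It is an illustration only: `crypt_bio_would_be_killed` is a hypothetical name, not a kernel function, and it simply reproduces the bit tests with the start sector in 512-byte units and the length in bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Kernel sectors are 512 bytes, i.e. 1 << 9. */
#define SECTOR_SHIFT 9

/*
 * Hypothetical userspace mirror of the dm-crypt checks quoted above.
 * start_sector:      request start, in 512-byte sectors
 * size_bytes:        request length, in bytes
 * crypt_sector_size: dm-crypt encryption sector size (a power of two, >= 512)
 * Returns true when the request would end in DM_MAPIO_KILL.
 */
static bool crypt_bio_would_be_killed(uint64_t start_sector,
                                      uint32_t size_bytes,
                                      uint32_t crypt_sector_size)
{
    /* The start must be aligned to the encryption sector size. */
    if ((start_sector & ((crypt_sector_size >> SECTOR_SHIFT) - 1)) != 0) {
        return true;
    }

    /* The length must be a multiple of the encryption sector size. */
    if ((size_bytes & (crypt_sector_size - 1)) != 0) {
        return true;
    }

    return false;
}
```

For example, with a 4096-byte encryption sector size, a 512-byte request at sector 0 fails the length check and a 4096-byte request at sector 1 fails the start check, while a 4096-byte request at sector 0 passes both.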
This is unconditionally translated into an IO error (i.e., `EIO`) in dm-rq.c:
```c
case DM_MAPIO_KILL:
    /* The target wants to complete the I/O */
    dm_kill_unmapped_request(rq, BLK_STS_IOERR);
    break;
```
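The effect on probing can be simulated without a real device. The sketch below assumes a hypothetical device model (`fake_pread`) that rejects any length not divisible by the device alignment with `EIO`, as the `DM_MAPIO_KILL` path does. The candidate list `{512, 1024, 2048, 4096}` is also an assumption for illustration, not QEMU's actual list. It compares the decision logic of `raw_is_io_aligned()` as quoted (Linux branch included) with the same logic minus that branch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical device model: misaligned lengths fail with EIO, not EINVAL. */
static ssize_t fake_pread(size_t len, size_t dev_align)
{
    if (len % dev_align != 0) {
        errno = EIO;
        return -1;
    }
    return (ssize_t)len;
}

/* Decision logic of raw_is_io_aligned() as quoted, Linux branch included. */
static bool is_aligned_current(size_t len, size_t dev_align)
{
    if (fake_pread(len, dev_align) >= 0) {
        return true;
    }
    if (errno != EINVAL) {
        return true; /* EIO is taken to mean "aligned" */
    }
    return false;
}

/* The same logic with the Linux-specific branch removed. */
static bool is_aligned_proposed(size_t len, size_t dev_align)
{
    return fake_pread(len, dev_align) >= 0;
}

/* Return the first candidate accepted by check(), or 0 if none is. */
static size_t probe(bool (*check)(size_t, size_t), size_t dev_align)
{
    static const size_t candidates[] = { 512, 1024, 2048, 4096 };

    for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        if (check(candidates[i], dev_align)) {
            return candidates[i];
        }
    }
    return 0;
}
```

Against a device with 4096-byte alignment, `probe(is_aligned_current, 4096)` accepts the very first candidate, 512, because the `EIO` it receives is treated as success. `probe(is_aligned_proposed, 4096)` keeps going until 4096 succeeds.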
For probing `request_alignment` (i.e., the alignment of an IO request's length), a `blkdebug` layer can be used to force a specific alignment manually. For probing `buf_align` (i.e., the alignment of an IO memory buffer), however, no such workaround exists, short of disabling direct IO entirely.
It seems to me that the Linux-specific code here should simply be removed. It appears to be merely a performance optimization for working with failed drives, yet it causes essentially unsolvable IO errors on device-mapper devices that enforce an alignment greater than 512 bytes.
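A minimal sketch of what the proposal amounts to, assuming nothing else in the function changes: the `#ifdef __linux__` filter is dropped, so every failed read, whatever its `errno`, counts as "not aligned". This is an illustration of the suggestion, not a tested patch against QEMU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sketch of raw_is_io_aligned() with the Linux-specific EINVAL filter
 * removed: a read either succeeds (aligned) or fails (not aligned),
 * regardless of whether the kernel reports EINVAL or EIO.
 */
static bool raw_is_io_aligned(int fd, void *buf, size_t len)
{
    ssize_t ret = pread(fd, buf, len, 0);

    return ret >= 0;
}
```

The trade-off is the one the original comment worries about: on a genuinely failing drive, every candidate would now look misaligned, so probing would fail outright instead of settling on the smallest candidate.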