Zoned block device support
New feature description
A zoned block device is a block device with sequential write constraints: the device is divided into zones (device-specific, typically several hundred MB each), and writes within a zone must be sequential, starting at the zone's current write pointer. Common zoned devices are host-managed SMR disks and ZNS NVMe drives. Kernel dm-crypt has supported zoned block devices since 5.9 (in this patch) when built with CONFIG_BLK_DEV_ZONED=y, which is the default for most distributions.
However, cryptsetup luksFormat and other header modification operations currently do not work on zoned devices, presumably due to non-sequential writes in the zone containing the LUKS header.
To reproduce, first set up an emulated zoned block device using the null_blk module. Here's a script that creates an in-memory /dev/nullb0 with 8GiB size, 128MiB zone size and 4KiB block size. It is modified from the script in https://lwn.net/Articles/836726/
#!/usr/bin/env bash
set -eo pipefail
sysfs=/sys/kernel/config/nullb/nullb0
if [[ -d "${sysfs}" ]]; then
    echo 0 > "${sysfs}"/power
    rmdir "${sysfs}"
fi
lsmod | grep -q null_blk && rmmod null_blk
modprobe null_blk nr_devices=0
mkdir "${sysfs}"
echo 8192 > "${sysfs}"/size # MiB
echo 1 > "${sysfs}"/zoned
echo 0 > "${sysfs}"/zone_nr_conv
echo 128 > "${sysfs}"/zone_size # MiB
echo 1 > "${sysfs}"/memory_backed
echo 4096 > "${sysfs}"/blocksize
echo 1 > "${sysfs}"/power
udevadm settle
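Before going further it may help to confirm that the kernel really sees the device as zoned. The following is a minimal sketch, not part of the original script: `zone_info` is a hypothetical helper name, and it assumes the standard `queue/zoned` and `queue/chunk_sectors` sysfs attributes (the latter reports the zone size in 512-byte sectors on zoned devices).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the zone model and zone size of a block device.
# The argument is the device's sysfs queue directory; it defaults to the
# emulated nullb0 device (an assumption matching the setup script).
zone_info() {
    local queue_dir=${1:-/sys/block/nullb0/queue}
    local model chunk_sectors
    # "none" means a regular device; zoned devices report e.g. "host-managed".
    model=$(<"${queue_dir}/zoned")
    # For zoned devices, chunk_sectors is the zone size in 512-byte sectors.
    chunk_sectors=$(<"${queue_dir}/chunk_sectors")
    echo "model=${model} zone_size_mib=$(( chunk_sectors * 512 >> 20 ))"
}
```

With the parameters used above, calling `zone_info` without arguments should report a host-managed device with 128 MiB zones. `blkzone report /dev/nullb0` gives a per-zone view, including each write pointer.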
Running cryptsetup luksFormat on /dev/nullb0 then fails with a write error.
# cryptsetup luksFormat /dev/nullb0
WARNING!
========
This will overwrite data on /dev/nullb0 irrevocably.
Are you sure? (Type 'yes' in capital letters): YES
Enter passphrase for /dev/nullb0:
Verify passphrase:
Device wipe error, offset 32768.
Cannot wipe header on device /dev/nullb0.
dmesg errors:
[582511.046639] I/O error, dev nullb0, sector 0 op 0x1:(WRITE) flags 0x4800 phys_seg 31 prio class 0
[582511.046651] I/O error, dev nullb0, sector 248 op 0x1:(WRITE) flags 0x4800 phys_seg 31 prio class 0
[582511.046656] I/O error, dev nullb0, sector 496 op 0x1:(WRITE) flags 0x4800 phys_seg 31 prio class 0
[582511.046660] I/O error, dev nullb0, sector 744 op 0x1:(WRITE) flags 0x4800 phys_seg 31 prio class 0
[582511.046664] I/O error, dev nullb0, sector 992 op 0x1:(WRITE) flags 0x4800 phys_seg 31 prio class 0
[582511.046668] I/O error, dev nullb0, sector 1240 op 0x1:(WRITE) flags 0x4800 phys_seg 31 prio class 0
[582511.046672] I/O error, dev nullb0, sector 1488 op 0x1:(WRITE) flags 0x4800 phys_seg 31 prio class 0
[582511.046676] I/O error, dev nullb0, sector 1736 op 0x1:(WRITE) flags 0x4800 phys_seg 31 prio class 0
[582511.046680] I/O error, dev nullb0, sector 1984 op 0x1:(WRITE) flags 0x800 phys_seg 8 prio class 0
[582511.046716] I/O error, dev nullb0, sector 0 op 0x1:(WRITE) flags 0xc800 phys_seg 31 prio class 0
However, I found a workaround: create a detached LUKS header and sequentially write it back to the device.
# blkzone reset /dev/nullb0 # Discard all zones and rewind write pointers.
# cryptsetup luksFormat --header ./header --offset $((128 << 20 >> 9)) /dev/nullb0 # Align offset to the second zone, or dm-crypt will fail.
# dd if=./header of=/dev/nullb0 bs=4k oseek=0 conv=notrunc oflag=direct # Attach the header back.
# cryptsetup luksOpen /dev/nullb0 test-zoned
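The `--offset` value above is hard-coded for 128MiB zones. As a rough sketch of the arithmetic for other zone sizes, the hypothetical helper below (`zone_aligned_offset` is my name, not a cryptsetup feature) rounds a header size up to the next zone boundary and prints the result in 512-byte sectors, the unit `--offset` expects. The 16MiB header size in the usage note is only an illustrative assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: smallest zone-aligned data offset that leaves room for
# the header, printed in 512-byte sectors.
#   $1 = zone size in bytes
#   $2 = header size in bytes (metadata plus keyslots area)
zone_aligned_offset() {
    local zone_bytes=$1 header_bytes=$2
    # Number of whole zones needed to hold the header (ceiling division).
    local zones=$(( (header_bytes + zone_bytes - 1) / zone_bytes ))
    # Convert the byte offset of the first free zone to 512-byte sectors.
    echo $(( zones * zone_bytes >> 9 ))
}
```

For example, `zone_aligned_offset $((128 << 20)) $((16 << 20))` yields the same 262144 sectors as `$((128 << 20 >> 9))` in the command above. On a real device, the zone size in bytes could be taken from `queue/chunk_sectors` multiplied by 512.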
This way, the device is correctly formatted as LUKS and can be opened and closed without issues. But header operations like luksAddKey still fail with "IO error while encrypting keyslot.", unless the detach-modify-attach trick is done again.
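For reference, this is roughly what I mean by detach-modify-attach for luksAddKey. It is an untested sketch under assumptions: the function names are hypothetical, the header is assumed to fit in the first zone, and resetting that zone destroys the on-disk header until the write-back completes, so keep a separate copy of the header file. Set `DRY_RUN=1` to print the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command, or only print it when DRY_RUN=1.
run() {
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: add a key to a LUKS header stored in the first zone.
#   $1 = zoned device, $2 = scratch file for the detached header (must not exist)
zoned_luks_add_key() {
    local dev=$1 header=$2
    # 1. Detach: copy the on-disk header to a regular file (reads are unrestricted).
    run cryptsetup luksHeaderBackup "$dev" --header-backup-file "$header" || return
    # 2. Modify: use the header file itself as the LUKS device.
    run cryptsetup luksAddKey "$header" || return
    # 3. Rewind the first zone's write pointer; the on-disk header is gone after this.
    run blkzone reset --offset 0 --count 1 "$dev" || return
    # 4. Attach: write the modified header back sequentially from sector 0.
    run dd if="$header" of="$dev" bs=4k conv=notrunc oflag=direct
}
```

A dry run such as `DRY_RUN=1 zoned_luks_add_key /dev/nullb0 ./header` shows the four commands in order without touching the device.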
I would like a more convenient way to format and/or modify LUKS partitions/disks with an internal header on zoned devices. But there are some concerns:
- We cannot overwrite the header zone without issuing a ZONE_RESET, but that will temporarily destroy all data in the zone, including the previous header. I think that's probably fine: formatting destroys data anyway, key modifications are rare, and we can warn users to back up the header themselves.
- Zone size is usually far too large for just a LUKS header. It is typically 256MiB for SMR disks or 1-4GiB for ZNS drives, as mentioned here. But dm-crypt requires the data offset to be zone aligned, so the header still occupies a whole zone. The only thing we can do may be clamping --luks2-keyslots-size to something like 16MiB to reduce writes on luksFormat.
Additional info
$ cryptsetup --version
cryptsetup 2.7.0 flags: UDEV BLKID KEYRING KERNEL_CAPI HW_OPAL