cryptsetup, issue #639

Integrity discard/trim extremely slow on NVMe SSD storage (~10GiB/minute)

Issue description

With a Seagate FireCuda 520 2TB NVMe SSD running in PCIe 3.0 x4 mode (my motherboard does not have PCIe 4.0), discards through the dm-integrity layer are extremely slow, to the point of being almost unusable or, in some cases, fully unusable.

This is so slow that using the discard option on swap is not possible: the discard takes around 3 minutes to complete for a 32GiB swap, which causes timeouts during boot. Those timeouts in turn cause various other services to fail, resulting in a drop to the emergency shell.

blkdiscard directly on the NVMe device takes, I think, 10 seconds or so for the entire 2TB, but through dm-integrity the rate is approximately 10GiB per minute, meaning over 3 hours to discard the entire 2TB. Normal read and write operations are not affected and perform well, easily reaching 2GiB/s through the entire stack: disk -> dm-integrity -> mdadm -> luks -> lvm -> ext4.
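The figures above can be cross-checked with a little arithmetic. This is only a sketch: it assumes the observed rate of roughly 10GiB per minute stays constant and that the "2TB" on the label means decimal terabytes. The function name `discard_minutes` is made up for this illustration.

```python
GIB = 1024 ** 3   # binary gibibyte
TB = 1000 ** 4    # decimal terabyte (assumption: how the drive is marketed)

def discard_minutes(size_bytes, rate_gib_per_min=10.0):
    """Estimated discard time in minutes at a constant rate."""
    return size_bytes / GIB / rate_gib_per_min

swap_minutes = discard_minutes(32 * GIB)
disk_hours = discard_minutes(2 * TB) / 60

print(f"32 GiB swap: {swap_minutes:.1f} min")  # 32 GiB swap: 3.2 min
print(f"2 TB device: {disk_hours:.1f} h")      # 2 TB device: 3.1 h
```

Both results agree with what was observed: about 3 minutes for the swap device and a bit over 3 hours for the whole disk.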

Checking kernel thread usage in htop while discarding, quite a few dm-integrity-offload threads are in the D state with 0.0 CPU usage, which is rather odd. No integrity threads are actually doing work, and read/write disk usage measured with dstat is not even 1MiB/s.

To detail the above, dstat shows an extremely regular pattern: 2 seconds of 0k writes, then 1 second with a 512k write, repeating. Could this be a lock timeout somewhere, or some other problematic locking situation?
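The D-state observation can also be captured without htop by reading /proc. The following is a minimal sketch, assuming a Linux system; the helper names `parse_stat` and `blocked_threads` and the default filter string are made up for this illustration, and the exact thread names shown by the kernel may differ from what htop displays.

```python
import os

def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    The comm field is wrapped in parentheses and may itself contain
    spaces or parentheses, so the last ')' is used as the delimiter.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state

def blocked_threads(name_part="integrity", proc="/proc"):
    """List (pid, comm) of top-level /proc entries in uninterruptible sleep (D)."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the task exited or the line was unreadable
        if state == "D" and name_part in comm:
            found.append((pid, comm))
    return found

if __name__ == "__main__":
    for pid, comm in blocked_threads():
        print(pid, comm)
```

Running this in a loop while blkdiscard is active would give a text record of which threads are blocked, which is easier to attach to a report than an htop screenshot.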

Steps for reproducing the issue

  1. Create two 10G partitions on the SSD.
  2. Set up dm-integrity on one of these and open the device with --allow-discards.
  3. blkdiscard both partitions.
  • Raw partition is done instantly.
  • Integrity partition takes around a minute.
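The steps above can be sketched as a small script that builds the commands and, by default, only prints them. Everything device-specific here is an assumption: the partition paths, the mapping name `integ-test`, and the choice of `--batch-mode` and `--sector-size 4096` for the format step are illustrative and not taken from the original setup. Formatting and discarding destroy all data on the given partitions.

```python
import subprocess
import time

def repro_commands(raw_part, integrity_part, name="integ-test"):
    """Build the command lines for the reproduction (nothing is executed here)."""
    mapped = f"/dev/mapper/{name}"
    return [
        # Assumed format options; adjust to match the real setup.
        ["integritysetup", "format", "--batch-mode",
         "--sector-size", "4096", integrity_part],
        ["integritysetup", "open", "--allow-discards", integrity_part, name],
        ["blkdiscard", raw_part],   # raw partition: expected to finish instantly
        ["blkdiscard", mapped],     # integrity device: reported to take ~1 minute
        ["integritysetup", "close", name],
    ]

def run_timed(argv):
    """Run one command as root and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start

if __name__ == "__main__":
    # Hypothetical partition names; dry run that only prints the commands.
    for argv in repro_commands("/dev/nvme0n1p1", "/dev/nvme0n1p2"):
        print(" ".join(argv))
```

To actually measure the difference, each command list could be passed to `run_timed` instead of being printed, comparing the durations of the two blkdiscard calls.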

Additional info

The NVMe device is formatted to native 4096 byte sectors and the dm-integrity layer also uses 4096 byte sectors.

Debian bullseye (testing), kernel 5.10.0-6-rt-amd64 5.10.28-1. The same issue occurred during testing with the Arch Linux live ISO, which uses kernel 5.11.x. Cryptsetup package version 2.3.5.

On another server system (IBM POWER9, ppc64le) with a SAS 3.0 SSD, discard works properly at more than acceptable speeds and shows significant CPU usage while discarding. The affected machine in this report is a regular Intel amd64 desktop system.

Debug log

Nothing really fails; dmesg and syslog show no issues or warnings at all, so I am not sure what to include.
