RAID1 data segments can be put on one drive, causing integrity and caching weirdness
I was attempting to create and then expand a heterogeneous RAID1 volume, as described in that Stack Overflow question, but I ran into several issues along the way.
I had a VG of 3 disks: one 8TB, one 1.5TB, and one 3TB. Using LVM 2.03.11 on Debian 11, I ran:
$ lvcreate --type raid1 -m 1 --raidintegrity y -L 1.6TiB -n MyNewLV MyVG
This failed the first few times, and unfortunately I haven't kept those logs, but I believe the issue was the raidintegrity setting: looking at my command history I see a similar command, just with --raidintegrity n. One of the commands ended up working. However, the resulting setup was:
8TB drive: MyNewLV_0 stripe#0(0-1.5TB), MyNewLV_1 stripe#1(1.5TB-1.6TB)
1.5TB drive: MyNewLV_1 stripe#0(0-1.5TB)
3TB drive: MyNewLV_0 stripe#1(1.5TB-1.6TB)
or alternatively shown as:
MyNewLV_0: 8TB drive (0-1.5TB), 3TB drive (1.5TB-1.6TB)
MyNewLV_1: 1.5TB drive (0-1.5TB), 8TB drive (1.5TB-1.6TB)
Note that both legs of the raid have segments on the 8TB drive. I'm not sure if the recovery from this situation is correct, but the allocation is definitely weird and probably shouldn't happen. This is Bug 1, and it led directly to the next issue, Bug 2.
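For reference, the layout summaries above are paraphrased; the raw segment layout can be pulled with something like
$ lvs -a -o lv_name,segtype,devices,seg_pe_ranges MyVG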
I then added a fourth drive to the VG, a second 3TB volume, and lvextended the LV as described in that question. The resulting output was:
--- Logical volume ---
MyNewLV_0:
--- Segments ---
Logical extents 0 to 349396:
Type linear
Physical volume /dev/sdd2
Physical extents 687046 to 1036442
Logical extents 349397 to 369505:
Type linear
Physical volume /dev/sda3
Physical extents 634432 to 654540
Logical extents 369506 to 786431:
Type linear
Physical volume /dev/sdd2
Physical extents 1056552 to 1473477
--- Logical volume ---
MyNewLV_1:
--- Segments ---
Logical extents 0 to 349396:
Type linear
Physical volume /dev/sdb3
Physical extents 1 to 349397
Logical extents 349397 to 369505:
Type linear
Physical volume /dev/sdd2
Physical extents 1036443 to 1056551
Logical extents 369506 to 786431:
Type linear
Physical volume /dev/sdc2
Physical extents 0 to 416925
- sda3 is the first 3TB volume, and had another LV on it, leaving just 4GB free
- sdb3 is the 1.5TB volume, and was full
- sdd2 is the 8TB volume, and had >2TB free space
- sdc2 is the second 3TB volume, and had >2TB free space
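The free-space figures above are approximate; they are roughly what pvs reported at the time, i.e. something like
$ pvs -o pv_name,vg_name,pv_size,pv_free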
Bug 2: What possessed LVM to pick THAT allocation?
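I no longer have the exact lvextend invocation, but going by the extent count above it would have been roughly
$ lvextend -l 786432 MyVG/MyNewLV
(or an equivalent -L size).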
I hadn't really paid attention to the allocations at this point, but now I added integrity:
$ lvconvert --raidintegrity y MyVG/MyNewLV
I then attempted to add a cache, but that failed, so I booted into a new distro with LVM 2.03.23 and looked at the allocations for the first time. I noticed this:
--- Logical volume ---
Internal LV Name MyNewLV_corig_rimage_0_imeta
--- Segments ---
Logical extents 0 to 6307:
Type linear
Physical volume /dev/sdd2
Physical extents 1473478 to 1479785
--- Logical volume ---
Internal LV Name MyNewLV_corig_rimage_1_imeta
--- Segments ---
Logical extents 0 to 6307:
Type linear
Physical volume /dev/sdd2
Physical extents 1479786 to 1486093
Both integrity metadata sub-LVs are on sdd2. That is very wrong: it implies that a failed sdd drive would defeat dm_integrity (Bug 3). So I attempted to pvmove things around, but every attempt on the imeta and rimage volumes spat out "Unable to pvmove device used for raid with integrity." (Bug 4). So I disabled integrity, and yet I still got "Unable to pvmove device used for raid with integrity." despite NOT HAVING IT ENABLED ANYMORE!! (Bug 5)
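For reference, the failing attempts were plain pvmoves of the sub-LVs off sdd2, along the lines of (reconstructed from memory, so the exact sub-LV names, extent ranges, and destinations varied):
$ pvmove -n MyVG/MyNewLV_rimage_1 /dev/sdd2 /dev/sdc2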
I then tried the other rimage segment, and it moved! But after it was done, the other side was still stuck. I finally got it unstuck with:
$ pvmove --alloc anywhere -n MyVG/MyNewLV_rimage_1 /dev/sdd2:1036443-1056551 /dev/sdc2
At this point I wasn't taking any more chances, so I carefully pvmoved the segments around so that _0 was a single linear segment on sdd2 and the other side (_1) was laid out reasonably. At that point I re-enabled raid integrity and everything was placed sanely, with _0_imeta and _0_iorig on sdd2, and _1_imeta and _1_iorig both on the other disks. I enabled caching successfully, and I think it's back to a sane state, though grub does complain; but that's not an LVM issue, I don't think.
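I don't have the exact cache command to hand either, but the successful attach was along these lines, where FastCacheLV is a placeholder for the LV on the fast device:
$ lvconvert --type cache --cachevol FastCacheLV MyVG/MyNewLV   # FastCacheLV = placeholder name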
Sorry about the long story and for reporting multiple bugs in one go, but they all seem fairly interrelated.