reftable/stack: fix race in up-to-date check
In 6fdfaf15 (reftable/stack: use stat info to avoid re-reading stack list, 2024-01-11) we have introduced a new mechanism to avoid re-reading the table list in case stat(3P) figures out that the stack didn't change since the last time we read it.
While this change significantly improved performance when writing many refs, it can unfortunately lead to false negatives in very specific scenarios. Given two processes A and B, there is a feasible sequence of events that cause us to accidentally treat the table list as up-to-date even though it changed:
-
A reads the reftable stack and caches its stat info.
-
B updates the stack, appending a new table to "tables.list". This will both use a new inode and result in a different file size, thus invalidating A's cache in theory.
-
B decides to auto-compact the stack and merges two tables. The file size now matches what A has cached again. Furthermore, the filesystem may decide to recycle the inode number of the file we have replaced in (2) because it is not in use anymore.
-
A reloads the reftable stack. Neither the inode number nor the file size changed. If the timestamps did not change either then we think the cached copy of our stack is up-to-date.
In fact, the commit introduced three issues:
-
Non-POSIX compliant systems may not report proper
st_dev
andst_ino
values in stat(3P). -
stat_validity_check()
and friends may end up not comparingst_dev
andst_ino
depending on the "core.checkstat" config. -
st_ino
can be recycled, rendering the check moot even on POSIX-compliant systems.
Refactor the code to stop using stat_validity_check()
. Instead, we
manually stat(3P) the file descriptors to make relevant information
available. On Windows and MSYS2 the result will have both st_dev
and
st_ino
set to 0, which allows us to address the first issue by not
using the stat-based cache in that case. It also allows us to make sure
that we always compare st_dev
and st_ino
, addressing the second
issue. According to the POSIX standard, "The st_ino and st_dev fields
taken together uniquely identify the file within the system", and thus
this should be safe to establish the identity of the file.
The third issue of inode recycling can be addressed by keeping the file descriptor of "files.list" open during the lifetime of the reftable stack. As the file will still exist on disk even though it has been unlinked it is impossible for its inode to be recycled as long as the file descriptor is still open.
This should address the race in a POSIX-compliant way. The only real downside is that this mechanism cannot be used on non-POSIX-compliant systems like Windows. But we at least have the second-level caching mechanism in place that compares contents of "files.list" with the currently loaded list of tables.
This new mechanism performs roughly the same as the previous one that
relied on stat_validity_check()
:
Benchmark 1: update-ref: create many refs (HEAD~) Time (mean ± σ): 4.754 s ± 0.026 s [User: 2.204 s, System: 2.549 s] Range (min … max): 4.694 s … 4.802 s 20 runs
Benchmark 2: update-ref: create many refs (HEAD) Time (mean ± σ): 4.721 s ± 0.020 s [User: 2.194 s, System: 2.527 s] Range (min … max): 4.691 s … 4.753 s 20 runs
Summary update-ref: create many refs (HEAD~) ran 1.01 ± 0.01 times faster than update-ref: create many refs (HEAD)
Signed-off-by: Patrick Steinhardt ps@pks.im
Fixes #241 (closed).