
reftable/stack: fix race in up-to-date check

Patrick Steinhardt requested to merge pks-reftable-stack-reload-race into master

In 6fdfaf15 (reftable/stack: use stat info to avoid re-reading stack list, 2024-01-11) we introduced a mechanism that avoids re-reading the table list when stat(3P) indicates that the stack has not changed since the last time we read it.

While this change significantly improved performance when writing many refs, it can unfortunately lead to false negatives in very specific scenarios. Given two processes A and B, there is a feasible sequence of events that causes us to accidentally treat the table list as up-to-date even though it changed:

  1. A reads the reftable stack and caches its stat info.

  2. B updates the stack, appending a new table to "tables.list". This will both use a new inode and result in a different file size, thus invalidating A's cache in theory.

  3. B decides to auto-compact the stack and merges two tables. The file size now matches what A has cached again. Furthermore, the filesystem may decide to recycle the inode number of the file we have replaced in (2) because it is not in use anymore.

  4. A reloads the reftable stack. Neither the inode number nor the file size changed. If the timestamps did not change either, then we conclude that the cached copy of our stack is up-to-date.
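To make the failure mode concrete, here is a minimal sketch of a size/inode/mtime freshness check. It is a simplified stand-in, not the code behind stat_validity_check(); the name naive_cache_is_fresh() is hypothetical and the values used below are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>

/*
 * Hypothetical, simplified freshness check: treat the cached table list
 * as current if file size, inode number and mtime all match. It cannot
 * tell apart "the same file" and "a different file that happens to have
 * the same size, a recycled inode number and the same timestamp".
 */
int naive_cache_is_fresh(const struct stat *cached, const struct stat *now)
{
	assert(cached && now);
	return cached->st_size == now->st_size &&
	       cached->st_ino == now->st_ino &&
	       cached->st_mtime == now->st_mtime;
}
```

After steps (2) and (3), the stat info that A obtains in step (4) can describe a different file whose three fields equal the cached ones, so a check of this shape returns 1 even though the table list changed.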

In fact, the commit introduced three issues:

  • Non-POSIX-compliant systems may not report proper st_dev and st_ino values in stat(3P).

  • stat_validity_check() and friends may end up not comparing st_dev and st_ino depending on the "core.checkstat" config.

  • st_ino can be recycled, rendering the check moot even on POSIX-compliant systems.

Refactor the code to stop using stat_validity_check(). Instead, we manually stat(3P) the file descriptors to make the relevant information available. On Windows and MSYS2, the result will have both st_dev and st_ino set to 0, which allows us to address the first issue by not using the stat-based cache in that case. It also allows us to make sure that we always compare st_dev and st_ino, addressing the second issue. According to the POSIX standard, "The st_ino and st_dev fields taken together uniquely identify the file within the system", so comparing both fields should be a safe way to establish the identity of the file.
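The identity comparison described above could look roughly like the following. This is a sketch under assumptions, not the code in this merge request: both function names are hypothetical, and "both fields are 0" is taken as the signal that the platform does not report usable values.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: stat info is only usable for identity checks if
 * the platform reported something other than st_dev == 0 and st_ino == 0.
 */
int stat_identity_usable(const struct stat *st)
{
	assert(st);
	return st->st_dev != 0 || st->st_ino != 0;
}

/*
 * Hypothetical helper: returns 1 only if both results are usable and
 * refer to the same (st_dev, st_ino) pair. Returning 0 means "cannot
 * prove it is the same file", so the caller should re-read the list.
 */
int same_file_identity(const struct stat *cached, const struct stat *now)
{
	if (!stat_identity_usable(cached) || !stat_identity_usable(now))
		return 0;
	return cached->st_dev == now->st_dev &&
	       cached->st_ino == now->st_ino;
}
```

Treating the all-zero case as "unknown" rather than "equal" errs on the side of re-reading the list, which costs time but never serves stale data.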

The third issue of inode recycling can be addressed by keeping the file descriptor of "tables.list" open during the lifetime of the reftable stack. As the file will still exist on disk even though it has been unlinked, it is impossible for its inode to be recycled as long as the file descriptor is still open.
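The effect of holding the descriptor open can be sketched with plain POSIX calls. The helpers below are hypothetical and only illustrate the idea: replace_file() mimics an atomic update through a temporary file and rename(2), and same_file_as_fd() compares the held descriptor against whatever currently lives at the path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: atomically replace `path` with `content`. */
int replace_file(const char *path, const char *content)
{
	char tmp[4096];
	size_t len;
	int fd, n;

	assert(path && content);
	len = strlen(content);
	n = snprintf(tmp, sizeof(tmp), "%s.lock", path);
	if (n < 0 || (size_t)n >= sizeof(tmp))
		return -1;
	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (write(fd, content, len) != (ssize_t)len) {
		close(fd);
		unlink(tmp);
		return -1;
	}
	if (close(fd) < 0 || rename(tmp, path) < 0) {
		unlink(tmp);
		return -1;
	}
	return 0;
}

/*
 * Hypothetical helper: 1 if `path` still names the file that `held_fd`
 * refers to, 0 if it has been replaced, -1 on error. While `held_fd`
 * stays open, the old inode remains allocated, so a replacement file on
 * the same filesystem cannot be handed the same inode number.
 */
int same_file_as_fd(int held_fd, const char *path)
{
	struct stat held, cur;

	if (fstat(held_fd, &held) < 0 || stat(path, &cur) < 0)
		return -1;
	return held.st_dev == cur.st_dev && held.st_ino == cur.st_ino;
}
```

A caller would open the list once, keep the descriptor for the lifetime of the stack, and call same_file_as_fd() before trusting its cache. If it returns 0, the caller re-reads the list and swaps in a descriptor for the new file.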

This should address the race in a POSIX-compliant way. The only real downside is that this mechanism cannot be used on non-POSIX-compliant systems like Windows. But we at least have the second-level caching mechanism in place that compares the contents of "tables.list" with the currently loaded list of tables.

This new mechanism performs roughly the same as the previous one that relied on stat_validity_check():

  Benchmark 1: update-ref: create many refs (HEAD~)
    Time (mean ± σ):      4.754 s ±  0.026 s    [User: 2.204 s, System: 2.549 s]
    Range (min … max):    4.694 s …  4.802 s    20 runs

  Benchmark 2: update-ref: create many refs (HEAD)
    Time (mean ± σ):      4.721 s ±  0.020 s    [User: 2.194 s, System: 2.527 s]
    Range (min … max):    4.691 s …  4.753 s    20 runs

  Summary
    update-ref: create many refs (HEAD~) ran
      1.01 ± 0.01 times faster than update-ref: create many refs (HEAD)

Signed-off-by: Patrick Steinhardt <ps@pks.im>

Fixes #241 (closed).

Edited by Patrick Steinhardt
