
Fix flaky test TestStorageCleanup_AcquireNextStorage

Sami Hiltunen requested to merge smh-flaky-storage-cleanup into master

The StorageCleanup functionality uses leases in the database to ensure only one worker works on a given storage at a time. The tests assert that certain records are received and that the leases prevent certain other records from being picked up. The leases are set to expire after one second in the test, and a background goroutine updates the lease record every 900ms. If the test executes slowly enough, there's a possibility of a race where the worker doesn't refresh the lease before the test picks up the next record, so the lease looks expired even though the worker is still holding it.

This issue affects multiple tests, although not all of them would necessarily flake because of it. Only the last test actually tests lease updating. This commit addresses the issue by raising the lease expiration to 24h so it won't expire during the test.

This does signal a real issue though: a lease can be held by multiple workers at the same time. The code is only used to log repositories which exist on the storages but are not tracked by Praefect. If the race were hit, we'd possibly get duplicate log records, so the issue itself is likely not worth fixing.

Closes #5150 (closed)

Edited by Sami Hiltunen
