bzr plugin doesn't parallelize well
Summary
The bzr plugin performs some "atomic" operations, including making a backup, which are not actually atomic. This causes errors when moving directories, and builds fail when concurrent track/fetch operations run on the bzr source cache directory.
Steps to reproduce
Run the test_track_recurse[bzr] test many times until it fails.
What is the current bug behavior?
The test sometimes fails, locally and in CI.
What is the expected correct behavior?
It never fails.
Relevant logs and/or screenshots
Here are some example failed CI jobs
A sample of the culprit failure looks like this:
[00:00:00] FAILURE track-test-target-bzr.bst: bzr source at track-test-target-bzr.bst [line 3 column 2]: Failed to move srcdir '/builds/BuildStream/buildstream/.tox/py36/tmp/test_track_recurse_bzr_0/cache/sources/bzr/tmpzurr_ye6' to mirror dir '/builds/BuildStream/buildstream/.tox/py36/tmp/test_track_recurse_bzr_0/cache/sources/bzr/file____builds_BuildStream_buildstream__tox_py36_tmp_test_track_recurse_bzr_0_repo'
Possible fixes
Fix _atomic_replace_mirrordir in plugins/sources/bzr.py to handle the atomic swapping atomically... this requires understanding why there is a backup directory.
One tip is that utils.move_atomic() should be employed, and DirectoryExistsError should be handled to determine whether the directory already exists. The initial check for the directory's existence is a race with the nested rename; instead, the else block which follows should become an except clause that handles the case where the directory already existed.
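
A minimal sketch of the "try the move, then handle the failure" pattern suggested above, not the actual plugin code. To stay self-contained it does not import BuildStream: the move_atomic() and DirectoryExistsError defined here are simplified stand-ins for the names mentioned in this issue, built on os.rename, and install_mirror() is a hypothetical helper. It only covers the race when the mirror directory is first created; replacing an existing mirror (the backup directory case) is not addressed.

```python
import errno
import os
import shutil
import tempfile


class DirectoryExistsError(OSError):
    """Stand-in for the exception named in this issue (assumed semantics)."""


def move_atomic(source, destination):
    """Stand-in for utils.move_atomic(): rename, or raise if destination is taken."""
    try:
        # A directory rename either fully happens or fails; it fails when the
        # destination is an existing non-empty directory.
        os.rename(source, destination)
    except OSError as e:
        if e.errno in (errno.EEXIST, errno.ENOTEMPTY):
            raise DirectoryExistsError(*e.args) from e
        raise


def install_mirror(srcdir, mirrordir):
    """Hypothetical helper: return True if this caller installed the mirror."""
    # No os.path.exists() check beforehand: such a check would race with a
    # concurrent rename. The move itself tells us whether the directory existed.
    try:
        move_atomic(srcdir, mirrordir)
        return True
    except DirectoryExistsError:
        # Another process won the race; discard our temporary copy.
        shutil.rmtree(srcdir)
        return False
```

With this shape, two concurrent track/fetch operations that both finish a fresh checkout would not fail the build: one of them installs the mirror and the other discards its temporary directory.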