bzr plugin doesn't parallelize well

Summary

The bzr plugin performs some supposedly "atomic" operations, including making a backup of the mirror directory, which are not actually atomic. This causes errors when moving directories, and builds fail when concurrent track/fetch operations run on the bzr source cache directory.

Steps to reproduce

Run the test_track_recurse[bzr] test repeatedly until it fails.

What is the current bug behavior?

The test sometimes fails, locally and in CI.

What is the expected correct behavior?

The test never fails.

Relevant logs and/or screenshots

Here are some example failed CI jobs

A sample of the culprit failure looks like this:

[00:00:00] FAILURE track-test-target-bzr.bst: bzr source at track-test-target-bzr.bst [line 3 column 2]: Failed to move srcdir '/builds/BuildStream/buildstream/.tox/py36/tmp/test_track_recurse_bzr_0/cache/sources/bzr/tmpzurr_ye6' to mirror dir '/builds/BuildStream/buildstream/.tox/py36/tmp/test_track_recurse_bzr_0/cache/sources/bzr/file____builds_BuildStream_buildstream__tox_py36_tmp_test_track_recurse_bzr_0_repo'

Possible fixes

Fix _atomic_replace_mirrordir in plugins/sources/bzr.py so that it swaps the mirror directory atomically. This requires first understanding why there is a backup directory.

One tip: use utils.move_atomic() and handle DirectoryExistsError to determine whether the directory already exists. The initial check for whether the directory exists races with the nested rename. Instead, the code currently in the else block that follows should run in an except clause, covering the case where the directory already existed.
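The idea can be sketched roughly as follows. This is not the plugin's code: move_atomic and DirectoryExistsError below are local stand-ins written for illustration (they are not BuildStream's utils implementations), install_mirror is a hypothetical helper name, and the sketch assumes POSIX rename semantics, where renaming a directory onto an existing non-empty directory fails.

```python
import errno
import os
import shutil


class DirectoryExistsError(Exception):
    """Local stand-in: the destination directory already exists."""


def move_atomic(source, destination):
    # Local stand-in for an atomic move: a single rename(2) call,
    # with no separate "does it exist?" check beforehand.
    try:
        os.rename(source, destination)
    except OSError as e:
        if e.errno in (errno.EEXIST, errno.ENOTEMPTY):
            raise DirectoryExistsError(destination) from e
        raise


def install_mirror(srcdir, mirrordir):
    # Hypothetical helper: try the move first and react to the failure,
    # rather than checking for the directory and then renaming.
    try:
        move_atomic(srcdir, mirrordir)
        return True
    except DirectoryExistsError:
        # Another process won the race; its mirror is kept and our
        # temporary directory is discarded.
        shutil.rmtree(srcdir, ignore_errors=True)
        return False
```

With this shape there is no window between an existence check and the rename, so two concurrent track/fetch operations cannot both believe they are the first to create the mirror. Whether the loser should discard its copy or update the existing mirror is a design decision this sketch does not settle.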

Edited by Tristan Van Berkom