Frequent failures with "Timed out waiting for buildbox-casd to become ready"
Summary
I've been running bst master a lot recently, and I frequently hit a timeout while waiting for buildbox-casd to become ready. I'd estimate this happens at least half of the times I invoke bst. It mainly happens when I invoke bst from within podman; I can't remember whether it was as bad when I was running BuildStream on my host system.
Steps to reproduce
I get this behaviour pretty consistently when running bst foo from inside a buildstream/buildstream:nightly docker image, using podman. Once buildbox-casd has started properly once, the rate of failures goes down. I also tend to mount a cache into the container, in case that affects the behaviour.
What is the current bug behavior?
Here's the most common example:
[--:--:--][ ][ main:core activity ] START Push
[--:--:--][ ][ main:core activity ] START Loading elements
[00:00:14][ ][ main:core activity ] FAILURE Loading elements
[00:00:15][ ][ main:core activity ] FAILURE Push
Timed out waiting for buildbox-casd to become ready
[--:--:--][ ][ main:core activity ] START Terminating buildbox-casd
[00:00:00][ ][ main:core activity ] SUCCESS Terminating buildbox-casd
However, sometimes a stack trace is also included:
[--:--:--][ ][ main:core activity ] START Push
[--:--:--][ ][ main:core activity ] START Loading elements
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib64/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib64/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib64/python3.7/site-packages/buildstream/_cas/cascache.py", line 1018, in _subprocess_run
    local_cas = self._connection.get_local_cas()
  File "/usr/local/lib64/python3.7/site-packages/buildstream/_cas/casdprocessmanager.py", line 213, in get_local_cas
    self._establish_connection()
  File "/usr/local/lib64/python3.7/site-packages/buildstream/_cas/casdprocessmanager.py", line 190, in _establish_connection
    raise CASCacheError("Timed out waiting for buildbox-casd to become ready")
buildstream._exceptions.CASCacheError: Timed out waiting for buildbox-casd to become ready
[00:00:14][ ][ main:core activity ] FAILURE Loading elements
[00:00:15][ ][ main:core activity ] FAILURE Push
Timed out waiting for buildbox-casd to become ready
[--:--:--][ ][ main:core activity ] START Terminating buildbox-casd
[00:00:00][ ][ main:core activity ] SUCCESS Terminating buildbox-casd
What is the expected correct behavior?
BuildStream starts properly, without timing out while waiting for buildbox-casd.
Possible fixes
One workaround could be a longer timeout, but that seems to hide the problem rather than remove it.
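To tell "buildbox-casd is slow to start" apart from "buildbox-casd never becomes ready", it might help to record how long the wait actually takes. The following is only a generic sketch of a poll-with-deadline loop, not BuildStream's actual code: wait_until_ready and the is_ready callback are hypothetical names, and is_ready stands in for whatever readiness check is appropriate (for example, checking that a socket file exists).

```python
import time
from typing import Callable


def wait_until_ready(is_ready: Callable[[], bool],
                     timeout: float,
                     interval: float = 0.01) -> float:
    """Poll is_ready() until it returns True or `timeout` seconds elapse.

    Returns the elapsed time in seconds on success, so the caller can log
    how close a successful start came to the limit.
    Raises TimeoutError if the deadline passes first.

    Hypothetical helper for illustration; not part of BuildStream.
    """
    start = time.monotonic()  # monotonic clock: unaffected by system clock changes
    deadline = start + timeout
    while True:
        if is_ready():
            return time.monotonic() - start
        if time.monotonic() >= deadline:
            raise TimeoutError(f"not ready after {timeout:.1f}s")
        time.sleep(interval)
```

If successful starts regularly take close to the limit, a longer timeout may be a reasonable stopgap; if the check never succeeds no matter how long the wait, the cause lies elsewhere.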
Other relevant information
Running buildstream/buildstream:nightly-master-98530279 in podman 1.6.2 on Arch Linux.