Frequent failures with "Timed out waiting for buildbox-casd to become ready"

Summary

I've been running bst master a lot recently and frequently hit a timeout while waiting for buildbox-casd to become ready. I'd say this happens at least half the time I invoke bst. It mainly happens when I invoke bst from within podman; I can't remember whether it was as bad when I was running BuildStream on my host system.

Steps to reproduce

I get this behaviour pretty consistently when running bst foo from inside a buildstream/buildstream:nightly docker image, using podman. Once buildbox-casd has started properly once, the rate of failures goes down. I also tend to mount a cache into the container, in case that affects the behaviour.
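To put a number on the failure rate rather than relying on my impression, something like the following sketch could be used. It reruns a command and counts how many runs print the timeout message quoted in the logs in this report. The helper name `count_timeouts` and its arguments are hypothetical and not part of BuildStream.

```python
import subprocess

# Message printed when the failure occurs (copied from the logs in this report).
TIMEOUT_MESSAGE = "Timed out waiting for buildbox-casd to become ready"


def count_timeouts(command, runs):
    """Run `command` `runs` times and count the runs that print the timeout message.

    `command` is an argument list for subprocess. This is a hypothetical
    helper for measuring the failure rate, not a BuildStream API.
    """
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # the message may appear on either stream
            universal_newlines=True,   # text mode; also works on Python 3.7
        )
        if TIMEOUT_MESSAGE in result.stdout:
            failures += 1
    return failures
```

For example, `count_timeouts(["bst", "show", "element.bst"], 20)` would run the command 20 times inside the container, where `element.bst` is a placeholder for whichever element reproduces the problem.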

What is the current bug behavior?

Here's the most common example:

[--:--:--][        ][    main:core activity                 ] START   Push                                                                                                                                                                    
[--:--:--][        ][    main:core activity                 ] START   Loading elements                                                                                                                                                        
[00:00:14][        ][    main:core activity                 ] FAILURE Loading elements                                                                                                                                                        
                                                                                                                       
[00:00:15][        ][    main:core activity                 ] FAILURE Push                                                                                                                                                                    
                                                                                                                                                                                                                                              
Timed out waiting for buildbox-casd to become ready                                                                                                                                                                                           
[--:--:--][        ][    main:core activity                 ] START   Terminating buildbox-casd                                                                                                                                               
[00:00:00][        ][    main:core activity                 ] SUCCESS Terminating buildbox-casd   

However, sometimes a stack trace is also included:

[--:--:--][        ][    main:core activity                 ] START   Push                                                                                                                                                                    
[--:--:--][        ][    main:core activity                 ] START   Loading elements                                                                                                                                                        
Process Process-1:                                                                                                                                                                                                                            
Traceback (most recent call last):                                                                                                                                                                                                            
  File "/usr/lib64/python3.7/multiprocessing/process.py", line 297, in _bootstrap                                                                                                                                                             
    self.run()                                                                                                                                                                                                                                
  File "/usr/lib64/python3.7/multiprocessing/process.py", line 99, in run                                                                                                                                                                     
    self._target(*self._args, **self._kwargs)                                                                                                                                                                                                 
  File "/usr/local/lib64/python3.7/site-packages/buildstream/_cas/cascache.py", line 1018, in _subprocess_run                                                                                                                                 
    local_cas = self._connection.get_local_cas()                                                                                                                                                                                              
  File "/usr/local/lib64/python3.7/site-packages/buildstream/_cas/casdprocessmanager.py", line 213, in get_local_cas                                                                                                                          
    self._establish_connection()                                                                                                                                                                                                              
  File "/usr/local/lib64/python3.7/site-packages/buildstream/_cas/casdprocessmanager.py", line 190, in _establish_connection                                                                                                                  
    raise CASCacheError("Timed out waiting for buildbox-casd to become ready")                                                                                                                                                                
buildstream._exceptions.CASCacheError: Timed out waiting for buildbox-casd to become ready                                                                                                                                                    
[00:00:14][        ][    main:core activity                 ] FAILURE Loading elements                                                                                                                                                        
[00:00:15][        ][    main:core activity                 ] FAILURE Push                                                                                                                                                                    
                                                                                                                                                                                                                                              
Timed out waiting for buildbox-casd to become ready                                                                                                                                                                                           
[--:--:--][        ][    main:core activity                 ] START   Terminating buildbox-casd                                                                                                                                               
[00:00:00][        ][    main:core activity                 ] SUCCESS Terminating buildbox-casd           

What is the expected correct behavior?

BuildStream starts properly, without timing out while waiting for buildbox-casd.

Possible fixes

One workaround could be a longer timeout, but that seems to be hiding the problem rather than removing it.
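Before changing any timeout, it may help to know how long buildbox-casd actually takes to become ready when it does start. The sketch below is a generic poll-until-deadline loop that reports the time waited. It is not BuildStream's actual code, and `wait_until_ready` and the `is_ready` callable are hypothetical names.

```python
import time


def wait_until_ready(is_ready, timeout, interval=0.01):
    """Poll `is_ready()` until it returns True and return the seconds waited.

    Raises TimeoutError if `timeout` seconds pass first. Generic sketch,
    not taken from BuildStream.
    """
    start = time.monotonic()
    while True:
        if is_ready():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            raise TimeoutError("not ready after {:.1f}s".format(timeout))
        time.sleep(interval)
```

Logging the returned duration over several runs would show whether startup is merely slow, in which case a longer timeout would help, or whether it never completes, in which case the timeout is not the real problem.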

Other relevant information

Running buildstream/buildstream:nightly-master-98530279 in podman 1.6.2 on Arch Linux.

