Fix instability of CI tests
This mostly affects:
- `make tests-refresh` and `make tests-refresh-resilience`
For now, we're setting these CI jobs to be manual (in TBD), since they pass reliably locally.
Possible causes/solutions:
- In some tests we run `rugged initialize --local`. We suspect that it takes some time for the repo metadata and keys to sync to the workers (volume mounts, NFS, etc.). We might want to consider running without the `--local` option.
- Alternatively, we might want to update the `initialize` command to have it wait for NFS to sync, or something similar.
- Waiting for a fixed amount of time does not seem to be reliable. Maybe we should add a step definition that would look like: `And I wait for "STRING" in "FILE"` (with a timeout, etc.)
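
The polling logic behind a step like `And I wait for "STRING" in "FILE"` could be sketched roughly as follows. This is only an illustration in Python, not tied to whichever test framework the step would actually live in. The function name `wait_for_string_in_file` and the `timeout` and `interval` parameters are hypothetical, and the default values are arbitrary.

```python
import time
from pathlib import Path


def wait_for_string_in_file(path, needle, timeout=30.0, interval=0.5):
    """Poll `path` until it contains `needle`.

    Returns True as soon as the string is found, or False once `timeout`
    seconds have elapsed. Name and defaults are hypothetical.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if needle in Path(path).read_text(errors="replace"):
                return True
        except OSError:
            # The file may not exist yet, or may not be visible yet on a
            # shared mount; treat that as "not found so far" and retry.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A step definition would then fail the scenario when this returns `False`. Compared to a fixed sleep, polling returns as soon as the expected content shows up and only costs the full timeout in the failure case.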
Edited by Christopher Gervais