Random out of memory and timeout errors running integration tests in GitLab CI
We have now seen three intermittent failures when running the integration tests in GitLab CI:
- https://gitlab.com/gitlab-org/container-registry/-/jobs/380400514: The first one was reported by Hayley in !19 (comment 261317437). In this case the pipeline failed without any error or log output from the application (and the tests that did run finished within the timeout window), so we were not sure what the problem was. A retry fixed it.
- https://gitlab.com/gitlab-org/container-registry/-/jobs/386131207: This one was reported by Stan in !18 (comment 264037494). In this case we do have a complete log, and we can see that this was an out of memory (OOM) error while executing the DriverSuite.TestConcurrentStreamReads test. Again, a retry was enough to fix it.
- https://gitlab.com/gitlab-org/container-registry/-/jobs/388298946: In this case the 20-minute timeout was exhausted while executing the DriverSuite.TestWriteReadLargeStreams test.
After the second error, I think we can now relate this to the increase in pre-allocated random bytes for storage driver tests (!19 (merged)). It looks like some executions exceed the memory limits of the CI runners.
We need to investigate this further. We definitely need the increase in pre-allocated random bytes from !19 (merged), as we currently have some 1 GB file benchmarks.
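As a possible starting point for the investigation, here is a rough sketch of generating test data on demand instead of holding it all in a pre-allocated buffer. The helper names randomStream and countBytes are hypothetical and not part of the container-registry test suite, and it is an assumption that the affected tests could consume an io.Reader rather than a byte slice.

```go
package main

import (
	"fmt"
	"io"
	"math/rand"
)

// randomStream (hypothetical helper) returns a reader that yields exactly
// size pseudo-random bytes. The bytes are generated as they are read, so
// the full payload is never held in memory at once.
func randomStream(seed, size int64) io.Reader {
	// *rand.Rand implements io.Reader; io.LimitReader caps the stream at size bytes.
	return io.LimitReader(rand.New(rand.NewSource(seed)), size)
}

// countBytes (hypothetical helper) drains r and reports how many bytes it
// produced. io.Copy moves the data through a small internal buffer.
func countBytes(r io.Reader) (int64, error) {
	return io.Copy(io.Discard, r)
}

func main() {
	const size = 64 << 20 // 64 MiB, chosen only for illustration

	n, err := countBytes(randomStream(1, size))
	if err != nil {
		panic(err)
	}
	fmt.Printf("streamed %d bytes\n", n)
}
```

With a fixed seed the stream is reproducible, so a test could regenerate the same data for a read-back comparison instead of keeping a copy around. Whether this fits the benchmarks that need the 1 GB files is something we would still have to check.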