Transient timeouts in Geo jobs
The Geo job sometimes times out, and it's not clear why. The logs aren't much help: when this kind of timeout occurs, the log shows neither an error nor the action that happened just before the end (the last debug output shown probably wasn't the last action). For example:
- https://gitlab.com/gitlab-org/quality/nightly/-/jobs/331845174
- https://gitlab.com/gitlab-org/quality/nightly/-/jobs/332611398
- https://gitlab.com/gitlab-org/quality/nightly/-/jobs/342849479
D, [2019-11-06T04:30:49.617989 #25] DEBUG -- : clicking :your_projects_link with args {}
D, [2019-11-06T04:30:49.618356 #25] DEBUG -- : finding :your_projects_link with args {:text=>nil}
D, [2019-11-06T04:30:49.654267 #25] DEBUG -- : found :your_projects_link
D, [2019-11-06T04:30:52.031459 #25] DEBUG -- : next wait uses reload: true
D, [2019-11-06T04:30:52.032405 #25] DEBUG -- : with wait: max 120.0; interval 0.1
D, [2019-11-06T04:30:52.033161 #25] DEBUG -- : within element :project_filter_form
Pulling docker image gitlab/gitlab-runner-helper:x86_64-fa86510e ...
ERROR: Job failed: execution took longer than 1h0m0s seconds
Note that "Pulling docker image gitlab/gitlab-runner-helper:x86_64-fa86510e ..." is just the message the runner shows to indicate that it's cleaning up the failed job. It's not the cause of the error.
This sort of failure has happened before, and it turned out to be caused by a bug in the test code related to git auth: #25818 (closed). That may or may not be the case now; it's possible there's some other transient git-related bug in the test. Note that in some of those earlier cases, the logs didn't show the git commands even though they were run.
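
If a hung git command is the cause again, one way to make it visible would be to log each command before it starts and to give it its own deadline, so that a hang raises an error in the test instead of silently running into the job timeout. The following is only a rough sketch under that assumption: `run_logged` is a hypothetical helper, not an existing method in the test framework, and the 60-second default is an arbitrary placeholder.

```ruby
require 'logger'
require 'open3'
require 'timeout'

# Flush stdout on every write so the last log line is not lost in a buffer
# if the job is killed from outside.
$stdout.sync = true

# Hypothetical helper (not part of the existing test code): logs a command
# BEFORE running it and raises if it does not finish within `timeout` seconds.
# `command` is an array such as ['git', 'fetch', 'origin'].
def run_logged(command, logger:, timeout: 60)
  description = command.join(' ')
  logger.debug("running: #{description}")

  Open3.popen2e(*command) do |stdin, out, wait_thr|
    stdin.close # the command gets no input, so it cannot block on a prompt
    begin
      # Read combined stdout/stderr and wait for exit, all under one deadline.
      output, status = Timeout.timeout(timeout) { [out.read, wait_thr.value] }
      logger.debug("finished (exit #{status.exitstatus}): #{description}")
      [output, status]
    rescue Timeout::Error
      # Kill the stuck child so the surrounding block can clean up and return.
      Process.kill('KILL', wait_thr.pid)
      logger.error("timed out after #{timeout}s: #{description}")
      raise
    end
  end
end

# Example call (assumes a git repository in the current directory):
#   logger = Logger.new($stdout)
#   output, status = run_logged(['git', 'status'], logger: logger, timeout: 30)
```

With something like this in place, the last "running: ..." line in the job log would name the command that was in flight, and a hang would surface as a `Timeout::Error` with its own log entry.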