Git fetch by runner helper occasionally hangs on shared runners, possibly related to Gitaly connection reset

Summary

A customer has reported (via ZD internal link) that jobs occasionally hang at the "Getting source from Git repository" stage and keep running until the 3-hour job timeout is reached or the job is manually cancelled.

This consumes CI compute minutes unnecessarily and delays pipeline completion.

Investigating one example job revealed two possibly related Gitaly errors in Kibana:

(screenshot: gitaly-error)

If these errors relate to the git fetch issued by this job, they occurred approximately 15 minutes after the job ran the git fetch command. Yet the job was still running 40 minutes later, when it was manually cancelled.

The job ran on the following shared runner:

Running with gitlab-runner 16.1.0~beta.59.g83c66823 (83c66823)
  on blue-2.saas-linux-small-amd64.runners-manager.gitlab.com/default XxUrkriX, system ID: s_f46a988edce4
  feature flags: FF_USE_IMPROVED_URL_MASKING:true

A rerun of the same job completed in just over 2 minutes, including the repo fetch.

The pipeline is notable in that it consists of 173 jobs, many of which run concurrently, but there is no indication that rate limiting is a factor.

This raises the following questions (assuming the Gitaly errors are related to the job in question):

  • Why does the git fetch not complete in the usual time of around 4-7 seconds?
  • Why does Gitaly report a connection reset after 15 minutes?
  • Why does the runner helper not detect the failure after 15 minutes, and instead keep running?
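
To make the third question concrete, here is a minimal Python sketch of the kind of hard deadline one might expect around a fetch command. It is not the runner helper's actual implementation. The function name `run_with_deadline` and any deadline value passed to it are hypothetical and chosen only for illustration.

```python
import subprocess
import sys


def run_with_deadline(cmd, deadline_seconds):
    """Run cmd and return (finished, returncode).

    If the command has not exited within deadline_seconds, it is killed
    and (False, None) is returned instead of waiting indefinitely.
    """
    try:
        # subprocess.run kills the child and waits for it before
        # re-raising TimeoutExpired, so no zombie process is left behind.
        completed = subprocess.run(cmd, timeout=deadline_seconds)
    except subprocess.TimeoutExpired:
        # Only the direct child is killed here; any helper processes it
        # spawned may need separate handling.
        print(f"command exceeded {deadline_seconds}s deadline: {cmd}",
              file=sys.stderr)
        return False, None
    return True, completed.returncode
```

Under these assumptions, a call such as `run_with_deadline(["git", "fetch", "origin"], 600)` would give up after 10 minutes rather than run until the job timeout. A fixed deadline is the simplest possible check and would not distinguish a slow fetch from a stalled connection.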

Steps to reproduce

Unfortunately this issue is not reproducible on demand.

What is the current bug behavior?

git fetch runs until the job timeout is reached or the job is cancelled.

What is the expected correct behavior?

git fetch should complete in the "expected" amount of time (around 4-7 seconds), every time.

Environment details

The issue occurs on GitLab.com using saas-linux-small-amd64 shared runners.

Edited by Justin Farmiloe