Eliminate Git-Related Infrastructure Failures in CI Pipelines

Context

Part of Improve observability and reduce most frequent ... (&8 - closed). Many CI/CD jobs fail before our automation tools can detect, analyze, and categorize them. This issue focuses on analyzing and eliminating the Infrastructure failure category. Based on data from Snowflake (https://app.snowflake.com/ys68254/gitlab/w1oaUFxQaSYz), we have identified several patterns of Git-related failures that cause pipeline instability.

Business Impact

Most Impactful Error States

The resolutions are tracked in the child tasks.

| Error | Tracker | Description | Team to work with |
|---|---|---|---|
| "fatal: couldn't find remote ref" failures | #130 (closed) | Failure while cloning a repo with ref pipelineID | Verify |
| "gitaly spawn failed" errors | #131 (closed) | Unable to connect to Gitaly | Gitaly Team |
| "fatal: fetch-pack: invalid index-pack output" | #132 (closed) | Git fails to receive or process repository data | Runner, Gitaly |
| "fatal: the remote end hung up unexpectedly" | #133 (closed) | Server abruptly terminates the connection during a Git operation | |
| "GitLab is currently unable to handle this request due to load" | #134 (closed) | Too much load while cloning a repo | Scalability |
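
As a rough illustration of how the error states above could be detected in job logs, here is a minimal sketch in Python. The error signatures are copied from this issue; the category labels and the function names (`classify_job_log`, `count_failures`) are hypothetical and chosen only for this example. It does not reflect how our automation tools actually categorize failures.

```python
from collections import Counter

# Error signatures taken from the error states listed in this issue.
# The labels on the left are hypothetical names chosen for this sketch.
GIT_FAILURE_SIGNATURES = {
    "missing_remote_ref": "fatal: couldn't find remote ref",
    "gitaly_spawn_failed": "gitaly spawn failed",
    "invalid_index_pack": "fatal: fetch-pack: invalid index-pack output",
    "remote_hung_up": "fatal: the remote end hung up unexpectedly",
    "gitlab_overloaded": "GitLab is currently unable to handle this request due to load",
}


def classify_job_log(log_text):
    """Return the label of the first signature found in the log, or None."""
    for label, signature in GIT_FAILURE_SIGNATURES.items():
        if signature in log_text:
            return label
    return None


def count_failures(logs):
    """Count how many logs fall into each category; unmatched logs are 'uncategorized'."""
    return Counter(classify_job_log(log) or "uncategorized" for log in logs)
```

A call such as `count_failures(job_logs)`, where `job_logs` is a list of raw job log strings, would give a per-category tally that could be compared against the Snowflake data.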
Edited by David Dieulivol