2025-04-19 : Gitaly goserver SLI violation impacting cny stage apdex

Gitaly goserver SLI violation impacting cny stage apdex (Severity 2)

Problem: Intermittent high memory pressure and high CPU usage on the Gitaly canary node (gitaly-cny-01) result in increased errors for both web and GitLab in the cny stage, degrading the Gitaly goserver SLI and apdex.

Impact: The increase in errors is negatively affecting the performance and reliability of the repositories hosted on gitaly-cny-01, leading to 500 errors through the UI and errors during git operations.

Causes: Git pack-objects processes with large memory footprints linger for extended periods, leading to concurrency limits being reached and gRPC calls being queued and eventually timing out.

Response strategy: Until a permanent fix was in place, a mitigation script killed long-lived, memory-intensive git pack-objects processes to keep memory pressure low. An option to disable backpressure in the pack-objects cache was deployed to production, and we then disabled backpressure, but this brought no improvement. Further investigation showed that min_occurrences should be set to 0. That change significantly reduced memory pressure, as intended. The temporary mitigation is no longer necessary and has been disabled.
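For illustration only, the selection logic of a mitigation like the one described above might be sketched as follows. This is not the script used during the incident: the `ps` column layout, the thresholds, and the names `Proc`, `parse_ps`, `select_victims`, and `mitigate` are all assumptions made for this sketch, which targets Linux with procps.

```python
import os
import signal
import subprocess
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    age_s: int    # seconds elapsed since the process started
    rss_kib: int  # resident set size in KiB
    args: str     # full command line


def parse_ps(output: str) -> list[Proc]:
    """Parse the output of `ps -eo pid=,etimes=,rss=,args=` (assumed layout)."""
    procs = []
    for line in output.splitlines():
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        pid, etimes, rss, args = parts
        procs.append(Proc(int(pid), int(etimes), int(rss), args))
    return procs


def select_victims(procs, min_age_s, min_rss_kib):
    """Return git pack-objects processes that are both old and large."""
    victims = []
    for p in procs:
        tokens = p.args.split()
        is_git = os.path.basename(tokens[0]) == "git"
        if not (is_git and "pack-objects" in tokens):
            continue
        if p.age_s >= min_age_s and p.rss_kib >= min_rss_kib:
            victims.append(p)
    return victims


def mitigate(min_age_s=3600, min_rss_kib=4 * 1024 * 1024, dry_run=True):
    """List matching processes; send SIGTERM only when dry_run is False.

    The default thresholds (1 hour, 4 GiB) are hypothetical placeholders.
    """
    out = subprocess.run(
        ["ps", "-eo", "pid=,etimes=,rss=,args="],
        check=True, capture_output=True, text=True,
    ).stdout
    victims = select_victims(parse_ps(out), min_age_s, min_rss_kib)
    for p in victims:
        print(f"pid={p.pid} age={p.age_s}s rss={p.rss_kib}KiB")
        if not dry_run:
            os.kill(p.pid, signal.SIGTERM)
    return victims
```

Calling `mitigate()` with the default `dry_run=True` only reports candidates, which makes it possible to review the thresholds before anything is terminated. Requiring both an age and a memory threshold is a deliberate choice in this sketch, so that short-lived or small pack-objects processes serving normal clones and fetches are left alone.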


This ticket was created to track INC-518, by incident.io 🔥