
2025-10-07: Gitaly goserver apdex SLO violation for GRPC requests

Gitaly goserver apdex SLO violation for GRPC requests (Severity 4 (Low))

Problem: The gitaly service in the cny stage was overloaded by slow or expensive requests, causing its apdex score for GRPC requests to fall below the required SLO.

Impact: GRPC requests to the gitaly service experienced degraded performance, resulting in the apdex score dropping below the SLO threshold. Both Gitaly and web services had elevated error rates during the incident.
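For readers unfamiliar with apdex, the following is a minimal sketch of the classic Apdex score. It assumes the standard definition: a request is satisfied at or below a latency threshold T, tolerating up to 4T, and frustrated beyond that. The threshold and SLO target values are hypothetical and are not the values configured for Gitaly's goserver SLO. The sketch shows how a burst of slow requests pulls the score below a target.

```python
# Minimal sketch of the classic Apdex score and an SLO check.
# The threshold and target used below are hypothetical illustration values,
# not Gitaly's actual SLO configuration.

def apdex(latencies_s: list[float], t_s: float) -> float:
    """Satisfied: latency <= T; tolerating: T < latency <= 4T; frustrated: > 4T."""
    if not latencies_s:
        raise ValueError("apdex is undefined with no samples")
    satisfied = sum(1 for t in latencies_s if t <= t_s)
    tolerating = sum(1 for t in latencies_s if t_s < t <= 4 * t_s)
    # Tolerating requests count as half a satisfied request.
    return (satisfied + tolerating / 2) / len(latencies_s)


def violates_slo(latencies_s: list[float], t_s: float, target: float) -> bool:
    """True when the apdex score falls below the SLO target."""
    return apdex(latencies_s, t_s) < target


if __name__ == "__main__":
    HYPOTHETICAL_T_S = 0.5       # hypothetical satisfied threshold, in seconds
    HYPOTHETICAL_TARGET = 0.95   # hypothetical SLO target
    normal = [0.1] * 98 + [1.0, 3.0]       # mostly fast requests
    overloaded = [0.1] * 80 + [5.0] * 20   # a burst of slow, expensive requests
    for name, sample in (("normal", normal), ("overloaded", overloaded)):
        score = apdex(sample, HYPOTHETICAL_T_S)
        print(name, score, violates_slo(sample, HYPOTHETICAL_T_S, HYPOTHETICAL_TARGET))
```

Frustrated requests contribute nothing to the numerator. As a result, even a modest share of very slow calls lowers the score quickly, which matches the degraded-performance pattern described above.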

Causes: A significant spike in in-flight Git commands and resource usage overloaded the gitaly node. The '/gitaly.CommitService/ListCommitsByOid' endpoint and the 'PackObjectsHookWithSidechannel' process saw large increases in activity and resource contention, including elevated CPU usage and minor page faults. As a result, concurrency limits were reached and the rate of successful responses dropped.
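The effect of reaching a concurrency limit can be illustrated with a simplified, single-threaded model of a limiter that runs a fixed number of requests, queues a bounded number more, and rejects the rest. This is a hypothetical sketch: the class and parameter names are assumptions for illustration, and it is not Gitaly's actual limiter or its configuration.

```python
# Simplified, single-threaded model of a concurrency limiter with a bounded queue.
# Class and parameter names are hypothetical; this is not Gitaly's implementation.

class ConcurrencyLimiter:
    def __init__(self, max_in_flight: int, max_queued: int) -> None:
        self.max_in_flight = max_in_flight  # requests allowed to run at once
        self.max_queued = max_queued        # requests allowed to wait for a slot
        self.in_flight = 0
        self.queued = 0

    def try_acquire(self) -> str:
        """Admit a new request: run it, queue it, or reject it."""
        if self.in_flight < self.max_in_flight:
            self.in_flight += 1
            return "running"
        if self.queued < self.max_queued:
            self.queued += 1
            return "queued"
        return "rejected"  # the caller would receive an error instead of a response

    def release(self) -> None:
        """A running request finished; hand its slot to a waiting request if any."""
        if self.in_flight == 0:
            raise RuntimeError("release called with no request in flight")
        if self.queued > 0:
            self.queued -= 1  # a waiting request takes over the freed slot
        else:
            self.in_flight -= 1


if __name__ == "__main__":
    limiter = ConcurrencyLimiter(max_in_flight=2, max_queued=1)
    # Four simultaneous expensive requests: two run, one waits, one is rejected.
    print([limiter.try_acquire() for _ in range(4)])
```

In this model, long-running calls hold their slots, so new arrivals first fill the queue and are then rejected outright. That is one plausible way a spike in expensive requests turns into fewer successful responses.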

Response strategy: The incident self-resolved after approximately 15 minutes. The feature flag mirroring_lfs_optimization was enabled a few minutes after the incident started; it was turned off as a precaution, although it is not believed to be related. Both the Gitaly and web service alerts have since resolved.


This ticket was created by incident.io to track INC-4537. 🔥