Significant increase in Gitaly single node incidents
We've had a huge increase in the number of single node Gitaly incidents in April.
If you look at the past six months' worth of Apdex, you can see a significant change in the third or fourth week of March.
We also see an increase in CPU usage and `schedstat_waiting`.
Incidents by category in May
To look at April, please check Reliability::Practices April 2023 service avail... (reliability-reports#166 - closed)
- cgroups:
- pack-object/git spawn contention:
Action Items
- @qmnguyen0711: Fix cache key for pack-objects 👉 gitlab-org/gitaly#5087 (closed)
  - 2023-05-04: Blocked by security release: gitlab-org/gitlab!119574 (closed)
- Investigate Apdex calculation 👉 scalability#2319 (closed)
- @steveazz: Rate limits:
  - ListCommitsByOid 👉 https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/23570
  - CommitDiff 👉 https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/23571
  - projects/explore 👉 https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/merge_requests/5693/diffs
  - Pack Objects limiting 👉 gitlab-org/gitaly#4413 (closed)
- @steveazz: cgroups
  - Discuss using a single parent cgroup 👉 https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/23532#note_1377531701
Edited by Steve Xuereb