On-Call Handover 2024-05-22 23:00 UTC
Brought to you by woodhouse
- EOC egress: @msmiley @astarovoytov
- EOC ingress: @pguinoiseau @astarovoytov
- IM egress: @theoretick
- IM ingress: @plu8
- CMOC egress: @kchappell abuerer supportops
- CMOC ingress: @kchappell abuerer clawrence supportops
Previous on-call issue: #4971 (closed) - On-Call Handover 2024-05-22 15:00 UTC
📖 Summary:
What (if any) time-critical work is being handed over?
Nothing urgent to hand over.
What contextual info may be useful for the next few on-call shifts?
- production#17947 (closed) - The CA cert rotation for the production regional GKE cluster is going well. This corresponds to yesterday's equivalent work in the staging environment.
- production#18048 (closed) - A brief but large spike in sidekiq job rate caused sidekiq's pgbouncer db connection pool to saturate for about 12 minutes. This delayed the start of other jobs but self-resolved quickly.
- production#17991 (closed) - A Game Day exercise for a zonal outage in the staging environment concluded today. Cleanup should be complete now.
- production#18047 (closed) - Internal-impact only. A remarkably slow test was causing timeouts on merge requests for the gitlab/gitlab-org project. The relevant test has been temporarily quarantined, unblocking others' work.
🔴 Ongoing alerts/incidents:
GitLab
- production#18032 (closed) - severity4 2024-05-18: ExtPvsServiceRunwayIngressErrorSLOViolation
- production#17980 (closed) - severity3 2024-05-08: KubeServiceClusterScaleupsErrorSLOViolation
✅ Resolved alerts/incidents:
GitLab
- production#18048 (closed) - severity3 2024-05-22: pgbouncer CPU saturation for sidekiq pools
🔵 Mitigated incidents:
- production#18048 (closed) - severity3 2024-05-22: pgbouncer CPU saturation for sidekiq pools
🔓 Change issues:
In Progress
- production#17947 (closed) - 2024-05-22: [CR] [gprd] Rotate Certificate Authority for GKE/Kubernetes Cluster
Edited by Matt Smiley