Weekly Reliability (SRE) Team Newsletter – On-call Period: 2021-05-04 - 2021-05-11
This issue has been moved and its description cleared to avoid polluting this tracker's search results. See the moved issue link for the original newsletter.
- ops-gitlab-net added Reliability-Team-Newsletter label
Highlights from EMEA
CI Apdex stabilizing
This is attributed to recent collaborations and changes:
- PVS rollout + related issue.
- CI Runners team increasing the concurrency across shared-managers https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/13277
- Creating partial indexes for pending/running builds: gitlab-org/gitlab!60942 (merged)
- I sense I might have forgotten other major actions we took here.
2 GCP maintenances = 2 severity1's
- The first affected patroni-04 on the 6th: production#4500 (closed)
- The second affected redis-cache-02 on the 7th: production#4517 (closed)
PG12 upgrade
API for canary is now on K8S
- Change issue: production#4406 (closed)
Americas Recap
PVS going live was not a silver bullet. We spent a lot of time finding, blocking, and documenting miner patterns. By the weekend, though, the number of miners was noticeably lower. So far, after the PG12 upgrade and PVS, the runners and patroni have been pretty quiet.
Chef CI
Last week, our cookbook-publisher Ruby script stopped working due to some gem versioning changes. Ahmad created a fix, and things are working as expected now.
Edited by Cameron McFarland
- ops-gitlab-net changed the description
Thanks @cindy, @cmcfarland, and @cmiskell for the detailed additions related to CI Runners in https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/13311
Added a suggestion to comment in #f_pipeline_validation_service and to link to the repo in the SIRT runbook: https://gitlab.com/gitlab-com/gl-security/runbooks/-/merge_requests/342
Edited by Dave Smith
- Craig Miskell closed
- 🤖 GitLab Bot 🤖 added workflow-infraDone label
moved to reliability-reports#95 (closed)
- John Jarvis changed the description