Weekly Reliability (SRE) Team Newsletter – On-call Period: 2022-08-23 - 2022-08-30
Announcements
Engineering Week in Review Highlights:
Team Updates
On-Call During This Period
| Schedule | Username |
|---|---|
| SRE 8-hour Americas | Hendrik Meyer |
| SRE 8-hour Americas | Nels Nelson |
| SRE 8-hour APAC | Filipe Santos |
| SRE 8-hour EMEA | Michal Wasilewski |
| SRE 8-hour EMEA | Rehab Hassanein |
PagerDuty Incidents
See the 1-week report for acknowledged PagerDuty pages (long-term trend).
Alerts Volume
7 Day Issue Stats
- Oncall issues : 0
- Access Request : 0
- Change Issues : 11
- Incident Issues : 14
- CorrectiveAction Issues : 0
Change Issues
- 2022-08-28T20:22:44Z - Disable NUMA-hinted foreground page migrations ... (production#7660 - closed)
- 2022-08-26T14:28:22Z - Enforce Cloudflare Authenticated Origin Pulls (production#7658 - closed)
- 2022-08-25T19:42:41Z - https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7654+
- 2022-08-25T14:11:48Z - [GPRD] Enable pg_wait_sampling in Production (production#7653 - closed)
- 2022-08-25T13:38:21Z - 2022-08-25: Create new CI Runners projects in GCP (production#7652 - closed)
- 2022-08-25T09:46:49Z - Download Jemalloc reports from web & api after ... (production#7651 - closed)
- 2022-08-24T22:04:43Z - Cleanup class/script for unused still-active Pe... (production#7649 - closed)
- 2022-08-24T12:20:34Z - https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7647+
- 2022-08-23T18:07:40Z - https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7639+
- 2022-08-23T11:49:09Z - 2022-08-23: Merge gitaly chef run_list for main... (production#7637 - closed)
- 2022-08-23T10:33:31Z - DRAFT: Create 4th k8s cluster on GSTG (production#7636 - closed)
Incident Issues
| Created | Summary | Label | Service |
|---|---|---|---|
| 2022-08-29T01:44:48Z | 2022-08-29: RegistryServiceGarbagecollectorErro... (production#7661 - closed) | reliability~3760142 | ServiceContainer Registry |
| 2022-08-27T10:44:00Z | 2022-08-27: Grafana pods crashlooping (production#7659 - closed) | reliability~3760141 | ServiceGrafana |
| 2022-08-26T12:42:51Z | 2022-08-26: LoggingVisibilityDiminished (production#7657 - closed) | reliability~3760141 | ServiceLogging |
| 2022-08-26T04:44:04Z | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7656+ | reliability~3760140 | ServiceInfrastructure |
| 2022-08-25T22:34:22Z | 2022-08-25: 401 errors downloading PyPi packages (production#7655 - closed) | reliability~3760141 | |
| 2022-08-25T01:16:45Z | 2022-08-25: Prometheus WAL corruption on us-eas... (production#7650 - closed) | reliability~3760142 | ServicePrometheus |
| 2022-08-24T08:05:26Z | 2022-08-24: PostgreSQL_QueriesDominatingTotalQu... (production#7645 - closed) | reliability~3760142 | |
| 2022-08-24T07:02:44Z | 2022-08-24: fatal: [patroni-data-analytics-01-d... (production#7644 - closed) | reliability~3760141 | ServicePostgres |
| 2022-08-24T06:52:33Z | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7643+ | reliability~3760142 | ServiceGitaly |
| 2022-08-24T06:50:18Z | 2022-08-24: fatal: [patroni-data-analytics-01-d... (production#7642 - closed) | | |
| 2022-08-23T21:40:38Z | 2022-08-23: Postgres Service Down -- postgres-d... (production#7641 - closed) | reliability~3760142 | ServicePostgres |
| 2022-08-23T14:07:29Z | 2022-08-23: locked projects warning showing une... (production#7638 - closed) | reliability~3760139 | ServiceLicense |
| 2022-08-22T09:18:48Z | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7635+ | reliability~3760141 | |
| 2022-08-22T09:18:17Z | 2022-08-22: The gitalyruby SLI of the gitaly se... (production#7634 - closed) | reliability~3760141 | |
CorrectiveAction Issues
- 2022-08-24T13:50:27Z - https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/16231+
- 2022-08-24T13:45:49Z - https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/16230+
Open Issue Stats
- Oncall issues : 2
- Change issues : 29
- Incident issues : 6
- Access Request : 0
- CorrectiveAction : 88
Open Change Issues
Open Incident Issues
| Created | Summary |
|---|---|
| 2022-08-29T01:44:48Z | 2022-08-29: RegistryServiceGarbagecollectorErro... (production#7661 - closed) |
| 2022-08-25T22:34:22Z | 2022-08-25: 401 errors downloading PyPi packages (production#7655 - closed) |
| 2022-08-15T16:40:33Z | 2022-08-15: Shared macOS Runners Failing (production#7601 - closed) |
| 2022-08-09T17:31:57Z | 2022-08-09: GCS slowness in zone us-east1-c aff... (production#7570 - closed) |
Open Oncall Issues
| Created | Summary |
|---|---|
| 2021-09-17T19:35:34Z | https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/14205+ |
| 2020-12-18T22:29:14Z | https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/12200+ |
Issues for Review during Incident Review Meeting
If there are any incidents you think would be good to review, please add them to the Agenda for the next meeting.
Edited by ops-gitlab-net