Weekly Reliability (SRE) Team Newsletter – On-call Period: 2022-11-08 - 2022-11-15
Announcements
Engineering Week in Review Highlights:
Team Updates
On-Call During This Period
| Schedule | Username |
|---|---|
| SRE 8-hour Americas | Alex Hanselka |
| SRE 8-hour Americas | Hendrik Meyer |
| SRE 8-hour APAC | Devin Sylva |
| SRE 8-hour EMEA | Alejandro Rodríguez |
| SRE 8-hour EMEA | Igor Wiedler |
| SRE 8-hour EMEA | Steve Azzopardi |
PagerDuty Incidents
See the 1-week report for acknowledged PagerDuty (PD) pages (long-term trend).
Alerts Volume
7-Day Issue Stats
- Oncall issues: 0
- Access Request: 0
- Change Issues: 8
- Incident Issues: 18
- CorrectiveAction Issues: 0
Change Issues
- 2022-11-14T00:28:30Z - 2022-11-28: [GPRD] Upgrade Consul agents on rem... (production#8041 - closed)
- 2022-11-14T00:08:46Z - 2022-11-22: [GPRD] Update Consul agent on Patro... (production#8040 - closed)
- 2022-11-11T06:39:53Z - 2022-11-21: [GPRD] Upgrade Consul agents in k8s... (production#8038 - closed)
- 2022-11-11T06:29:28Z - 2022-11-16: [GPRD] Upgrade Consul cluster to 1.... (production#8037 - closed)
- 2022-11-11T04:20:52Z - 2022-11-15: GPRD: Roll out max_replica_pools to... (production#8036 - closed)
- 2022-11-10T06:25:27Z - 2022-11-16: [GPRD] Deploy Consul cluster in k8s... (production#8032 - closed)
- 2022-11-08T03:02:49Z - 2022-11-08: Add manual cert for dev.gitlab.org... (production#8019 - closed)
- 2022-11-07T07:20:43Z - Reindex User index to apply new mapping (production#8007 - closed)
Incident Issues
| Created | Summary | Label | Service |
|---|---|---|---|
| 2022-11-10T22:03:49Z | 2022-11-10: bootstrap-vue update breaks vulnera... (production#8035 - closed) | reliability~3760142 | |
| 2022-11-10T18:17:50Z | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8034+ | reliability~3760139 | ServiceAPI |
| 2022-11-10T15:55:55Z | 2022-11-10: Increased 502 errors on api/v4/jobs... (production#8033 - closed) | reliability~3760141 | ServiceAPI |
| 2022-11-09T18:36:29Z | 2022-11-09: Prometheus running OOM (production#8030 - closed) | reliability~3760142 | ServicePrometheus |
| 2022-11-09T14:56:21Z | 2022-11-09: LoggingVisibilityDiminished for rai... (production#8029 - closed) | reliability~3760141 | ServiceLogging |
| 2022-11-09T14:33:39Z | 2022-11-09: pgbouncer_client_conn_primary close... (production#8028 - closed) | reliability~3760141 | ServicePgbouncer |
| 2022-11-09T13:30:17Z | 2022-11-09: Blackbox probe failures for custome... (production#8027 - closed) | reliability~3760141 | ~"Service::Customers" |
| 2022-11-09T08:05:49Z | 2022-11-09: file descriptors high on file-54-st... (production#8025 - closed) | reliability~3760140 | ServiceGitaly |
| 2022-11-09T05:04:56Z | 2022-11-09: staging.gitlab.com slow/unresponsiv... (production#8024 - closed) | reliability~3760141 | ServiceSidekiq |
| 2022-11-09T00:51:08Z | 2022-11-09: gitlab-restore/postgres-dev-1 has g... (production#8023 - closed) | reliability~3760142 | ServicePostgres |
| 2022-11-08T14:21:40Z | 2022-11-08: file-cny-01 apdex violation (production#8022 - closed) | reliability~3760141 | ServiceGitaly |
| 2022-11-08T14:09:23Z | 2022-11-08: pgbouncer max_client_conn very clos... (production#8021 - closed) | reliability~3760140 | ServicePgbouncer |
| 2022-11-08T04:45:08Z | 2022-11-08: Prometheus VMs down (production#8020 - closed) | reliability~3760141 | ServicePrometheus |
| 2022-11-08T02:58:28Z | 2022-11-08: Customers disk filling up (production#8018 - closed) | reliability~3760141 | ServiceCustomersDot |
| 2022-11-07T12:28:22Z | 2022-11-07: Long-running transactions in patron... (production#8014 - closed) | reliability~3760141 | ServicePostgres |
| 2022-11-07T12:01:24Z | 2022-11-07: Large backlog for pubsub-sentry-inf... (production#8013 - closed) | reliability~3760142 | ServiceLogging |
| 2022-11-07T08:56:18Z | 2022-11-07: Kubernetes node-level hotspotting a... (production#8010 - closed) | reliability~3760141 | ServiceGit |
| 2022-11-07T08:52:25Z | 2022-11-07: GitServiceWorkhorseAuthApiApdexSLOV... (production#8008 - closed) | reliability~3760141 | ServiceGit |
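To see which services accounted for this week's incidents, the Incident Issues entries can be tallied by service label. The Python sketch below is illustrative only: the `INCIDENTS` list is a hand transcription of each entry's issue reference and service label as displayed in this newsletter, and `tally_by_service` is a hypothetical helper, not part of any existing newsletter tooling. It counts labels with `collections.Counter`.

```python
from collections import Counter

# Hypothetical hand transcription of the 18 Incident Issues entries:
# (issue reference, service label as displayed). "" marks the one entry
# (production#8035) that carried no service label.
INCIDENTS = [
    ("production#8035", ""),
    ("production#8034", "ServiceAPI"),
    ("production#8033", "ServiceAPI"),
    ("production#8030", "ServicePrometheus"),
    ("production#8029", "ServiceLogging"),
    ("production#8028", "ServicePgbouncer"),
    ("production#8027", "Service::Customers"),  # written as ~"Service::Customers" in the source
    ("production#8025", "ServiceGitaly"),
    ("production#8024", "ServiceSidekiq"),
    ("production#8023", "ServicePostgres"),
    ("production#8022", "ServiceGitaly"),
    ("production#8021", "ServicePgbouncer"),
    ("production#8020", "ServicePrometheus"),
    ("production#8018", "ServiceCustomersDot"),
    ("production#8014", "ServicePostgres"),
    ("production#8013", "ServiceLogging"),
    ("production#8010", "ServiceGit"),
    ("production#8008", "ServiceGit"),
]


def tally_by_service(rows):
    """Count incidents per service label; unlabeled rows are grouped as "(none)"."""
    return Counter(service or "(none)" for _issue, service in rows)


if __name__ == "__main__":
    counts = tally_by_service(INCIDENTS)
    print(f"Total incidents: {sum(counts.values())}")
    # most_common() lists the busiest services first; ties keep transcription order.
    for service, n in counts.most_common():
        print(f"{service}: {n}")
```

Running it prints the total, which should match the 18 reported under 7-Day Issue Stats, followed by one line per label. Labels are counted exactly as displayed, so closely related labels (for example the Customers and CustomersDot entries) stay separate unless someone decides they should be merged.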
CorrectiveAction Issues
- 2022-11-08T22:53:09Z - https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/16799+
- 2022-11-08T22:36:30Z - https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/16798+
- 2022-11-07T22:06:37Z - https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/16788+
Open Issue Stats
- Oncall issues: 2
- Change issues: 28
- Incident issues: 10
- Access Request: 0
- CorrectiveAction: 94
Open Change Issues
Open Incident Issues
| Created | Summary |
|---|---|
| 2022-11-10T15:55:55Z | 2022-11-10: Increased 502 errors on api/v4/jobs... (production#8033 - closed) |
| 2022-11-07T08:53:23Z | 2022-11-07: SSL certificate for dev.gitlab.org:... (production#8009 - closed) |
| 2022-11-07T04:16:15Z | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8006+ |
| 2022-11-06T21:53:45Z | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8005+ |
| 2022-11-02T14:18:07Z | 2022-11-02: Intermittent Internal API unreachable (production#7979 - closed) |
| 2022-10-29T13:50:58Z | 2022-10-29: Chef client has been disabled for a... (production#7947 - closed) |
| 2022-10-25T20:05:24Z | 2022-10-25: Intermittent kas.gitlab.com timeouts (production#7924 - closed) |
Open Oncall Issues
| Created | Summary |
|---|---|
| 2021-09-17T19:35:34Z | https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/14205+ |
| 2020-12-18T22:29:14Z | https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/12200+ |
Issues for Review during Incident Review Meeting
If there are any incidents you think would be worth reviewing, please add them to the agenda for the next meeting.
Edited by ops-gitlab-net