Weekly Reliability (SRE) Team Newsletter – On-call Period: 2023-01-24 - 2023-01-31
Announcements
Engineering Week in Review Highlights:
Team Updates
On-Call During This Period
| Schedule | Username |
|---|---|
| SRE 8-hour Americas | Cindy Pallares |
| SRE 8-hour Americas | Cameron McFarland |
| SRE 8-hour Americas | Marcel Chacon |
| SRE 8-hour APAC | Devin Sylva |
| SRE 8-hour EMEA | Ahmad Sherif |
| SRE 8-hour EMEA | Steve Azzopardi |
PagerDuty Incidents
See the 1 week report for acknowledged PD pages (long-term trend)
Alerts Volume
7 Day Issue Stats
- Oncall issues : 0
- Access Request : 0
- Change Issues : 16
- Incident Issues : 19
- CorrectiveAction Issues : 0
Change Issues
- 2023-01-28T08:03:30Z - Unblock database async index creation (production#8312 - closed)
- 2023-01-27T19:43:52Z - 2023-01-27: [STAGING] Improve caching policy in... (production#8310 - closed)
- 2023-01-27T16:50:11Z - 2023-01-31: Functionally shard redis-repository... (production#8309 - closed)
- 2023-01-27T16:35:51Z - https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8308+
- 2023-01-27T14:40:52Z - https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8307+
- 2023-01-27T10:58:07Z - 2023-01-27: [Shared-gitlab-org runners] enable ... (production#8305 - closed)
- 2023-01-26T19:19:21Z - 2023-02-03: Thanos compactor migration (production#8303 - closed)
- 2023-01-25T14:24:03Z - https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8291+
- 2023-01-25T13:55:33Z - [GPRD] Execute the logical replication test in ... (production#8290 - closed)
- 2023-01-24T16:42:20Z - change the identity of tables to DEFAULT for 19... (production#8286 - closed)
- 2023-01-24T07:40:45Z - 2023-01-26: Migrate Grafana deployment to gitla... (production#8284 - closed)
- 2023-01-24T02:24:10Z - 2023-01-24: Update kube-prometheus-stack to v44... (production#8282 - closed)
- 2023-01-24T02:15:54Z - 2023-01-24: Update kube-prometheus-stack to v44... (production#8281 - closed)
- 2023-01-23T20:31:22Z - https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8280+
- 2023-01-23T15:26:26Z - https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8279+
- 2023-01-23T07:52:37Z - 2023-01-23: delete insecure firewall rules in t... (production#8276 - closed)
Incident Issues
- 2023-01-28T18:54:29Z - 2023-01-28: Thanos Compactor finding duplicate ... (production#8314 - closed) | reliability~3760142 | ServiceThanos | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8314
- 2023-01-28T13:57:43Z - 2023-01-28: WebsocketsServiceLoadbalancerErrorS... (production#8313 - closed) | reliability~3760141 | ServiceWebsockets | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8313
- 2023-01-27T21:22:17Z - 2023-01-27: PatroniServiceRailsPrimarySqlApdexS... (production#8311 - closed) | reliability~3760140 | ServicePatroni | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8311
- 2023-01-27T13:10:56Z - 2023-01-27: PrometheusManyRestarts - Thanos com... (production#8306 - closed) | reliability~3760142 | ServicePrometheus | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8306
- 2023-01-26T22:45:52Z - 2023-01-26: QA Canary Tests Failing Due To Vuln... (production#8304 - closed) | reliability~3760141 | ServiceGitLab Rails | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8304
- 2023-01-26T17:40:56Z - 2023-01-26: WebsocketsServiceLoadbalancerErrorS... (production#8302 - closed) | reliability~3760141 | ServiceWebsockets | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8302
- 2023-01-26T11:07:14Z - 2023-01-26: postgres user donot have read acces... (production#8301 - closed) | reliability~3760141 | ServiceCustomersDot | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8301
- 2023-01-26T10:57:37Z - 2023-01-26: gprd-cny QA failure - gitlab-qa acc... (production#8300 - closed) | reliability~3760141 | | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8300
- 2023-01-26T10:06:26Z - 2023-01-26: LoggingVisibilityDimished (production#8299 - closed) | reliability~3760141 | ServiceLogging | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8299
- 2023-01-26T09:42:21Z - 2023-01-26: ApiServiceWorkhorseApdexSLOViolation (production#8298 - closed) | reliability~3760141 | ~"Service::Workhorse" | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8298
- 2023-01-26T08:40:08Z - 2023-01-26: SSL certificate for pages.gitlab.io... (production#8296 - closed) | reliability~3760142 | | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8296
- 2023-01-26T06:55:30Z - 2023-01-26: thanos is restarting frequently (production#8295 - closed) | reliability~3760142 | ServiceThanos | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8295
- 2023-01-25T22:27:27Z - 2023-01-25: ExternalDNSStale (production#8294 - closed) | reliability~3760141 | ServiceExternalDNS | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8294
- 2023-01-25T21:31:53Z - 2023-01-25: LoggingVisibilityDiminished (production#8293 - closed) | reliability~3760141 | ServiceLogging | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8293
- 2023-01-25T17:29:24Z - 2023-01-25: QA Canary Tests Failing - Unable to... (production#8292 - closed) | reliability~3760141 | | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8292
- 2023-01-25T09:08:15Z - 2023-01-25: PVSServiceHTTPApdexSLOViolation (production#8289 - closed) | reliability~3760142 | ServicePVS | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8289
- 2023-01-24T17:18:04Z - 2023-01-24: Chef client has been disabled for a... (production#8287 - closed) | | | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8287
- 2023-01-24T13:28:30Z - 2023-01-24: WebServiceWorkhorseErrorSLOViolatio... (production#8285 - closed) | reliability~3760140 | ServiceWeb | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8285
- 2023-01-23T08:50:51Z - 2023-01-23: High unacked messages in pubsub-wor... (production#8277 - closed) | reliability~3760142 | ServiceLogging | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8277
CorrectiveAction Issues
- 2023-01-26T18:29:16Z - https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/17184+
- 2023-01-25T07:57:05Z - https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/17171+
- 2023-01-24T04:34:14Z - https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/17164+
Open Issue Stats
- Oncall issues : 2
- Change issues : 22
- Incident issues : 7
- Access Request : 0
- CorrectiveAction : 91
Open Change Issues
Open Incident Issues
| Created | Summary |
|---|---|
| 2023-01-28T18:54:29Z | 2023-01-28: Thanos Compactor finding duplicate ... (production#8314 - closed) |
| 2023-01-27T21:22:17Z | 2023-01-27: PatroniServiceRailsPrimarySqlApdexS... (production#8311 - closed) |
| 2023-01-26T08:40:08Z | 2023-01-26: SSL certificate for pages.gitlab.io... (production#8296 - closed) |
| 2023-01-24T17:18:04Z | 2023-01-24: Chef client has been disabled for a... (production#8287 - closed) |
Open Oncall Issues
| Created | Summary |
|---|---|
| 2021-09-17T19:35:34Z | https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/14205+ |
| 2020-12-18T22:29:14Z | https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/12200+ |
Issues for Review during Incident Review Meeting
If there are any incidents you think would be good to review, please add them to the Agenda for the next meeting.
Edited by ops-gitlab-net