Weekly Reliability (SRE) Team Newsletter – On-call Period: 2023-09-19 - 2023-09-26
Announcements
Engineering Week in Review Highlights:
Team Updates
On-Call During This Period
| Schedule | Username |
|---|---|
| SRE 8-hour Americas | Alex Hanselka |
| SRE 8-hour Americas | Cameron McFarland |
| SRE 8-hour Americas | Sarah Walker |
| SRE 8-hour APAC | Filipe Santos |
| SRE 8-hour APAC | Adeline Yeung |
| SRE 8-hour EMEA | Ahmad Sherif |
| SRE 8-hour EMEA | Steve Xuereb |
PagerDuty Incidents
See the 1-week report for acknowledged PagerDuty pages (long-term trend).
Alerts Volume
7 Day Issue Stats
- Oncall issues : 0
- Access Request : 0
- Change Issues : 4
- Incident Issues : 21
- CorrectiveAction Issues : 0
Change Issues
- 2023-09-21T15:56:17Z - 2023-09-21: GPRD - Add a Patroni replica node f... (production#16414 - closed)
- 2023-09-21T04:43:56Z - [GPRD] Migrate duplicate jobs workload to redis... (production#16410 - closed)
- 2023-09-21T02:53:25Z - https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16408
- 2023-09-20T04:46:05Z - [GSTG] Migrate duplicate jobs workload to redis... (production#16402 - closed)
Incident Issues
| Created | Summary | Severity | Service | Issue |
|---|---|---|---|---|
| 2023-09-24T18:24:07Z | 2023-09-24: Huge increase in error rate for cny... (production#16423 - closed) | severity3 | ServiceWeb | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16423 |
| 2023-09-24T16:53:07Z | 2023-09-24: Gitaly Goserver error violation for... (production#16422 - closed) | severity3 | ServiceGitaly | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16422 |
| 2023-09-24T14:10:35Z | 2023-09-24: WebService Error SLO Violation (production#16421 - closed) | severity3 | ServiceNeeded | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16421 |
| 2023-09-22T11:44:26Z | 2023-09-22: QA smoke tests failing on gstg-cny (production#16419 - closed) | severity3 | ServiceNeeded | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16419 |
| 2023-09-22T02:36:51Z | 2023-09-22: Elevated Kubernetes API latency in OPS (production#16418 - closed) | severity4 | ServiceKube | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16418 |
| 2023-09-22T00:40:53Z | 2023-09-22: Error rate increase in web (production#16417 - closed) | severity4 | ServiceWeb | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16417 |
| 2023-09-21T23:07:36Z | 2023-09-21: GitalyServiceGoserverTrafficAbsentS... (production#16416 - closed) | severity4 | ServiceGitaly | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16416 |
| 2023-09-21T22:38:25Z | 2023-09-21: WebServiceLoadBalancer SLI violation (production#16415 - closed) | severity3 | ServiceWeb | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16415 |
| 2023-09-21T13:38:15Z | 2023-09-21: PatroniServiceRailsReplicaSqlApdexS... (production#16413 - closed) | severity4 | ServicePatroni | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16413 |
| 2023-09-21T11:16:01Z | 2023-09-21: gitlab-cny-webservice-api doesn't s... (production#16412 - closed) | severity3 | ServiceWeb | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16412 |
| 2023-09-21T10:37:27Z | 2023-09-21: NoActiveVaultInstance (production#16411 - closed) | severity4 | ServiceVault | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16411 |
| 2023-09-21T04:17:00Z | 2023-09-21: search is not indexing updates (production#16409 - closed) | severity3 | ServiceNeeded | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16409 |
| 2023-09-21T00:34:45Z | 2023-09-21: Missing metrics for web service (production#16407 - closed) | severity3 | Service::Monitoring | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16407 |
| 2023-09-20T20:04:48Z | 2023-09-20: Production deployment failed on 'pr... (production#16406 - closed) | severity3 | ServiceHAProxy | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16406 |
| 2023-09-20T17:20:39Z | 2023-09-20: Deployments failing due to deployer... (production#16405 - closed) | severity3 | ServiceDeployTooling | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16405 |
| 2023-09-20T15:16:07Z | 2023-09-20: rails_replica_sql Apdex SLO Violation (production#16404 - closed) | severity2 | ServiceWeb | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16404 |
| 2023-09-20T14:35:23Z | 2023-09-20: walgBackup Delayed (production#16403 - closed) | severity4 | | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16403 |
| 2023-09-20T00:10:17Z | 2023-09-20: PraefectServiceProxyErrorSLOViolation (production#16401 - closed) | severity4 | ServiceGitaly | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16401 |
| 2023-09-19T15:43:26Z | 2023-09-19: zoekt code search unavaible, throwi... (production#16400 - closed) | severity3 | ServiceZoekt | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16400 |
| 2023-09-19T14:51:37Z | production#16399 | severity3 | ServicePostgres | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16399 |
| 2023-09-19T13:57:34Z | 2023-09-19: Single Node Puma Worker Saturation (production#16397 - closed) | severity4 | ServiceAPI | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16397 |
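The 7 Day Issue Stats give only a total for incident issues. As a rough cross-check, the Python sketch below tallies the incidents listed above by severity and service label. The rows are transcribed by hand from this newsletter, and `INCIDENTS` and `tally` are hypothetical names used for illustration; this is not how the newsletter itself is generated.

```python
from collections import Counter

# One tuple per incident issue listed above: (reference, severity, service).
# Hand-transcribed from this newsletter (hypothetical data structure);
# "" marks the one row that carried no service label.
INCIDENTS = [
    ("production#16423", "severity3", "ServiceWeb"),
    ("production#16422", "severity3", "ServiceGitaly"),
    ("production#16421", "severity3", "ServiceNeeded"),
    ("production#16419", "severity3", "ServiceNeeded"),
    ("production#16418", "severity4", "ServiceKube"),
    ("production#16417", "severity4", "ServiceWeb"),
    ("production#16416", "severity4", "ServiceGitaly"),
    ("production#16415", "severity3", "ServiceWeb"),
    ("production#16413", "severity4", "ServicePatroni"),
    ("production#16412", "severity3", "ServiceWeb"),
    ("production#16411", "severity4", "ServiceVault"),
    ("production#16409", "severity3", "ServiceNeeded"),
    ("production#16407", "severity3", "Service::Monitoring"),
    ("production#16406", "severity3", "ServiceHAProxy"),
    ("production#16405", "severity3", "ServiceDeployTooling"),
    ("production#16404", "severity2", "ServiceWeb"),
    ("production#16403", "severity4", ""),
    ("production#16401", "severity4", "ServiceGitaly"),
    ("production#16400", "severity3", "ServiceZoekt"),
    ("production#16399", "severity3", "ServicePostgres"),
    ("production#16397", "severity4", "ServiceAPI"),
]


def tally(rows):
    """Count incident rows per severity and per service label."""
    by_severity = Counter(sev for _, sev, _ in rows)
    # Group unlabelled rows under an explicit placeholder key.
    by_service = Counter(svc or "(no service label)" for _, _, svc in rows)
    return by_severity, by_service


if __name__ == "__main__":
    by_severity, by_service = tally(INCIDENTS)
    print(f"Total incident issues: {len(INCIDENTS)}")
    for sev, count in sorted(by_severity.items()):
        print(f"  {sev}: {count}")
    for svc, count in by_service.most_common():
        print(f"  {svc}: {count}")
```

Run as a script, it reports 21 incident issues in total, matching the 7-day stat: 1 severity2, 12 severity3 and 8 severity4. ServiceWeb is the most frequent service label (5), followed by ServiceGitaly and ServiceNeeded (3 each).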
CorrectiveAction Issues
- 2023-09-25T01:14:36Z - https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/24483
- 2023-09-25T01:04:12Z - https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/24482
- 2023-09-21T07:30:24Z - https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/24472
- 2023-09-20T19:17:08Z - https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/24465
Open Issue Stats
- Oncall issues : 2
- Change issues : 38
- Incident issues : 7
- Access Request : 0
- CorrectiveAction : 95
Open Change Issues
Open Incident Issues
| Created | Summary |
|---|---|
| 2023-09-24T14:10:35Z | 2023-09-24: WebService Error SLO Violation (production#16421 - closed) |
| 2023-09-22T00:40:53Z | 2023-09-22: Error rate increase in web (production#16417 - closed) |
| 2023-09-20T00:10:17Z | 2023-09-20: PraefectServiceProxyErrorSLOViolation (production#16401 - closed) |
| 2023-08-14T21:45:39Z | 2023-08-14: The server_route_blob_upload_uuid_d... (production#16175 - closed) |
Open Oncall Issues
| Created | Summary |
|---|---|
| 2021-09-17T19:35:34Z | https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/14205 |
| 2020-12-18T22:29:14Z | https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/12200 |
Issues for Review during Incident Review Meeting
If you think any incidents would be good to review, please add them to the agenda for the next meeting.
Edited by ops-gitlab-net