Weekly Reliability (SRE) Team Newsletter – On-call Period: 2022-08-09 - 2022-08-16
Announcements
Engineering Week in Review Highlights:
Team Updates
On-Call During This Period
| Schedule | Username |
|---|---|
| SRE 8-hour Americas | Alex Hanselka |
| SRE 8-hour Americas | Matt Smiley |
| SRE 8-hour APAC | Devin Sylva |
| SRE 8-hour APAC | Pierre Guinoiseau |
| SRE 8-hour EMEA | Alejandro Rodriguez |
| SRE 8-hour EMEA | Steve Azzopardi |
PagerDuty Incidents
See the 1-week report for acknowledged PagerDuty pages (long-term trend).
Alerts Volume
7 Day Issue Stats
- Oncall issues: 0
- Access Request: 0
- Change Issues: 13
- Incident Issues: 16
- CorrectiveAction Issues: 1
Change Issues
- 2022-08-15T03:52:53Z - 2022-08-15: GSTG - Migrate TF module for Patron... (production#7595 - closed)
- 2022-08-12T19:18:30Z - [PRDSUB] Grow the customersdot production VM size (production#7591 - closed)
- 2022-08-12T18:07:28Z - [STGSUB] Grow the customersdot staging VM size (production#7590 - closed)
- 2022-08-12T16:07:57Z - Manually start migration for batch 2/2 of large... (production#7588 - closed)
- 2022-08-12T15:04:02Z - 2022-08-12: Opstrace Error Tracking Open Beta R... (production#7586 - closed)
- 2022-08-12T14:20:37Z - 2022-09-21: GSTG Truncate the rest of CI tables... (production#7585 - closed)
- 2022-08-12T13:26:03Z - Manually start migration for batch 1/2 of large... (production#7583 - closed)
- 2022-08-12T05:18:48Z - [08/16/2022 - 00:00 UTC] - GSTG - Disable post ... (production#7582 - closed)
- 2022-08-11T09:42:48Z - 2022-08-15: Reduce the default TTL for Rails.cache (production#7580 - closed)
- 2022-08-10T15:10:06Z - [08/17/2022 - 00:00 UTC] - Patroni Clusters OS ... (production#7577 - closed)
- 2022-08-10T08:58:23Z - [production] Enable `always_async_project_autho... (production#7574 - closed)
- 2022-08-09T12:48:33Z - [Production] Update Secret Revocation API URLs ... (production#7567 - closed)
- 2022-08-09T02:13:42Z - 2022-08-09: Update the node selectors and add n... (production#7566 - closed)
Incident Issues
| Created | Summary | Label | Service | Link |
|---|---|---|---|---|
| 2022-08-13T16:59:51Z | 2022-08-13: sidekiq import error rate and apdex... (production#7594 - closed) | reliability~3760141 | ServiceSidekiq | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7594 |
| 2022-08-13T06:42:45Z | 2022-08-13: Frontend and Git Workhorse SLO viol... (production#7593 - closed) | reliability~3760141 | ServicePostgres | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7593 |
| 2022-08-12T22:44:36Z | 2022-08-12: Log ingestion stall due to ES hot n... (production#7592 - closed) | reliability~3760140 | ServiceLogging | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7592 |
| 2022-08-12T16:44:49Z | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7589+ | reliability~3760139 | ~"Service::Customers" | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7589 |
| 2022-08-12T15:08:13Z | 2022-08-12: Web service latency and error rate ... (production#7587 - closed) | reliability~3760141 | ServiceWeb | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7587 |
| 2022-08-11T18:33:26Z | 2022-08-11: Log ingestion delay exceeds SLO (production#7581 - closed) | reliability~3760140 | ServiceLogging | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7581 |
| 2022-08-11T07:19:41Z | 2022-08-11: walgBaseBackupDelayed across all cl... (production#7579 - closed) | reliability~3760141 | ServicePostgres | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7579 |
| 2022-08-10T22:56:01Z | 2022-08-10: Gitaly apdex SLO for file-75 (production#7578 - closed) | reliability~3760141 | ServiceGitaly | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7578 |
| 2022-08-10T12:35:40Z | 2022-08-10: PostgreSQL_StatementTimeout_Errors ... (production#7576 - closed) | reliability~3760141 | ServiceContainer Registry | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7576 |
| 2022-08-10T11:52:08Z | 2022-08-10: WebServiceLoadbalancerErrorSLOViola... (production#7575 - closed) | reliability~3760141 | ServiceWeb | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7575 |
| 2022-08-10T06:59:29Z | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7573+ | reliability~3760141 | ServicePatroni | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7573 |
| 2022-08-10T01:07:20Z | 2022-08-10: Multiple alerts from Patroni and we... (production#7572 - closed) | reliability~3760140 | ServicePatroni | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7572 |
| 2022-08-09T17:31:57Z | 2022-08-09: GCS slowness in zone us-east1-c aff... (production#7570 - closed) | reliability~3760142 | ServiceWeb | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7570 |
| 2022-08-09T13:37:14Z | 2022-08-09: PubSub queuing high (production#7569 - closed) | reliability~3760141 | ServiceLogging | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7569 |
| 2022-08-08T12:16:27Z | 2022-08-08: pgbouncer_client_conn_primary satur... (production#7565 - closed) | reliability~3760141 | ServicePgbouncer | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7565 |
| 2022-08-08T07:57:09Z | 2022-08-08: patroni-data-analytics-01-db-db-ben... (production#7563 - closed) | reliability~3760141 | ServiceConsul | https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7563 |
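A minimal sketch of how the incident entries above could be tallied by service, for example to see which services accounted for the most incidents this period. The label strings are copied by hand from the 16 entries; `service_name` and `incidents_by_service` are hypothetical helpers written for illustration, not part of any existing tooling.

```python
from collections import Counter

# Service labels copied by hand from the 16 incident entries, newest first.
# A label may appear in rendered form (ServiceWeb) or as a raw label
# reference (~"Service::Customers"); both spellings are accepted.
LABELS = [
    "ServiceSidekiq", "ServicePostgres", "ServiceLogging",
    '~"Service::Customers"', "ServiceWeb", "ServiceLogging",
    "ServicePostgres", "ServiceGitaly", "ServiceContainer Registry",
    "ServiceWeb", "ServicePatroni", "ServicePatroni", "ServiceWeb",
    "ServiceLogging", "ServicePgbouncer", "ServiceConsul",
]


def service_name(label):
    """Reduce a service label to its bare service name (hypothetical helper)."""
    name = label.strip().lstrip("~").strip('"')
    # Drop the "Service" scope and the "::" separator when present.
    for prefix in ("Service", "::"):
        if name.startswith(prefix):
            name = name[len(prefix):]
    return name.strip()


def incidents_by_service(labels):
    """Return (service, count) pairs, most frequent first."""
    return Counter(service_name(label) for label in labels).most_common()


if __name__ == "__main__":
    for service, count in incidents_by_service(LABELS):
        print(f"{service}: {count}")
```

The counts sum to 16, matching the Incident Issues figure in the 7 Day Issue Stats.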
CorrectiveAction Issues
- 2022-08-11T15:45:53Z - https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/16172+
Open Issue Stats
- Oncall issues: 2
- Change issues: 25
- Incident issues: 5
- Access Request: 0
- CorrectiveAction: 94
Open Change Issues
Open Incident Issues
| Created | Summary |
|---|---|
| 2022-08-09T17:31:57Z | 2022-08-09: GCS slowness in zone us-east1-c aff... (production#7570 - closed) |
| 2022-08-08T12:16:27Z | 2022-08-08: pgbouncer_client_conn_primary satur... (production#7565 - closed) |
Open Oncall Issues
| Created | Summary |
|---|---|
| 2021-09-17T19:35:34Z | https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/14205+ |
| 2020-12-18T22:29:14Z | https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/12200+ |
Issues for Review during Incident Review Meeting
If there are any incidents you think would be good to review, please add them to the Agenda for the next meeting.
Edited by ops-gitlab-net