Weekly Reliability (SRE) Team Newsletter – On-call Period: 2022-11-22 - 2022-11-29

Announcements

Engineering Week in Review Highlights:

Team Updates


On-Call During This Period

Schedule              Username
SRE 8-hour Americas   Nels Nelson
SRE 8-hour Americas   Stephanie Jackson
SRE 8-hour APAC       Furhan Shabir
SRE 8-hour APAC       Gonzalo Servat
SRE 8-hour EMEA       Ahmad Sherif
SRE 8-hour EMEA       Steve Azzopardi

PagerDuty Incidents

See the 1 week report for acknowledged PD pages (long-term trend)

Alerts Volume

7 Day Issue Stats

  • Oncall issues: 0
  • Access Request: 0
  • Change Issues: 4
  • Incident Issues: 6
  • CorrectiveAction Issues: 0

Change Issues

Incident Issues

CorrectiveAction Issues

Open Issue Stats

Open Change Issues

Created Summary
2022-11-22T19:07:27Z Rebuild Staging Zonal Cluster us-east1-c (production#8083 - closed)
2022-11-22T18:54:06Z Reindex top 25 indexes on registry DB with the ... (production#8082 - closed)
2022-11-17T16:41:35Z Migrate prometheus VMs to use SSD data disks (production#8066 - closed)
2022-11-14T23:01:34Z Change the schedule of automatic database reind... (production#8048 - closed)
2022-11-14T17:54:45Z Increase `maximum attachment size` setting to 1... (production#8046 - closed)
2022-11-01T00:52:47Z CR - GPRD - Upgrade Ubuntu on PGBouncer nodes (production#7967 - closed)
2022-11-01T00:52:32Z CR - GSTG - Upgrade Ubuntu on PGBouncer nodes (production#7966 - closed)
2022-10-18T07:03:19Z 2022-10-18: [GPRD] Use turbo mode on restore co... (production#7893 - closed)
2022-10-18T06:45:39Z 2022-10-18: [GPRD] Use turbo mode on restore co... (production#7892 - closed)
2022-10-14T03:26:01Z https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7877
2022-10-11T15:17:53Z 2022-12-01: Gradually increase the number of ma... (production#7862 - closed)
2022-10-07T22:14:21Z Delete the production gemnasium Cloud SQL datab... (production#7848 - closed)
2022-10-06T07:05:45Z Disable fastupdate on merge_requests GIN indexes (production#7840 - closed)
2022-10-05T01:45:36Z 2022-10-05: Rollout PREVENT_LOAD_BALANCER_RETRI... (production#7836 - closed)
2022-10-05T00:00:51Z Reduce io-threads on redis-persistent in staging (production#7834 - closed)
2022-09-02T09:37:55Z Removal of sigin-Page text for staging and .com (production#7681 - closed)
2022-08-25T14:11:48Z [GPRD] Enable pg_wait_sampling in Production (production#7653 - closed)
2022-08-17T19:40:56Z Move version.gitlab.com to new namespace, GCP p... (production#7615 - closed)
2022-08-16T16:43:58Z 2022-08-16: Increase max_client_conn for pgboun... (production#7607 - closed)
2022-07-22T14:23:23Z 2022-07-22: Update home page url in staging fro... (production#7497 - closed)
2022-06-06T15:55:01Z Rollout usage of ci-gateway ILB for on shared r... (production#7206 - closed)
2022-05-23T16:22:23Z 2022-05-23: Update auth scopes to include servi... (production#7117 - closed)

Open Incident Issues

Created Summary
2022-11-02T14:18:07Z 2022-11-02: Intermittent Internal API unreachable (production#7979 - closed)
2022-10-25T20:05:24Z 2022-10-25: Intermittent kas.gitlab.com timeouts (production#7924 - closed)

Open Oncall Issues

Created Summary
2021-09-17T19:35:34Z https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/14205
2020-12-18T22:29:14Z https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/12200

Issues for Review during Incident Review Meeting

If there are any incidents you think would be good to review, please add them to the Agenda for the next meeting.
Edited by Anthony Fappiano