Weekly Reliability (SRE) Team Newsletter – On-call Period: 2023-09-19 - 2023-09-26

Announcements

Engineering Week in Review Highlights:

Team Updates


On-Call During This Period

Schedule             Username
SRE 8-hour Americas  Alex Hanselka
SRE 8-hour Americas  Cameron McFarland
SRE 8-hour Americas  Sarah Walker
SRE 8-hour APAC      Filipe Santos
SRE 8-hour APAC      Adeline Yeung
SRE 8-hour EMEA      Ahmad Sherif
SRE 8-hour EMEA      Steve Xuereb

PagerDuty Incidents

See the 1-week report for acknowledged PagerDuty pages (and the long-term trend).

Alerts Volume

7 Day Issue Stats

  • Oncall issues: 0
  • Access Request: 0
  • Change Issues: 4
  • Incident Issues: 21
  • CorrectiveAction Issues: 0

Change Issues

Incident Issues

CorrectiveAction Issues

Open Issue Stats

Open Change Issues

Created Summary
2023-09-21T04:43:56Z [GPRD] Migrate duplicate jobs workload to redis... (production#16410 - closed)
2023-09-20T04:46:05Z [GSTG] Migrate duplicate jobs workload to redis... (production#16402 - closed)
2023-09-18T18:49:46Z https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16390
2023-09-18T12:51:33Z GPRD: Rollout USE_CI_BUILDS_ROUTING_TABLE envir... (production#16387 - closed)
2023-09-15T03:01:46Z 2023-09-15: Enabling current_subscription_with_... (production#16375 - closed)
2023-09-13T23:22:33Z 2023-09-25: Cutover 100% of codesuggestions.git... (production#16361 - closed)
2023-09-12T13:59:40Z https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16354
2023-09-12T13:44:59Z Container Registry: Apply post-deployment migra... (production#16353 - closed)
2023-09-12T04:47:30Z [CR][GPRD] Provision redis-cluster-shared-state (production#16350 - closed)
2023-09-07T12:29:11Z 2023-09-11: [GPRD] Rebuild delayed and archive ... (production#16316 - closed)
2023-09-06T05:31:08Z https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16309
2023-09-04T04:13:36Z 2023-09-29: shut down sentry.gitlab.net (production#16296 - closed)
2023-08-22T19:55:32Z Temporarily disable `index_events_on_author_id_... (production#16218 - closed)
2023-08-17T08:56:24Z [GPRD] Destroy `patroni-v12-ci` cluster (production#16189 - closed)
2023-08-10T07:26:41Z https://gitlab.com/gitlab-com/gl-infra/production/-/issues/16156
2023-07-31T03:10:25Z [GPRD] - Restart VMs to fix the issue related t... (production#16099 - closed)
2023-07-19T17:38:59Z 2023-07-TBD: [DRAFT] [GPRD] - Delete unused Pat... (production#16062 - closed)
2023-07-18T14:00:13Z Reindex wikis, users and projects to resize the... (production#16055 - closed)
2023-07-06T21:09:44Z DRAFT - [GPRD][2023-07-06] Execute PG14 Upgrade... (production#15993 - closed)
2023-07-04T10:50:28Z Increase concurrency for container registry dat... (production#15976 - closed)
2023-06-28T01:23:29Z [CR] [gprd] Send 100% of main traffic to HAProx... (production#15951 - closed)
2023-06-27T10:25:48Z Draft: [GPRD] PgBouncer upgrade (production#15939 - closed)
2023-06-27T10:24:13Z Draft: [GSTG] PgBouncer upgrade (production#15938 - closed)
2023-06-27T10:22:54Z [db-benchmarking] PgBouncer upgrade (production#15937 - closed)
2023-06-05T19:49:28Z 2023-06-13: [GPRD] Patroni CI Cluster - Drop po... (production#14721 - closed)
2023-05-18T08:07:10Z [GPRD] Install pg_stat_kcache package in DR pos... (production#14449 - closed)
2023-05-04T02:19:45Z 2023-05-04: Remove admin rights from ops-gitlab... (production#11074 - closed)
2023-04-21T18:04:19Z 2023-05-23: DEV pgupgrade to update database to... (production#8779 - closed)
2023-04-05T17:23:13Z [2023-04-19 23:45 UTC][GSTG] Patroni Main Clust... (production#8678 - closed)
2023-03-28T16:11:57Z [GPRD] Execute logical replication and upgrade ... (production#8611 - closed)
2022-12-22T03:57:59Z 2022-12-22: Add registry expiration policies to... (production#8186 - closed)
2022-11-14T23:01:34Z Change the schedule of automatic database reind... (production#8048 - closed)
2022-10-18T07:03:19Z 2022-10-18: [GPRD] Use turbo mode on restore co... (production#7893 - closed)
2022-10-18T06:45:39Z 2022-10-18: [GPRD] Use turbo mode on restore co... (production#7892 - closed)
2022-10-06T07:05:45Z Disable fastupdate on merge_requests GIN indexes (production#7840 - closed)
2022-09-02T09:37:55Z Removal of sigin-Page text for staging and .com (production#7681 - closed)
2022-08-16T16:43:58Z 2022-08-16: Increase max_client_conn for pgboun... (production#7607 - closed)
2022-05-23T16:22:23Z 2022-05-23: Update auth scopes to include servi... (production#7117 - closed)

Open Incident Issues

Created Summary
2023-09-24T14:10:35Z 2023-09-24: WebService Error SLO Violation (production#16421 - closed)
2023-09-22T00:40:53Z 2023-09-22: Error rate increase in web (production#16417 - closed)
2023-09-20T00:10:17Z 2023-09-20: PraefectServiceProxyErrorSLOViolation (production#16401 - closed)
2023-08-14T21:45:39Z 2023-08-14: The server_route_blob_upload_uuid_d... (production#16175 - closed)

Open Oncall Issues

Created Summary
2021-09-17T19:35:34Z https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/14205
2020-12-18T22:29:14Z https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/12200

Issues for Review during Incident Review Meeting

If there are any incidents you think would be good to review, please add them to the Agenda for the next meeting.
Edited by ops-gitlab-net