Weekly Reliability (SRE) Team Newsletter – On-call Period: 2024-05-07 - 2024-05-14
Announcements
Engineering Week in Review Highlights:
Team Updates
On-Call During This Period
Schedule | Username |
---|---|
SRE 8-hour Americas | Cameron McFarland |
SRE 8-hour Americas | Marcel Chacon |
SRE 8-hour APAC | Devin Sylva |
SRE 8-hour APAC | Nick Duff |
SRE 8-hour EMEA | Igor Wiedler |
SRE 8-hour EMEA | Jack Stephenson |
PagerDuty Incidents
See the 1 week report for acknowledged PD pages (long-term trend)
Alerts Volume
7 Day Issue Stats
- Oncall issues : 0
- Access Request : 0
- Change Issues : 13
- Incident Issues : 12
- CorrectiveAction Issues : 0
Change Issues
- 2024-05-13T04:06:48Z - https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17994+
- 2024-05-10T15:27:37Z - [gstg] Gitaly Zonal Outage Game Day (production#17991 - closed)
- 2024-05-09T17:24:56Z - Remove former patroni leader nodes (production#17983 - closed)
- 2024-05-08T17:31:20Z - [GPRD] Disable allow_runner_registration_token ... (production#17981 - closed)
- 2024-05-08T08:17:10Z - [PRE] Remove old NFS environments (production#17978)
- 2024-05-08T06:59:00Z - [GPRD] Increase huge pages in C3-Highmem-127 Pa... (production#17977 - closed)
- 2024-05-07T21:14:30Z - 2024-05-07: Enable `silent_admin_exports_enable... (production#17976 - closed)
- 2024-05-07T20:08:30Z - Remove outdated omniauth providers (production#17975 - closed)
- 2024-05-07T19:46:41Z - Remove outdated omniauth providers (production#17974 - closed)
- 2024-05-07T18:05:13Z - 2024-05-20: Switch Grafana Dashboards to Mimir (production#17972 - closed)
- 2024-05-07T17:36:00Z - 2024-05-13: Switch periodic-thanos-queries to ... (production#17971 - closed)
- 2024-05-07T17:21:43Z - 2024-05-13: Switch Tamland to use Mimir (production#17970 - closed)
- 2024-05-07T08:17:40Z - [GSTG] Test out redis reconfigure script (production#17967 - closed)
Incident Issues
- 2024-05-11T03:28:33Z - 2024-05-11: increase in errors for SaaS runner q... (production#17993 - closed) | severity3 | ~"Service::CiRunners" |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17993
- 2024-05-11T00:41:24Z - https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17992+ | severity4 | |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17992
- 2024-05-10T07:35:38Z - 2024-05-10: GSTG One or more Gitaly storages ar... (production#17990 - closed) | severity3 | ~"Service::GprdPatroniMainV14" |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17990
- 2024-05-10T04:23:17Z - 2024-05-10: dev.gitlab.org unreachable (production#17988 - closed) | severity3 | ServiceBlackbox |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17988
- 2024-05-09T21:00:17Z - 2024-05-09: AI Gateway Traffic Absent (production#17986 - closed) | severity4 | ~"Service::AiGateway" |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17986
- 2024-05-09T19:01:49Z - 2024-05-09: WALG Backup failed for Patroni Main (production#17984 - closed) | severity4 | ServiceBlackbox |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17984
- 2024-05-09T04:42:48Z - 2024-05-09: Gitlab::InternalEvents.track_event ... (production#17982 - closed) | severity4 | |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17982
- 2024-05-08T15:00:44Z - 2024-05-08: KubeServiceClusterScaleupsErrorSLOV... (production#17980 - closed) | severity3 | ServiceKube |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17980
- 2024-05-08T08:50:47Z - 2024-05-08: Page load time degradation and inte... (production#17979 - closed) | severity3 | ServiceWeb |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17979
- 2024-05-07T18:59:59Z - 2024-05-07: migrations job failed on gstg-cny (production#17973 - closed) | severity3 | ServiceNeeded |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17973
- 2024-05-07T16:29:41Z - 2024-05-07: Vault Public Ingress Apdex (production#17969 - closed) | severity4 | ServiceVault |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17969
- 2024-05-07T11:33:12Z - 2024-05-07: QA cannot create runners token - 17... (production#17968 - closed) | severity3 | ServiceNeeded |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17968
CorrectiveAction Issues
- 2024-05-08T10:56:07Z - Remove no longer needed secrets from chef (production-engineering#25396)
Open Issue Stats
- Oncall issues : 1
- Change issues : 21
- Incident issues : 5
- Access Request : 0
- CorrectiveAction : 61
Open Change Issues
Open Incident Issues
Created | Summary |
---|---|
Open Oncall Issues
Created | Summary |
---|---|
2021-09-17T19:35:34Z | Proposal: When an Incident is declared, output ... (production-engineering#14205) |
Issues for Review during Incident Review Meeting
If there are any incidents you think would be good to review, please add them to the Agenda for the next meeting.
Edited by ops-gitlab-net