Weekly Reliability (SRE) Team Newsletter – On-call Period: 2024-02-06 - 2024-02-13
Announcements
Engineering Week in Review Highlights:
Team Updates
On-Call During This Period
Schedule | Username |
---|---|
SRE 8-hour Americas | Alex Hanselka |
SRE 8-hour Americas | Matt Smiley |
SRE 8-hour APAC | Devin Sylva |
SRE 8-hour APAC | Pierre Guinoiseau |
SRE 8-hour EMEA | Igor Wiedler |
SRE 8-hour EMEA | Maina Ng'ang'a |
SRE 8-hour EMEA | Calliope Gardner |
PagerDuty Incidents
See the 1 week report for acknowledged PD pages (long-term trend)
Alerts Volume
7 Day Issue Stats
- Oncall issues : 0
- Access Request : 0
- Change Issues : 5
- Incident Issues : 17
- CorrectiveAction Issues : 0
Change Issues
- 2024-02-12T06:55:53Z - 2024-02-12: Set arkose_labs_data_exchange_key a... (production#17567 - closed)
- 2024-02-08T22:51:18Z - Mo Khan - admin account on self managed test en... (production#17560 - closed)
- 2024-02-08T09:42:46Z - 2024-02-08: Enable search for next namespace ba... (production#17552 - closed)
- 2024-02-08T06:45:42Z - 2024-02-08: Set arkose_labs_data_exchange_key a... (production#17551 - closed)
- 2024-02-06T22:56:52Z - GPRD - Implement data checksums in a GPRD Main ... (production#17544 - closed)
Incident Issues
- 2024-02-11T15:25:06Z - 2024-02-11: workhorse_auth_api slo violation in... (production#17565 - closed) | severity3 | ServiceGit |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17565
- 2024-02-11T14:23:42Z - 2024-02-11: missing traffic on a Gitaly VM (fil... (production#17564 - closed) | severity4 | ServiceGitaly |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17564
- 2024-02-11T11:02:15Z - 2024-02-11: PostgreSQL queries dominating total... (production#17563 - closed) | | |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17563
- 2024-02-09T18:57:01Z - 2024-02-09: Vault Istio Internal Ingress SLO vi... (production#17562 - closed) | severity3 | ServiceVault |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17562
- 2024-02-09T13:19:35Z - 2024-02-09: LoggingServiceFluentdLogOutputError... (production#17561 - closed) | severity3 | ServiceLogging |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17561
- 2024-02-08T18:25:28Z - 2024-02-08: web-cny Error Rate Violation (production#17558 - closed) | severity3 | ServiceWeb |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17558
- 2024-02-08T17:07:24Z - 2024-02-08: Multiple services reporting no traffic (production#17557 - closed) | severity3 | ServiceStackdriver |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17557
- 2024-02-08T14:15:47Z - 2024-02-08: Redis::CommandError NOAUTH Authenti... (production#17553 - closed) | severity3 | ServiceNeeded |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17553
- 2024-02-08T04:48:49Z - 2024-02-08: data-server-rebuild-ansible pipelin... (production#17550 - closed) | severity3 | ServicePgbouncer |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17550
- 2024-02-07T23:23:32Z - 2024-02-07: Node postgres-ci-dr-delayed-v14-01-... (production#17548 - closed) | severity4 | ServicePostgresDelayed |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17548
- 2024-02-07T18:11:45Z - 2024-02-07: Web Pages apdex drop (production#17546 - closed) | severity3 | ServiceWebPages |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17546
- 2024-02-07T06:04:10Z - 2024-02-07: Gitaly Down on file-hdd-02-stor-gpr... (production#17545 - closed) | severity3 | ServiceGitaly |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17545
- 2024-02-06T19:04:38Z - 2024-02-06: AI Gateway `server` service apdex v... (production#17542 - closed) | severity4 | ServiceAIGateway |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17542
- 2024-02-06T19:01:32Z - 2024-02-06: Chef runs on runner managers segfau... (production#17541 - closed) | severity4 | ServiceAiGateway |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17541
- 2024-02-06T17:44:21Z - 2024-02-06: SSL certificate for pages.gitlab.io... (production#17540 - closed) | | |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17540
- 2024-02-06T17:44:21Z - 2024-02-06: SSL certificate for gitlab-examples... (production#17539 - closed) | | |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17539
- 2024-02-06T15:31:26Z - 2024-02-06: Gitlab::Git::CommandTimedOut errors (production#17538 - closed) | severity3 | ServiceGitaly |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17538
CorrectiveAction Issues
- 2024-02-09T18:15:04Z - Make HAProxy restarts reliable and detect stale... (production-engineering#25047 - closed)
- 2024-02-07T21:40:59Z - pg_checksums packages - gitlab-pgchecksums and ... (production-engineering#25038 - closed)
Open Issue Stats
- Oncall issues : 2
- Change issues : 38
- Incident issues : 3
- Access Request : 0
- CorrectiveAction : 104
Open Change Issues
Open Incident Issues
Created | Summary |
---|---|
2024-02-11T11:02:15Z | 2024-02-11: PostgreSQL queries dominating total... (production#17563 - closed) |
2024-01-31T09:36:14Z | 2024-01-31: password authentication failed for ... (production#17509 - closed) |
Open Oncall Issues
Created | Summary |
---|---|
2021-09-17T19:35:34Z | Proposal: When an Incident is declared, output ... (production-engineering#14205) |
2020-12-18T22:29:14Z | CI clones fail for repositories with a path end... (production-engineering#12200 - moved) |
Issues for Review during Incident Review Meeting
If there are any incidents you think would be good to review, please add them to the Agenda for the next meeting.