Weekly Reliability (SRE) Team Newsletter – On-call Period: 2023-05-16 - 2023-05-23
Announcements
Engineering Week in Review Highlights:
Team Updates
On-Call During This Period
| Schedule | Username |
|---|---|
| SRE 8-hour Americas | Alejandro Rodríguez |
| SRE 8-hour Americas | Marcel Chacon |
| SRE 8-hour APAC | Gonzalo Servat |
| SRE 8-hour APAC | Furhan Shabir |
| SRE 8-hour EMEA | Ahmad Sherif |
| SRE 8-hour EMEA | Steve Azzopardi |
PagerDuty Incidents
See the 1-week report for acknowledged PagerDuty (PD) pages (long-term trend).
Alerts Volume
7 Day Issue Stats
- Oncall issues : 0
- Access Request : 0
- Change Issues : 11
- Incident Issues : 21
- CorrectiveAction Issues : 0
Change Issues
- 2023-05-19T20:50:31Z - [CR] [gprd] Switch 100% to new HAProxy 2.6 (production#14475 - closed)
- 2023-05-19T18:22:55Z - [CR] [gprd] Switch to new HAProxy 2.6 CI intern... (production#14473 - closed)
- 2023-05-19T18:21:29Z - [CR] [gprd] Switch to new HAProxy 2.6 internal ... (production#14472 - closed)
- 2023-05-19T18:20:14Z - [CR] [gprd] Send a small percentage of traffic ... (production#14471 - closed)
- 2023-05-19T01:01:54Z - [CR] [gprd] Creating node pools and load balanc... (production#14461 - closed)
- 2023-05-18T20:15:01Z - 2023-05-18: Invalidate cloudfront cache for pac... (production#14457 - closed)
- 2023-05-18T18:24:29Z - https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14456+
- 2023-05-18T10:34:27Z - [GPRD] Complete ci_builds partitioning - 2nd at... (production#14453 - closed)
- 2023-05-18T08:07:10Z - [GPRD] Install pg_stat_kcache package in DR pos... (production#14449 - closed)
- 2023-05-17T08:23:17Z - [GPRD] [2023-08-26] - Upgrade PostgreSQL to PG1... (production#14403 - closed)
- 2023-05-17T08:20:25Z - [GPRD] [2023-06-24 14:00 UTC -19:00 UTC] - Upgr... (production#14402 - closed)
Incident Issues
- 2023-05-22T00:45:36Z - 2023-05-22: Traffic Cessation Alerts (KasServic... (production#14493 - closed) | severity4 | ServiceMonitoring-Other |
  https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14493
- 2023-05-21T10:39:18Z - 2023-05-21: WebServiceLoadbalancerErrorSLOViola... (production#14485 - closed) | severity4 | |
  https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14485
- 2023-05-21T04:33:13Z - 2023-05-21: WebServiceLoadbalancerErrorSLOViola... (production#14484 - closed) | severity4 | ServiceAPI |
  https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14484
- 2023-05-19T19:23:08Z - 2023-05-19: ActionController::UrlGenerationError (production#14474 - closed) | severity4 | |
  https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14474
- 2023-05-19T15:23:00Z - 2023-05-19: Groups inaccessible and 500 errors (production#14469 - closed) | severity2 | |
  https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14469
- 2023-05-19T14:48:19Z - 2023-05-19: Code suggestions is fully down (production#14468 - closed) | severity3 | ServiceCodeSuggestions |
  https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14468
- 2023-05-19T08:01:38Z - 2023-05-19: Alertmanager Notifications Failing ... (production#14467 - closed) | severity4 | ServiceAlertManager |
  https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14467
- 2023-05-19T03:39:36Z - 2023-05-19: GitalyServiceGoserverErrorSLOViolat... (production#14462 - closed) | severity4 | ServiceGitaly |
  https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14462
- 2023-05-18T13:29:21Z - 2023-05-18: Increased latency for code suggestions (production#14455 - closed) | severity3 | ServiceCodeSuggestions |
  https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14455
- 2023-05-18T11:39:00Z - 2023-05-18: PVS Apdex violation (production#14454 - closed) | severity4 | ServicePVS |
  https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14454
- 2023-05-18T09:43:07Z - 2023-05-18: Code suggestions service is down (production#14451 - closed) | severity3 | ServiceCodeSuggestions |
  https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14451
- 2023-05-17T19:54:50Z - 2023-05-17: Expired certificate for int.gprd.gi... (production#14422 - closed) | severity1 | ServiceInfrastructure |
  https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14422
- 2023-05-17T19:12:17Z - 2023-05-17: Release candidate deployment to pre... (production#14421 - closed) | severity3 | |
  https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14421
- 2023-05-17T19:10:46Z - 2023-05-17: Release candidate deployment to pre... (production#14420 - closed) | severity3 | ServiceDeployTooling |
  https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14420
- 2023-05-17T15:19:15Z - 2023-05-17: PubSub queuing high - pubsub-rails-... (production#14407 - closed) | severity4 | ServicePubSub |
  https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14407
- 2023-05-17T12:12:09Z - 2023-05-17: RedisCache Client error (production#14406 - closed) | severity3 | ServiceRedisCache |
  https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14406
- 2023-05-17T09:02:11Z - 2023-05-17: High error rate in Git (production#14405 - closed) | severity3 | ServiceGit |
  https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14405
- 2023-05-17T09:02:09Z - 2023-05-17: Alertmanager is seeing errors for s... (production#14404 - closed) | severity4 | ServiceAlertManager |
  https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14404
- 2023-05-17T03:02:30Z - 2023-05-17: Unable to create repo in pre.gitlab... (production#14397 - closed) | severity4 | ServiceGitaly |
  https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14397
- 2023-05-16T16:05:28Z - 2023-05-16: Data-Server Rebuild Ansible | Faile... (production#14370 - closed) | severity4 | ServicePatroni |
  https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14370
- 2023-05-16T15:37:24Z - 2023-05-16: PubSub queuing high - pubsub-rails-... (production#14369 - closed) | severity3 | ServicePubSub |
  https://gitlab.com/gitlab-com/gl-infra/production/-/issues/14369
CorrectiveAction Issues
- 2023-05-22T06:27:40Z - https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/23738+
- 2023-05-19T01:20:59Z - https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/23719+
- 2023-05-18T11:17:36Z - https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/23713+
- 2023-05-18T01:20:35Z - https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/23709+
- 2023-05-17T22:37:51Z - https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/23708+
- 2023-05-17T21:18:58Z - https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/23707+
- 2023-05-17T14:35:24Z - https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/23701+
Open Issue Stats
- Oncall issues : 2
- Change issues : 34
- Incident issues : 7
- Access Request : 0
- CorrectiveAction : 105
Open Change Issues
Open Incident Issues
| Created | Summary |
|---|---|
Open Oncall Issues
| Created | Summary |
|---|---|
| 2021-09-17T19:35:34Z | https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/14205+ |
| 2020-12-18T22:29:14Z | https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/12200+ |
Issues for Review during Incident Review Meeting
If there are any incidents you think would be good to review, please add them to the Agenda for the next meeting.
Edited by ops-gitlab-net