Weekly Reliability (SRE) Team Newsletter – On-call Period: 2022-07-12 - 2022-07-19
Announcements
Engineering Week in Review Highlights:
Team Updates
On-Call During This Period
Schedule | Username |
---|---|
SRE 8-hour Americas | Cameron McFarland |
SRE 8-hour Americas | Marcel Chacon |
SRE 8-hour APAC | Cindy Pallares |
SRE 8-hour APAC | Filipe Santos |
SRE 8-hour EMEA | Alejandro Rodriguez |
SRE 8-hour EMEA | Michal Wasilewski |
PagerDuty Incidents
See the 1-week report for acknowledged PagerDuty (PD) pages (long-term trend)
Alerts Volume
7 Day Issue Stats
- Oncall Issues: 0
- Access Request: 0
- Change Issues: 18
- Incident Issues: 26
- CorrectiveAction Issues: 1
Change Issues
- 2022-07-16T00:05:11Z - Migrate large projects off file-60, file-61, fi... (production#7457 - closed)
- 2022-07-15T19:31:22Z - Migrate large projects off file-52, file-53, an... (production#7454 - closed)
- 2022-07-15T15:25:03Z - Set correct registry db name in gitlab.rb for gprd (production#7448 - closed)
- 2022-07-14T16:30:27Z - Rebuild postgres-ci-dr-archive-01-db-gprd and p... (production#7440 - closed)
- 2022-07-14T08:25:02Z - Drop index_ci_builds_on_queued_at index from ci... (production#7435 - closed)
- 2022-07-14T08:24:36Z - [Staging] Drop index_ci_builds_on_queued_at ind... (production#7434 - closed)
- 2022-07-13T20:10:46Z - Bump the default nodeselector gke-nodepools in ... (production#7433 - closed)
- 2022-07-13T14:17:36Z - Elasticsearch delete search-team-monitoring-clu... (production#7431 - closed)
- 2022-07-13T08:28:24Z - 2022-07-13: Enable memory watchdog in pre-prod ... (production#7428 - closed)
- 2022-07-13T04:20:29Z - 2022-07-14: Bump `gitlab-exporters` cookbook fo... (production#7426 - closed)
- 2022-07-12T19:36:03Z - Bump the sidekiq nodeselector gke-nodepools (production#7425 - closed)
- 2022-07-12T17:42:02Z - Bump the default nodeselector gke-nodepools (production#7424 - closed)
- 2022-07-12T15:25:49Z - 2022-07-12: Bump `mirror_max_capacity` for pull... (production#7422 - closed)
- 2022-07-12T11:21:15Z - [GPRD] Align `data_disk_sizes` of the `patroni-... (production#7419 - closed)
- 2022-07-12T10:21:17Z - [GSTG] Drop index_ci_builds_on_token_partial in... (production#7417 - closed)
- 2022-07-12T07:40:15Z - [Staging] Drop index_ci_builds_on_project_id_fo... (production#7415 - closed)
- 2022-07-12T07:22:55Z - 2022-07-12: Merge cny and main pages into 1 (production#7414 - closed)
- 2022-07-11T16:12:03Z - [08/02/2022 - 11:00 UTC] - Patroni Clusters OS ... (production#7413 - closed)
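The 7 Day Issue Stats count 18 change issues, but the report does not state the exact window used, and the oldest listed change (production#7413) was created on 2022-07-11, before the on-call period in the title begins. The Python sketch below re-counts the listed creation timestamps over an explicit half-open window. The window boundaries and the helper names `parse_utc` and `count_in_window` are hypothetical; this is not the report generator's actual logic.

```python
from datetime import datetime

# Creation timestamps of the 18 change issues listed above, copied verbatim.
CHANGE_CREATED = [
    "2022-07-16T00:05:11Z", "2022-07-15T19:31:22Z", "2022-07-15T15:25:03Z",
    "2022-07-14T16:30:27Z", "2022-07-14T08:25:02Z", "2022-07-14T08:24:36Z",
    "2022-07-13T20:10:46Z", "2022-07-13T14:17:36Z", "2022-07-13T08:28:24Z",
    "2022-07-13T04:20:29Z", "2022-07-12T19:36:03Z", "2022-07-12T17:42:02Z",
    "2022-07-12T15:25:49Z", "2022-07-12T11:21:15Z", "2022-07-12T10:21:17Z",
    "2022-07-12T07:40:15Z", "2022-07-12T07:22:55Z", "2022-07-11T16:12:03Z",
]


def parse_utc(ts: str) -> datetime:
    """Hypothetical helper: parse an ISO 8601 'Z' timestamp into an aware UTC datetime."""
    # fromisoformat() accepts a trailing 'Z' only from Python 3.11, so map it to +00:00.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def count_in_window(timestamps, start: str, end: str) -> int:
    """Hypothetical helper: count timestamps falling in the half-open window [start, end)."""
    lo, hi = parse_utc(start), parse_utc(end)
    return sum(1 for ts in timestamps if lo <= parse_utc(ts) < hi)
```

With the on-call period [2022-07-12, 2022-07-19) as the window, 17 of the 18 listed changes fall inside it; moving the start back to 2022-07-11 counts all 18.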
Incident Issues
- 2022-07-18T01:45:03Z - 2022-07-18: Block deployments due to broken Env... (production#7461 - closed) | reliability~3760141 | ServiceWeb |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7461
- 2022-07-17T21:32:15Z - 2022-07-17: file-51 high error ratio (production#7460 - closed) | reliability~3760141 | ServiceGitaly |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7460
- 2022-07-17T11:22:09Z - 2022-07-17: PrometheusUnreachable for prometheu... (production#7459 - closed) | reliability~3760141 | |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7459
- 2022-07-16T12:52:22Z - 2022-07-16: The goserver SLI of the gitaly serv... (production#7458 - closed) | reliability~3760141 | |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7458
- 2022-07-15T20:12:54Z - 2022-07-15: GitLab.com is down/degraded (production#7456 - closed) | reliability~3760139 | |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7456
- 2022-07-15T20:05:36Z - 2022-07-15: Teleport instance unresponsive (production#7455 - closed) | reliability~3760141 | ServiceInfrastructure |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7455
- 2022-07-15T19:20:24Z - 2022-07-15: Gitaly "Conflict Side Missing" erro... (production#7452 - closed) | reliability~3760141 | ServiceGitaly |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7452
- 2022-07-15T19:05:45Z - 2022-07-15: Gitaly deployment failing on stagin... (production#7451 - closed) | reliability~3760141 | |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7451
- 2022-07-15T13:16:10Z - 2022-07-15: The grafana_datasources SLI of the ... (production#7447 - closed) | reliability~3760141 | |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7447
- 2022-07-15T11:55:27Z - 2022-07-15: The goserver SLI of the gitaly serv... (production#7446 - closed) | reliability~3760141 | |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7446
- 2022-07-15T10:38:29Z - 2022-07-15: MonitoringServiceRuleEvaluationTraf... (production#7445 - closed) | reliability~3760141 | |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7445
- 2022-07-15T09:38:02Z - 2022-07-15: The goserver SLI of the gitaly serv... (production#7444 - closed) | reliability~3760141 | |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7444
- 2022-07-15T01:30:21Z - 2022-07-15: increased primary DB queries latency (production#7443 - closed) | reliability~3760141 | ServicePostgres |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7443
- 2022-07-14T15:15:19Z - 2022-07-14: Web pages are not loading properly (production#7438 - closed) | reliability~3760140 | ServicePages |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7438
- 2022-07-14T13:09:05Z - 2022-07-14: Group's "CI/CD > Runners" page retu... (production#7437 - closed) | reliability~3760140 | ServiceAPI |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7437
- 2022-07-14T12:58:36Z - 2022-07-14: LoggingVisibilityDiminished (production#7436 - closed) | reliability~3760141 | |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7436
- 2022-07-13T16:06:44Z - 2022-07-13: The sshServices SLI of the frontend... (production#7432 - closed) | reliability~3760141 | ServiceFrontend |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7432
- 2022-07-13T13:45:01Z - 2022-07-13: ProjectExportWorker not running on ... (production#7430 - closed) | reliability~3760142 | ServiceSidekiq |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7430
- 2022-07-13T13:30:11Z - 2022-07-13: GitalyServiceGoserverApdexSLOViolation (production#7429 - closed) | reliability~3760141 | |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7429
- 2022-07-13T08:04:03Z - 2022-07-13: main/canary staging QA jobs are fai... (production#7427 - closed) | reliability~3760141 | |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7427
- 2022-07-12T15:39:17Z - 2022-07-12: PrometheusUnreachable alerts for pr... (production#7423 - closed) | reliability~3760141 | ServicePrometheus |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7423
- 2022-07-12T15:17:24Z - 2022-07-12: Prometheus volume space saturated (production#7421 - closed) | reliability~3760141 | ServicePrometheus |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7421
- 2022-07-12T14:49:09Z - 2022-07-12: RepositoryUpdateMirrorWorker seems ... (production#7420 - closed) | reliability~3760141 | ServiceSidekiq |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7420
- 2022-07-12T10:37:19Z - 2022-07-12: PrometheusUnreachable for prometheu... (production#7418 - closed) | reliability~3760141 | ServicePrometheus |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7418
- 2022-07-12T08:49:27Z - 2022-07-12: RepositoryUpdateMirrorWorker seems ... (production#7416 - closed) | reliability~3760141 | ServiceSidekiq |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7416
- 2022-07-11T15:21:12Z - 2022-07-11: Automated On-Call Handover Issue no... (production#7412 - closed) | reliability~3760142 | ServiceWoodhouse |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7412
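To see where the week's incidents concentrated, the list above can be tallied by its labels. This is a minimal Python sketch over the 26 entries as transcribed. The `Incident` record and `tally` helper are hypothetical, and the numeric `reliability~` label IDs are copied as-is because the report does not say what they denote.

```python
from collections import Counter
from typing import NamedTuple


class Incident(NamedTuple):
    """Hypothetical record for one incident line in the list above."""
    ref: str      # issue reference as listed, e.g. "production#7461"
    label: str    # raw reliability label as listed
    service: str  # service label as listed; "" when none was shown


R139, R140 = "reliability~3760139", "reliability~3760140"
R141, R142 = "reliability~3760141", "reliability~3760142"

INCIDENTS = [
    Incident("production#7461", R141, "ServiceWeb"),
    Incident("production#7460", R141, "ServiceGitaly"),
    Incident("production#7459", R141, ""),
    Incident("production#7458", R141, ""),
    Incident("production#7456", R139, ""),
    Incident("production#7455", R141, "ServiceInfrastructure"),
    Incident("production#7452", R141, "ServiceGitaly"),
    Incident("production#7451", R141, ""),
    Incident("production#7447", R141, ""),
    Incident("production#7446", R141, ""),
    Incident("production#7445", R141, ""),
    Incident("production#7444", R141, ""),
    Incident("production#7443", R141, "ServicePostgres"),
    Incident("production#7438", R140, "ServicePages"),
    Incident("production#7437", R140, "ServiceAPI"),
    Incident("production#7436", R141, ""),
    Incident("production#7432", R141, "ServiceFrontend"),
    Incident("production#7430", R142, "ServiceSidekiq"),
    Incident("production#7429", R141, ""),
    Incident("production#7427", R141, ""),
    Incident("production#7423", R141, "ServicePrometheus"),
    Incident("production#7421", R141, "ServicePrometheus"),
    Incident("production#7420", R141, "ServiceSidekiq"),
    Incident("production#7418", R141, "ServicePrometheus"),
    Incident("production#7416", R141, "ServiceSidekiq"),
    Incident("production#7412", R142, "ServiceWoodhouse"),
]


def tally(incidents, field: str) -> Counter:
    """Count incidents by one field ("label" or "service"); empty values become '(none)'."""
    return Counter(getattr(i, field) or "(none)" for i in incidents)


if __name__ == "__main__":
    for service, n in tally(INCIDENTS, "service").most_common():
        print(f"{service}: {n}")
```

As transcribed, 11 of the 26 incidents show no service label, ServicePrometheus and ServiceSidekiq have 3 each, ServiceGitaly has 2, and each remaining service has 1; 21 of the 26 carry `reliability~3760141`.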
CorrectiveAction Issues
- 2022-07-12T16:40:06Z - https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/16033
- 2022-07-12T16:25:19Z - https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/16032
- 2022-07-12T14:34:48Z - https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/16028
Open Issue Stats
- Oncall Issues: 2
- Change Issues: 13
- Incident Issues: 12
- Access Request: 0
- CorrectiveAction Issues: 98
Open Change Issues
Open Incident Issues
Created | Summary |
---|---|
2022-07-18T01:45:03Z | 2022-07-18: Block deployments due to broken Env... (production#7461 - closed) |
2022-07-17T21:32:15Z | 2022-07-17: file-51 high error ratio (production#7460 - closed) |
2022-07-17T11:22:09Z | 2022-07-17: PrometheusUnreachable for prometheu... (production#7459 - closed) |
2022-07-16T12:52:22Z | 2022-07-16: The goserver SLI of the gitaly serv... (production#7458 - closed) |
2022-07-15T13:16:10Z | 2022-07-15: The grafana_datasources SLI of the ... (production#7447 - closed) |
2022-07-15T11:55:27Z | 2022-07-15: The goserver SLI of the gitaly serv... (production#7446 - closed) |
2022-07-15T10:38:29Z | 2022-07-15: MonitoringServiceRuleEvaluationTraf... (production#7445 - closed) |
2022-07-15T09:38:02Z | 2022-07-15: The goserver SLI of the gitaly serv... (production#7444 - closed) |
2022-07-14T12:58:36Z | 2022-07-14: LoggingVisibilityDiminished (production#7436 - closed) |
2022-06-27T23:14:54Z | 2022-06-27: Small uptick in TLS handshake failu... (production#7337 - closed) |
Open Oncall Issues
Created | Summary |
---|---|
2021-09-17T19:35:34Z | https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/14205 |
2020-12-18T22:29:14Z | https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/12200 |
Issues for Review during Incident Review Meeting
If there are incidents you think would be worth reviewing, please add them to the agenda for the next meeting.