Weekly Reliability (SRE) Team Newsletter – On-call Period: 2022-03-15 - 2022-03-22

Announcements

Engineering Week in Review Highlights:

Team Updates


On-Call During This Period

Schedule             Username
SRE 8-hour Americas  Cameron McFarland
SRE 8-hour Americas  Marcel Chacon
SRE 8-hour APAC      Craig Barrett
SRE 8-hour EMEA      Alejandro Rodriguez
SRE 8-hour EMEA      Igor Wiedler

PagerDuty Incidents

See the 1-week report for acknowledged PagerDuty pages (long-term trend).

Alerts Volume

7 Day Issue Stats

  • Oncall Issues: 0
  • Access Request Issues: 0
  • Change Issues: 19
  • Incident Issues: 43
  • CorrectiveAction Issues: 0

Change Issues

Incident Issues

CorrectiveAction Issues

Open Issue Stats

Open Change Issues

Created               Summary
2022-03-18T19:25:49Z  Removal of foreign key fk_e4ef9c2f27 on PRD
2022-03-17T17:28:34Z  Import projects into project_build_artifacts_size_refreshes
2022-03-17T11:50:33Z  Adjust batch_size, pause_ms and sub_batch_size of NullifyOrphanRunnerIdOnCiBuilds migration
2022-03-17T11:05:45Z  Grow Elasticsearch cluster gitlab-logs-prod from 9 hot nodes to 11
2022-03-15T15:30:43Z  [gprd] Replace redis-cache-sentinel instances after changing machine_type from n1-standard-1 to n2d-standard-4
2022-03-15T13:54:10Z  2022-03-15: Delete marketo hook

Open Incident Issues

Created               Summary
2022-03-18T17:39:32Z  2022-03-18: The goserver_op_service SLI of the gitaly service on node file-22-stor-gprd.c.gitlab-production.internal has an error rate violating SLO
2022-03-18T01:28:18Z  2022-03-18: Some notification emails are delayed
2022-03-17T05:58:53Z  2022-03-17: Commit via the API fails with error 500 during a QA test
2022-03-14T14:24:30Z  2022-03-13: Postgres pending WAL files on primary is high
2022-02-12T19:20:49Z  2022-02-12: Increased latency from us-east1-d for GCS buckets

Open Oncall Issues

Created               Summary
2021-09-17T19:35:34Z  Proposal: When an Incident is declared, output the latest changed feature flags into the incident issue
2020-12-18T22:29:14Z  CI clones fail for repositories with a path ending in a period
2020-03-30T13:38:11Z  jobs.gitlab.com cert expired unnoticed on 2020-03-28

Issues for Review during Incident Review Meeting

If there are any incidents you think would be good to review, please add them to the Agenda for the next meeting.
Edited by Kennedy Wanyangu