Weekly Reliability (SRE) Team Newsletter – On-call Period: 2022-03-15 - 2022-03-22
<!-- This issue was automatically generated by https://gitlab.com/gitlab-com/gl-infra/oncall-robot-assistant. -->
<!-- Announcements common to all the Reliability (SRE) Teams should be placed in this section. -->
# Announcements
#### [Engineering Week in Review](https://docs.google.com/document/d/1GQbnOP_lr9KVMVaBQx19WwKITCmh7H3YlgO-XqVwv0M/edit#) Highlights:
<!-- Announcements for each individual SRE Team should be made in their respective sections below. -->
# Team Updates
---
# On-Call During This Period
| Schedule | On-call engineer |
| -------- | -------- |
| SRE 8-hour Americas | Cameron McFarland |
| SRE 8-hour Americas | Marcel Chacon |
| SRE 8-hour APAC | Craig Barrett |
| SRE 8-hour EMEA | Alejandro Rodriguez |
| SRE 8-hour EMEA | Igor Wiedler |
## PagerDuty Incidents
[See the 1 week report for acknowledged PD pages](https://nonprod-log.gitlab.net/app/dashboards#/view/dacc1d40-1c64-11ec-b8fd-b5d052b1f8cb?_g=(time:(from:'2022-03-15T02:00:00Z',to:'2022-03-22T02:00:00Z'),filters:!((query:(match_phrase:(type.keyword:pagerduty))),(query:(match_phrase:(status.keyword:triggered)))))) ([long-term trend](https://nonprod-log.gitlab.net/goto/a436702d57864666cd0c8867cfaf73e9))
### Alerts Volume
- [Weekly Trend](https://nonprod-log.gitlab.net/app/visualize#/create?type=line&indexPattern=b35d9ca0-6c67-11eb-968b-c18082d502f4&_g=(filters:!(('$state':(store:globalState),meta:(alias:!n,disabled:!f,index:b35d9ca0-6c67-11eb-968b-c18082d502f4,key:status,negate:!f,params:(query:acknowledged),type:phrase),query:(match_phrase:(status:acknowledged)))),refreshInterval:(pause:!t,value:0),time:(from:now-1w,to:now-1d))&_a=(filters:!(),linked:!f,query:(language:kuery,query:''),uiState:(vis:(legendOpen:!f)),vis:(aggs:!((enabled:!t,id:'1',params:(),schema:metric,type:count),(enabled:!t,id:'2',params:(drop_partials:!f,extended_bounds:(),field:time,interval:d,min_doc_count:1,scaleMetricValues:!f,timeRange:(from:now-7M,to:'2021-09-09T00:00:00.000Z'),useNormalizedEsInterval:!t),schema:segment,type:date_histogram)),params:(addLegend:!t,addTimeMarker:!f,addTooltip:!t,categoryAxes:!((id:CategoryAxis-1,labels:(filter:!t,show:!t,truncate:100),position:bottom,scale:(type:linear),show:!t,style:(),title:(),type:category)),grid:(categoryLines:!f),labels:(),legendPosition:right,seriesParams:!((data:(id:'1',label:Count),drawLinesBetweenPoints:!t,interpolate:linear,lineWidth:2,mode:stacked,show:!t,showCircles:!t,type:line,valueAxis:ValueAxis-1)),thresholdLine:(color:%23E7664C,show:!f,style:full,value:10,width:1),times:!(),type:line,valueAxes:!((id:ValueAxis-1,labels:(filter:!f,rotate:0,show:!t,truncate:100),name:LeftAxis-1,position:left,scale:(mode:normal,type:linear),show:!t,style:(),title:(text:Count),type:value))),title:'',type:line)))
- [Monthly Trend](https://nonprod-log.gitlab.net/app/visualize#/create?type=line&indexPattern=b35d9ca0-6c67-11eb-968b-c18082d502f4&_g=(filters:!(('$state':(store:globalState),meta:(alias:!n,disabled:!f,index:b35d9ca0-6c67-11eb-968b-c18082d502f4,key:status,negate:!f,params:(query:acknowledged),type:phrase),query:(match_phrase:(status:acknowledged)))),refreshInterval:(pause:!t,value:0),time:(from:now-1M,to:now-1d))&_a=(filters:!(),linked:!f,query:(language:kuery,query:''),uiState:(vis:(legendOpen:!f)),vis:(aggs:!((enabled:!t,id:'1',params:(),schema:metric,type:count),(enabled:!t,id:'2',params:(drop_partials:!f,extended_bounds:(),field:time,interval:d,min_doc_count:1,scaleMetricValues:!f,timeRange:(from:now-7M,to:'2021-09-09T00:00:00.000Z'),useNormalizedEsInterval:!t),schema:segment,type:date_histogram)),params:(addLegend:!t,addTimeMarker:!f,addTooltip:!t,categoryAxes:!((id:CategoryAxis-1,labels:(filter:!t,show:!t,truncate:100),position:bottom,scale:(type:linear),show:!t,style:(),title:(),type:category)),grid:(categoryLines:!f),labels:(),legendPosition:right,seriesParams:!((data:(id:'1',label:Count),drawLinesBetweenPoints:!t,interpolate:linear,lineWidth:2,mode:stacked,show:!t,showCircles:!t,type:line,valueAxis:ValueAxis-1)),thresholdLine:(color:%23E7664C,show:!f,style:full,value:10,width:1),times:!(),type:line,valueAxes:!((id:ValueAxis-1,labels:(filter:!f,rotate:0,show:!t,truncate:100),name:LeftAxis-1,position:left,scale:(mode:normal,type:linear),show:!t,style:(),title:(text:Count),type:value))),title:'',type:line)))
- [90 days trend by service](https://nonprod-log.gitlab.net/goto/a436702d57864666cd0c8867cfaf73e9)
### 7 Day Issue Stats
* Oncall issues : **0**
* Access Request : **0**
* Change Issues : **19**
* Incident Issues : **43**
* CorrectiveAction Issues : **11**
#### Change Issues
* 2022-03-21T00:02:14Z - [2022-03-21: Grow gitlab-logs-prod warm tier](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6656)
* 2022-03-20T23:54:52Z - [2022-03-21: Enable autoscaling for pubsubbeat](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6655)
* 2022-03-18T19:25:49Z - [Removal of foreign key fk_e4ef9c2f27 on PRD](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6646)
* 2022-03-18T19:19:21Z - [Manually mark migration as complete to fix deploy](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6645)
* 2022-03-18T11:55:06Z - [Set up ops staging environment](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6637)
* 2022-03-17T17:28:34Z - [Import projects into project_build_artifacts_size_refreshes](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6631)
* 2022-03-17T13:31:25Z - [2022-03-18: Upgrade Prometheus servers in gprd GKE Clusters](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6627)
* 2022-03-17T11:50:33Z - [Adjust batch_size, pause_ms and sub_batch_size of NullifyOrphanRunnerIdOnCiBuilds migration](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6626)
* 2022-03-17T11:05:45Z - [Grow Elasticsearch cluster gitlab-logs-prod from 9 hot nodes to 11](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6624)
* 2022-03-16T22:51:46Z - [2022-03-16: Add PVCs to Alertmanager](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6620)
* 2022-03-16T20:52:42Z - [[GPRD] - Further increase the number of concurrently archived WAL files to mitigate pileup (15 => 20)](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6618)
* 2022-03-16T17:47:32Z - [[gstg] Drain and reboot each frontend service member instance one at a time](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6617)
* 2022-03-16T16:30:48Z - [[gprd] Drain and reboot each frontend service member instance one at a time](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6616)
* 2022-03-15T15:30:43Z - [[gprd] Replace `redis-cache-sentinel` instances after changing `machine_type` from `n1-standard-1` to `n2d-standard-4`](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6602)
* 2022-03-15T14:32:59Z - [[GPRD] - Increase the number of concurrently archived WAL files to mitigate pileup (10 => 15)](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6599)
* 2022-03-15T14:24:49Z - [[GSTG] Reprovision HAProxy with a single NIC](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6598)
* 2022-03-15T13:54:10Z - [2022-03-15: Delete marketo hook](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6597)
* 2022-03-15T13:17:03Z - [2022-03-15: Disable api-gke-us-east1-d on gstg](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6595)
* 2022-03-15T05:31:17Z - [2022-03-15: Increase zip cache expiration for pages](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6592)
#### Incident Issues
* 2022-03-21T03:02:31Z - [2022-03-21: Multiple pages SLI, pingdom, blackbox alerts](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6657) | reliability~3760141 | ~"Service::Pages" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6657`
* 2022-03-20T15:28:36Z - [2022-03-20: Blackbox probes for https://customers.gitlab.com are failing.](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6654) | reliability~3760141 | ~"Service::CustomersDot" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6654`
* 2022-03-20T07:58:28Z - [2022-03-20 Salesforce authentication failing in CustomersDot production](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6653) | reliability~3760140 | ~"Service::CustomersDot" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6653`
* 2022-03-20T06:45:04Z - [2022-03-20: The sshServices SLI of the frontend service (`main` stage) has an apdex violating SLO](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6652) | reliability~3760140 | ~"Service::Frontend" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6652`
* 2022-03-19T10:29:37Z - [2022-03-19: The goserver SLI of the gitaly service on node `file-hdd-01-stor-gprd.c.gitlab-production.internal` has an apdex violating SLO](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6651) | reliability~3760141 | ~"Service::Gitaly" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6651`
* 2022-03-19T10:21:19Z - [2022-03-19: The goserver_op_service SLI of the gitaly service on node `file-cny-01-stor-gprd.c.gitlab-production.internal` has not received any traffic in the past 30m](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6650) | reliability~3760142 | ~"Service::Gitaly" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6650`
* 2022-03-18T23:43:49Z - [2022-03-18: Number of Gitaly shards (for new repositories) is low](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6649) | reliability~3760142 | ~"Service::Gitaly" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6649`
* 2022-03-18T18:47:40Z - [2022-03-18: QA failures on gstg-cny](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6643) | reliability~3760140 | | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6643`
* 2022-03-18T18:36:10Z - [2022-03-18: Post Deploy migrations Failure on Auto-Deploy](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6642) | reliability~3760140 | | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6642`
* 2022-03-18T17:39:32Z - [2022-03-18: The goserver_op_service SLI of the gitaly service on node `file-22-stor-gprd.c.gitlab-production.internal` has an error rate violating SLO](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6641) | reliability~3760141 | ~"Service::Gitaly" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6641`
* 2022-03-18T15:21:15Z - [2022-03-18: The loadbalancer SLI of the web-pages service in region `us-east` has an error rate violating SLO](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6640) | reliability~3760141 | ~"Service::Pages" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6640`
* 2022-03-18T13:44:46Z - [2022-03-18: CloudSqlServiceCloudsqlTransactionsErrorSLOViolation](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6639) | reliability~3760142 | ~"Service::Grafana" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6639`
* 2022-03-18T04:42:15Z - [2022-03-18: CustomersDot: 500 error when attempting to change linked namespace](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6635) | reliability~3760142 | ~"Service::CustomersDot" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6635`
* 2022-03-18T01:57:55Z - [2022-03-18: The sentry_events SLI of the sentry service (`main` stage) has an apdex violating SLO](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6634) | reliability~3760141 | ~"Service::Sentry" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6634`
* 2022-03-18T01:28:18Z - [2022-03-18: Some notification emails are delayed](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6633) | reliability~3760141 | ~"Service::GitLab Rails" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6633`
* 2022-03-17T20:07:19Z - [2022-03-17: Postgres Replication lag is over 9 hours on delayed replica (normal is 8 hours)](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6632) | reliability~3760142 | ~"Service::Postgres" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6632`
* 2022-03-17T16:42:24Z - [2022-03-17: Multiple versions of Gitaly have been running alongside one another](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6630) | reliability~3760140 | ~"Service::Gitaly" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6630`
* 2022-03-17T16:22:07Z - [2022-03-17: QA gprd-cny smoke failure](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6629) | reliability~3760140 | ~"Service::Unknown" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6629`
* 2022-03-17T11:12:03Z - [2022-03-17: Site wide performance degradation](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6625) | reliability~3760139 | ~"Service::Postgres" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6625`
* 2022-03-17T05:58:53Z - [2022-03-17: Commit via the API fails with error 500 during a QA test](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6623) | reliability~3760141 | ~"Service::Gitaly" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6623`
* 2022-03-17T05:32:26Z - [2022-03-17: Unable to load branches on Gitlab project](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6622) | reliability~3760140 | ~"Service::Gitaly" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6622`
* 2022-03-16T21:57:23Z - [2022-03-16: The imagescaler SLI of the web service in region `us-east1-d` has an apdex violating SLO](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6619) | reliability~3760141 | ~"Service::Workhorse" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6619`
* 2022-03-16T14:33:53Z - [2022-03-16: The Horizontal Pod Autoscaler Desired Replicas resource of the sidekiq service (main stage) has a saturation exceeding SLO and is close to its capacity limit.](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6615) | reliability~3760141 | ~"Service::Sidekiq" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6615`
* 2022-03-16T13:47:32Z - [2022-03-16: Postgres primary log disk filled up](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6614) | reliability~3760141 | ~"Service::Patroni" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6614`
* 2022-03-16T13:37:03Z - [2022-03-16: Reports of SSL certificate problem: unable to get local issuer certificate for some CI jobs](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6613) | reliability~3760142 | ~"Service::CI Runners" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6613`
* 2022-03-16T12:24:07Z - [2022-03-16: gitaly server not available for gitlab-org/gitlab repository](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6612) | reliability~3760142 | | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6612`
* 2022-03-16T09:46:16Z - [2022-03-16: Prometheus on GKE stgsub-customers-gke has gone missing](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6611) | reliability~3760142 | ~"Service::Prometheus" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6611`
* 2022-03-16T09:31:58Z - [2022-03-16: specs_without_cluster failing and preventing deployments](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6610) | reliability~3760141 | | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6610`
* 2022-03-15T21:19:50Z - [2022-03-15: PostgreSQL queries dominating total query time](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6609) | reliability~3760140 | ~"Service::Postgres" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6609`
* 2022-03-15T18:45:14Z - [2022-03-15 OpenSSL vulnerability for CVE-2022-0778](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6605) | reliability~3760140 | ~"Service::Web" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6605`
* 2022-03-15T15:09:46Z - [2022-03-15: PubSub queuing high](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6601) | reliability~3760141 | ~"Service::Logging" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6601`
* 2022-03-15T14:55:24Z - [2022-03-15: The Cloud NAT Gateway Port Allocation resource of the nat service (main stage) has a saturation exceeding SLO and is close to its capacity limit](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6600) | reliability~3760141 | ~"Service::NAT" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6600`
* 2022-03-15T13:41:20Z - [2022-03-15: Missing objects in gitlab-org/gitlab](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6596) | reliability~3760140 | ~"Service::Gitaly" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6596`
* 2022-03-15T12:40:33Z - [2022-03-15: Brief spike in artifact upload failures due to runner configuration change](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6594) | reliability~3760141 | | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6594`
* 2022-03-15T02:46:01Z - [2022-03-15 The GitLab job clone resource zlonk.datalytics.dailyx has failed.](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6591) | reliability~3760142 | | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6591`
* 2022-03-14T20:31:52Z - [2022-03-14: The loadbalancer SLI of the web-pages service in region `us-east` has an error rate violating SLO](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6589) | reliability~3760141 | ~"Service::Pages" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6589`
* 2022-03-14T17:23:51Z - [2022-03-14: Gitlab.com issues with async jobs](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6586) | reliability~3760139 | ~"Service::Frontend" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6586`
* 2022-03-14T16:46:47Z - [2022-03-14 - Users Blocked from Gitlab.com by Cloudflare DDoS Page](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6585) | reliability~3760141 | ~"Service::Cloudflare" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6585`
* 2022-03-14T15:58:32Z - [2022-03-14: PubSub queuing high](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6583) | reliability~3760141 | ~"Service::Logging" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6583`
* 2022-03-14T14:24:30Z - [2022-03-13: Postgres pending WAL files on primary is high](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6581) | reliability~3760141 | ~"Service::Postgres" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6581`
* 2022-03-14T12:08:12Z - [2022-03-14: The loadbalancer SLI of the pages service (`main` stage) has an error rate violating SLO](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6580) | reliability~3760141 | ~"Service::Pages" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6580`
* 2022-03-14T11:21:44Z - [2022-03-14: PubSub queuing high](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6579) | reliability~3760141 | ~"Service::Logging" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6579`
* 2022-03-14T07:13:23Z - [2022-03-14: Containers for the `monitoring` service, `main` are unable to start.](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6577) | reliability~3760141 | ~"Service::Monitoring-Other" | `https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6577`
#### CorrectiveAction Issues
* 2022-03-18T23:38:42Z - [Add new CI/CD Limits per 2022-03-14 Incident](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15476)
* 2022-03-18T23:30:16Z - [Add note in gitaly-weights-assigner that it has assigned 0% to too many nodes](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15475)
* 2022-03-18T14:22:01Z - [Deduplicate grafana SQL alerts](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15473)
* 2022-03-18T14:17:57Z - [Push grafana logs to logging cluster](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15472)
* 2022-03-16T18:48:48Z - [Corrective action: The Horizontal Pod Autoscaler Desired Replicas resource of the sidekiq service (main stage) has a saturation exceeding SLO and is close to its capacity limit.](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15456)
* 2022-03-16T06:32:40Z - [Configure Horizontal Pod Autoscaling for pubsubbeat deployments based on PubSub metrics](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15451)
* 2022-03-15T10:40:29Z - [Corrective action: foo](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15440)
* 2022-03-14T20:42:30Z - [Can we rate-limit self-made API calls to gitlab.com](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15436)
* 2022-03-14T20:05:52Z - [Webhook Destroy work should not be in catchall](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15435)
* 2022-03-14T19:56:28Z - [Update Ingress Allow Lists for gitlab.com](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15434)
* 2022-03-14T14:20:24Z - [Enforce rate limits on TLS connections for GitLab Pages](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15441)
### Open Issue Stats
* [Oncall issues](https://gitlab.com/gitlab-com/infrastructure/issues?scope=all&utf8=%E2%9C%93&state=opened&label_name[]=oncall) : **3**
* [Change issues](https://gitlab.com/gitlab-com/production/issues?scope=all&utf8=%E2%9C%93&state=opened&label_name[]=change) : **6**
* [Incident issues](https://gitlab.com/gitlab-com/production/issues?scope=all&utf8=%E2%9C%93&state=opened&label_name[]=incident) : **19**
* [Access Request](https://gitlab.com/gitlab-com/infrastructure/issues?scope=all&utf8=%E2%9C%93&state=opened&label_name[]=access%20request) : **0**
* [CorrectiveAction](https://gitlab.com/gitlab-com/infrastructure/issues?scope=all&utf8=%E2%9C%93&state=opened&label_name[]=corrective%20action) : **99**
#### Open Change Issues
<details>
<summary>Show/Hide Table</summary>
| Created | Summary |
| ------- | ------- |
| [2022-03-18T19:25:49Z](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6646) | Removal of foreign key fk_e4ef9c2f27 on PRD |
| [2022-03-17T17:28:34Z](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6631) | Import projects into project_build_artifacts_size_refreshes |
| [2022-03-17T11:50:33Z](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6626) | Adjust batch_size, pause_ms and sub_batch_size of NullifyOrphanRunnerIdOnCiBuilds migration |
| [2022-03-17T11:05:45Z](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6624) | Grow Elasticsearch cluster gitlab-logs-prod from 9 hot nodes to 11 |
| [2022-03-15T15:30:43Z](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6602) | [gprd] Replace `redis-cache-sentinel` instances after changing `machine_type` from `n1-standard-1` to `n2d-standard-4` |
| [2022-03-15T13:54:10Z](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6597) | 2022-03-15: Delete marketo hook |
</details>
#### Open Incident Issues
<details>
<summary>Show/Hide Table</summary>
| Created | Summary |
| ------- | ------- |
| [2022-03-18T17:39:32Z](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6641) | 2022-03-18: The goserver_op_service SLI of the gitaly service on node `file-22-stor-gprd.c.gitlab-production.internal` has an error rate violating SLO |
| [2022-03-18T01:28:18Z](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6633) | 2022-03-18: Some notification emails are delayed |
| [2022-03-17T05:58:53Z](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6623) | 2022-03-17: Commit via the API fails with error 500 during a QA test |
| [2022-03-14T14:24:30Z](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6581) | 2022-03-13: Postgres pending WAL files on primary is high |
| [2022-02-12T19:20:49Z](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6338) | 2022-02-12: Increased latency from us-east1-d for GCS buckets |
</details>
#### Open Oncall Issues
<details>
<summary>Show/Hide Table</summary>
| Created | Summary |
| ------- | ------- |
| [2021-09-17T19:35:34Z](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/14205) | Proposal: When an Incident is declared, output the latest changed feature flags into the incident issue |
| [2020-12-18T22:29:14Z](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/12200) | CI clones fail for repositories with a path ending in a period |
| [2020-03-30T13:38:11Z](https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/9660) | jobs.gitlab.com cert expired unnoticed on 2020-03-28 |
</details>
#### Issues for Review during Incident Review Meeting
<details>
If there are any incidents you think would be good to review, please add them to the [Agenda](https://docs.google.com/document/d/1Llm9tXHC2dNt_eercRUUXlUyWmOVw00wmXWQQbWvv2c/edit?usp=sharing) for the next meeting.
</details>