OnCall report for period: 2020-05-19 - 2020-05-26
Oncall during this period
Schedule | Username |
---|---|
SRE 8 Hour | Cindy Pallares |
SRE 8 Hour | Devin Sylva |
SRE 8 Hour | Ben Kochie |
SRE 8 Hour | Graeme Gillies |
PagerDuty Incidents
* Number of incidents: **21**
Created | Summary |
---|---|
2020-05-19T08:22:55Z | [20923] 2020-05-19: Remove cloudflare-sslmate integration |
2020-05-19T10:14:05Z | [20931] Firing 1 - staging.GitLab.com is down for 30 minutes |
2020-05-19T10:14:06Z | [20932] Firing 1 - staging.GitLab.com is down for 30 minutes |
2020-05-19T15:51:20Z | [20935] Firing 1 - HAProxy process high CPU usage on fe-registry-02-lb-gprd.c.gitlab-production.internal |
2020-05-19T15:55:42Z | [20936] Firing 1 - postgres-dr-delayed-01-db-gprd.c.gitlab-production.internal postgres service appears down |
2020-05-19T17:00:40Z | [20939] Firing 1 - Multiple versions of Gitaly have been running alongside one another |
2020-05-20T02:19:58Z | [20948] Firing 1 - The patroni service (main stage) has a apdex score (latency) below SLO |
2020-05-20T11:19:06Z | [20954] Pingdom check check:https://gitlab.com/gitlab-org/gitlab-foss/ is down |
2020-05-20T11:19:26Z | [20955] Firing 1 - Gitaly is down on file-praefect-01-stor-gprd.c.gitlab-production.internal |
2020-05-20T11:20:53Z | [20956] Pingdom check check:https://gitlab.com/gitlab-org/gitlab-foss/tree/master is down |
2020-05-20T11:21:30Z | [20957] Site availability issues |
2020-05-20T16:20:50Z | [20961] Firing 1 - HAProxy process high CPU usage on fe-registry-01-lb-gprd.c.gitlab-production.internal |
2020-05-20T18:00:58Z | [20969] Firing 1 - The sidekiq service (main stage) has a apdex score (latency) below SLO |
2020-05-21T01:16:58Z | [20971] Firing 1 - The sidekiq service (main stage) has a apdex score (latency) below SLO |
2020-05-21T17:08:58Z | [20981] Firing 1 - The sidekiq service (main stage) has an error-ratio exceeding SLO |
2020-05-23T08:10:42Z | [21012] Firing 1 - The sidekiq service (main stage) has a apdex score (latency) below SLO |
2020-05-24T00:30:31Z | [21028] Firing 1 - SSL certificate for https://dashboards.gitlab.net expires in 23h 29m 58s |
2020-05-25T08:14:45Z | [21041] Firing 1 - The sidekiq service (main stage) has a apdex score (latency) below SLO |
2020-05-25T14:17:27Z | [21047] Firing 1 - customers.gitlab.com is down for 2 minutes |
2020-05-25T14:17:28Z | [21048] Firing 1 - customers.gitlab.com is not responding correctly for 2 minutes |
2020-05-26T00:30:33Z | [21073] Firing 1 - SSL certificate for https://customers.gitlab.com expires in 23h 29m 58s |
7 Day Issue Stats
- Oncall issues : 3
- Access Request : 1
- Change Issues : 0
- Incident Issues : 8
- CorrectiveAction Issues : 0
Change Issues
Incident Issues
- 2020-05-23T21:35:35Z - Increase in error rate in canary web - unassigned | ~S4 | ServiceWeb |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2184
- 2020-05-21T14:43:27Z - 2020-05-21: Version application deployment pipelines failing - devin | ~S4 | ServiceVersion |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2178
- 2020-05-21T13:15:35Z - Problems with outgoing mail - unassigned | ~S3 | |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2176
- 2020-05-21T05:53:44Z - 503 errors on about.gitlab.com - devin | ~S4 | |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2175
- 2020-05-20T13:01:54Z - 2020-05-20: Version application database migration containers stale - devin | ~S4 | ServiceVersion |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2171
- 2020-05-20T04:10:59Z - Error 500 loading due to connection saturation - devin | ~S4 | |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2167
- 2020-05-19T17:57:52Z - CloudFlare WAF causing issues with git operations - unassigned | ~S2 | ServiceCloudflare |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2164
- 2020-05-19T08:22:54Z - 2020-05-19: Cleanup after sslmate-cloudflare integration briefly enabled - craigf | ~S4 | |
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2157
CorrectiveAction Issues
- 2020-05-25T19:07:43Z - Investigate/add alerting for sharp increase in WAF blocks - unassigned
- 2020-05-22T23:03:34Z - Move Cloudflare Page Rules from `gprd` back to `gstg` - unassigned
- 2020-05-20T23:43:49Z - Update Cloudflare alerts - unassigned
- 2020-05-20T16:04:54Z - Remove overly broad page rule from Cloudflare configuration - unassigned
- 2020-05-20T15:55:50Z - Create (or modify) a runbook to describe how to identify authenticated vs unauthenticated API calls - unassigned
- 2020-05-20T15:47:22Z - Create (or update) Cloudflare runbook to better address abuse and attack events - unassigned
- 2020-05-20T04:31:47Z - Evaluate available pgbouncer connections to avoid reaching saturation - unassigned
- 2020-05-19T12:07:37Z - clean up static-objects-cache tf module - mwasilewski-gitlab
- 2020-05-19T09:08:14Z - Alert when jobs are not being processed by sidekiq - unassigned
Open Issue Stats
- Oncall issues : 4
- Change issues : 1
- Incident issues : 0
- Access Request : 4
- CorrectiveAction : 97
Open Change Issues
Created | Assignee | Summary |
---|---|---|
2020-03-26T19:16:25Z | nnelson | Rotate credentials for user gitlab-superuser |
Open Incident Issues
Created | Assignee | Summary |
---|---|---|
Open Oncall Issues
Created | Assignee | Summary |
---|---|---|
2020-05-25T05:05:45Z | unassigned | Archived repository missing |
2020-05-11T06:18:27Z | unassigned | Manually remove project |
2020-03-30T13:38:11Z | brentnewton | jobs.gitlab.com cert expired unnoticed on 2020-03-28 |
2019-10-23T13:05:14Z | cmcfarland | cleanup registered nodes in chef |
This issue was automatically generated using oncall-robot-assistant
Edited by AnthonySandoval