OnCall report for period: 2020-02-18 - 2020-02-25
Oncall during this period
Schedule | Username |
---|---|
SRE 8 Hour | Devin Sylva |
SRE 8 Hour | Craig Barrett |
SRE 8 Hour | Hendrik Meyer |
SRE 8 Hour | Henri Philipps |
SRE 8 Hour | Cameron McFarland |
SRE 8 Hour | Michal Wasilewski |
SRE 8 Hour | Craig Miskell |
PagerDuty Incidents
* Number of incidents: **55**
Created | Summary |
---|---|
2020-02-18T20:13:55Z | [17585] Pingdom check check:https://gitlab.com/gitlab-org/gitlab-foss/issues is down |
2020-02-18T20:27:03Z | [17586] Pingdom check check:https://gitlab.com/gitlab-org/gitlab-foss/merge_requests/ is down |
2020-02-19T08:10:23Z | [17613] Firing 1 - Postgres is generating XLOG too fast, expect this to cause replication lag |
2020-02-19T21:50:14Z | [17640] Firing 1 - Gitaly error rate is too high: 8.02 |
2020-02-21T14:19:58Z | [17703] Firing 1 - Postgres is generating XLOG too fast, expect this to cause replication lag |
2020-02-22T03:43:07Z | [17742] Firing 1 - Increased Error Rate Across Fleet |
2020-02-22T03:43:12Z | [17743] Pingdom check check:https://gitlab.com/gitlab-org/gitlab-foss/issues is down |
2020-02-22T03:45:21Z | [17745] Pingdom check check:https://gitlab.com/gitlab-com/gitlab-com-infrastructure/tree/master is down |
2020-02-22T03:45:35Z | [17746] Pingdom check check:https://gitlab.com/gitlab-org/gitlab-foss/tree/master is down |
2020-02-22T03:46:44Z | [17747] Pingdom check check:gitlab-org/gitlab-foss#1 (closed) is down |
2020-02-22T03:46:59Z | [17748] Pingdom check check:https://gitlab.com/gitlab-org/gitlab-foss/merge_requests/ is down |
2020-02-22T03:47:07Z | [17749] Firing 1 - High Error Rate on Front End Web |
2020-02-22T03:47:36Z | [17750] Pingdom check check:https://gitlab.com/projects/new is down |
2020-02-22T03:49:43Z | [17752] Pingdom check check:https://gitlab.com/gitlab-org/gitlab-foss/ is down |
2020-02-22T03:51:51Z | [17753] Firing 1 - Web latency on GitLab.com has been over 2s during the last 5m |
2020-02-22T03:52:15Z | [17755] Firing 1 - GitLab.com is down for 2 minutes |
2020-02-22T03:52:15Z | [17756] Firing 1 - GitLab.com is down for 2 minutes |
2020-02-22T03:53:05Z | [17758] Pingdom check check:https://gitlab.com/ is down |
2020-02-22T04:02:36Z | [17760] Firing 2 - IncreasedBackendConnectionErrors |
2020-02-22T04:03:59Z | [17761] Firing 1 - GitLab.com is down for 2 minutes |
2020-02-22T04:04:00Z | [17762] Firing 1 - GitLab.com is down for 2 minutes |
2020-02-22T04:08:40Z | [17763] Pingdom check check:https://gitlab.com/ is down |
2020-02-22T04:24:52Z | [17765] Firing 1 - Increased HAProxy Backend Connection Errors |
2020-02-22T04:34:54Z | [17766] Pingdom check check:https://gitlab.com/gitlab-org/gitlab-foss/ is down |
2020-02-22T06:28:51Z | [17767] Firing 1 - Increased Error Rate Across Fleet |
2020-02-22T06:52:23Z | [17768] Firing 1 - Increased Error Rate Across Fleet |
2020-02-22T06:54:05Z | [17770] Pingdom check check:https://gitlab.com/gitlab-org/gitlab-foss/ is down |
2020-02-22T06:55:45Z | [17771] Pingdom check check:gitlab-org/gitlab-foss#1 (closed) is down |
2020-02-22T06:56:50Z | [17772] Pingdom check check:https://gitlab.com/gitlab-org/gitlab-foss/merge_requests/ is down |
2020-02-22T06:57:23Z | [17773] Firing 1 - High Error Rate on Front End Web |
2020-02-22T06:58:02Z | [17774] Pingdom check check:https://gitlab.com/projects/new is down |
2020-02-22T06:58:53Z | [17775] Pingdom check check:https://gitlab.com/gitlab-org/gitlab-foss/issues is down |
2020-02-22T07:00:15Z | [17776] Pingdom check check:https://gitlab.com/gitlab-com/gitlab-com-infrastructure/tree/master is down |
2020-02-22T07:03:06Z | [17777] Firing 1 - Increased Error Rate Across Fleet |
2020-02-22T07:13:22Z | [17779] Firing 1 - Gitaly latency on file-marquee-03-stor-gprd.c.gitlab-production.internal has been over 1m during the last 5m |
2020-02-22T07:16:04Z | [17780] Pingdom check check:https://gitlab.com/gitlab-org/gitlab-foss/tree/master is down |
2020-02-22T07:19:07Z | [17781] Pingdom check check:https://gitlab.com/gitlab-org/gitlab-foss/issues is down |
2020-02-22T07:21:13Z | [17782] Pingdom check check:https://gitlab.com/gitlab-com/gitlab-com-infrastructure/tree/master is down |
2020-02-22T07:28:26Z | [17783] Firing 1 - Gitaly latency on file-praefect-01-stor-gprd.c.gitlab-production.internal has been over 1m during the last 5m |
2020-02-22T08:49:32Z | [17786] Firing 1 - Chef client failures have reached critical levels |
2020-02-22T13:17:51Z | [17793] Firing 1 - High 4xx Error Rate on Front End Web |
2020-02-22T13:27:23Z | [17794] Firing 1 - High 4xx Error Rate on Front End Web on backend api_rate_limit |
2020-02-22T14:16:37Z | [17795] Firing 1 - High 4xx Error Rate on Front End Web |
2020-02-22T14:58:25Z | [17796] Firing 1 - High 4xx Error Rate on Front End Web |
2020-02-22T15:46:52Z | [17798] Firing 1 - High 4xx Error Rate on Front End Web |
2020-02-22T16:28:36Z | [17800] Firing 1 - High 4xx Error Rate on Front End Web |
2020-02-22T17:21:30Z | [17802] Firing 1 - High 4xx Error Rate on Front End Web |
2020-02-22T17:59:37Z | [17805] Firing 1 - High 4xx Error Rate on Front End Web |
2020-02-22T18:44:37Z | [17807] Firing 1 - High 4xx Error Rate on Front End Web |
2020-02-22T19:06:21Z | [17808] Firing 1 - High 4xx Error Rate on Front End Web |
2020-02-22T19:26:21Z | [17809] Firing 1 - High 4xx Error Rate on Front End Web |
2020-02-22T20:21:37Z | [17810] Firing 1 - High 4xx Error Rate on Front End Web |
2020-02-23T08:04:23Z | [17824] Pingdom check check:https://deps.sec.gitlab.com/api/ping is down |
2020-02-23T20:23:51Z | [17842] Firing 1 - High 4xx Error Rate on Front End Web |
2020-02-24T13:51:39Z | [17855] Firing 1 - 5% disk space left |
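The incident rows above can be tallied per day and per alert source to check them against the reported total. The following is a minimal sketch, not part of oncall-robot-assistant: the function name `tally`, the regular expression, and the "Pingdom" versus "firing" split are assumptions made for illustration, and only a handful of rows from the table are included as sample input.

```python
import re
from collections import Counter

# Sample rows copied from the table above; paste the full table to tally every incident.
ROWS = """\
2020-02-18T20:13:55Z | [17585] Pingdom check check:https://gitlab.com/gitlab-org/gitlab-foss/issues is down |
2020-02-19T08:10:23Z | [17613] Firing 1 - Postgres is generating XLOG too fast, expect this to cause replication lag |
2020-02-22T03:43:07Z | [17742] Firing 1 - Increased Error Rate Across Fleet |
2020-02-22T13:17:51Z | [17793] Firing 1 - High 4xx Error Rate on Front End Web |
2020-02-22T14:16:37Z | [17795] Firing 1 - High 4xx Error Rate on Front End Web |
"""

# Assumed row shape: "<ISO timestamp> | [<incident id>] <summary> |"
ROW_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})T\S+ \| \[(\d+)\] (.*?) \|$")


def tally(rows: str):
    """Count incidents per calendar day (UTC) and per alert source."""
    per_day = Counter()
    per_source = Counter()
    for line in rows.splitlines():
        match = ROW_RE.match(line.strip())
        if not match:
            continue  # skip headers, separators, and blank lines
        day, _incident_id, summary = match.groups()
        per_day[day] += 1
        # Hypothetical grouping: Pingdom checks versus everything else ("Firing N - ...").
        source = "Pingdom" if summary.startswith("Pingdom check") else "firing"
        per_source[source] += 1
    return per_day, per_source


if __name__ == "__main__":
    per_day, per_source = tally(ROWS)
    for day, count in sorted(per_day.items()):
        print(day, count)
    print(dict(per_source))
```

Run against the full table, the sum of the per-day counts should match the "Number of incidents" figure reported above.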
7 Day Issue Stats
- Oncall issues : 1
- Access Request : 0
- Change Issues : 1
- Incident Issues : 7
- CorrectiveAction Issues : 0
Change Issues
- 2020-02-19T09:24:24Z - Set PUMA_INJECT_WAIT_TICKS=0 on 2 web and 2 api nodes - jarv
Incident Issues
- 2020-02-24T10:27:23Z - 2020-02-24: spikes in latencies of a number of services - cmcfarland | ~S3 | ServiceRedis |
https://gitlab.com/gitlab-com/gl-infra/production/issues/1697
- 2020-02-23T08:16:08Z - 2020-02-23 Gemnasium ( deps.sec.gitlab.com ) briefly down - mwasilewski-gitlab | ~S4 | |
https://gitlab.com/gitlab-com/gl-infra/production/issues/1695
- 2020-02-22T09:20:33Z - RCA: 2020-02-22 Gitlab.com down under DoS attack - craig | | |
https://gitlab.com/gitlab-com/gl-infra/production/issues/1694
- 2020-02-22T05:23:09Z - GitLab.com Down Under DoS Attack - ansdval | ~S1 | |
https://gitlab.com/gitlab-com/gl-infra/production/issues/1693
- 2020-02-19T21:52:05Z - 2020-02-19: Gitaly error rate is too high: 8.02 - cmcfarland | ~S3 | ServiceGitaly |
https://gitlab.com/gitlab-com/gl-infra/production/issues/1687
- 2020-02-19T06:25:33Z - 2020-02-19 Bad canary? The `cny` stage of the `web` service has an error-ratio exceeding SLO, but the main stage does not. - cmiskell | ~S4 | ServiceWeb |
https://gitlab.com/gitlab-com/gl-infra/production/issues/1682
- 2020-02-18T20:29:48Z - 2020-02-18: Canary web saturated during deploy - cmcfarland | | ServiceWeb |
https://gitlab.com/gitlab-com/gl-infra/production/issues/1679
CorrectiveAction Issues
- 2020-02-24T22:50:47Z - Migrate Production Project to ops.gitlab.net - ansdval
- 2020-02-24T21:33:31Z - Simplify and standardize path-based haproxy blocking - msmiley
- 2020-02-20T16:52:37Z - Identify all places in TF modules where DNS entries are created (outside of the dns environment in the tf repo) and come up with a unified way for managing them - unassigned
Open Issue Stats
- Oncall issues : 5
- Change issues : 2
- Incident issues : 0
- Access Request : 5
- CorrectiveAction : 69
Open Change Issues
Created | Assignee | Summary |
---|---|---|
2019-12-03T05:56:02Z | unassigned | Upgrade Sentry instance to 9.1.2 |
2019-10-16T14:37:43Z | nnelson | Migrate large projects off file-33-stor-gprd to file-43-stor-gprd |
Open Incident Issues
Created | Assignee | Summary |
---|---|---|
Open Oncall Issues
Created | Assignee | Summary |
---|---|---|
2020-02-20T20:31:30Z | cmcfarland | gitlab.com Production Admin Accounts Listing |
2020-02-12T16:00:37Z | unassigned | dev.gitlab.org - Admins Export |
2020-01-16T06:07:03Z | aamarsanaa | Incremental rollout for the Pages new API based config source |
2020-01-15T20:57:26Z | devin | Tracking state of mod security on version.gitlab.com for WAF Troubleshooting |
2019-10-23T13:05:14Z | cmcfarland | cleanup registered nodes in chef |
This issue was automatically generated using oncall-robot-assistant