OnCall report for period: 2020-04-28 - 2020-05-05

Oncall during this period

| Schedule | Username |
| --- | --- |
| SRE 8 Hour | Amar Amarsanaa |
| SRE 8 Hour | Craig Miskell |
| SRE 8 Hour | Craig Furman |
| SRE 8 Hour | Nels Nelson |

PagerDuty Incidents

* Number of incidents: **44**

| Created | Summary |
| --- | --- |
| 2020-04-28T07:23:38Z | [20230] Firing 2 - IncreasedServerResponseErrors |
| 2020-04-28T13:32:36Z | [20236] Firing 1 - Large number of overdue pull mirror jobs |
| 2020-04-28T14:52:22Z | [20237] Firing 2 - PrometheusManyRestarts |
| 2020-04-28T15:36:34Z | [20238] Firing 1 - staging.GitLab.com is down for 30 minutes |
| 2020-04-28T15:36:34Z | [20239] Firing 1 - staging.GitLab.com is down for 30 minutes |
| 2020-04-28T18:01:05Z | [20241] Firing 1 - staging.GitLab.com is down for 30 minutes |
| 2020-04-28T18:01:05Z | [20240] Firing 1 - staging.GitLab.com is down for 30 minutes |
| 2020-04-28T18:07:02Z | [20242] Active Distributed Cred Stuffing Attack |
| 2020-04-28T18:16:19Z | [20243] Firing 1 - staging.GitLab.com is down for 30 minutes |
| 2020-04-29T06:34:58Z | [20256] Firing 1 - The patroni service (main stage) has a apdex score (latency) below SLO |
| 2020-04-29T13:10:51Z | [20264] Firing 1 - Large number of overdue pull mirror jobs |
| 2020-04-29T14:11:22Z | [20267] Firing 1 - Large number of overdue pull mirror jobs |
| 2020-04-29T14:52:36Z | [20268] Firing 1 - Large number of overdue pull mirror jobs |
| 2020-04-29T15:45:30Z | [20271] Firing 1 - Chef client failures have reached critical levels |
| 2020-04-29T16:20:19Z | [20274] HTTP500's during CI job artifact uploads |
| 2020-04-29T16:20:20Z | [20275] HTTP500's during CI job artifact uploads |
| 2020-04-29T16:56:51Z | [20276] Firing 1 - Large number of overdue pull mirror jobs |
| 2020-04-29T22:40:52Z | [20279] Security Incident - rotating DNS mgmt in AWS |
| 2020-04-30T00:04:22Z | [20281] Firing 2 - PrometheusManyRestarts |
| 2020-04-30T10:15:37Z | [20297] This is a manual paging test. |
| 2020-04-30T12:28:51Z | [20300] Firing 1 - Large number of overdue pull mirror jobs |
| 2020-04-30T12:50:50Z | [20302] Firing 1 - Large number of overdue pull mirror jobs |
| 2020-04-30T15:48:25Z | [20310] Pingdom check check:https://gitlab-examples.gitlab.io/ is down |
| 2020-04-30T18:40:58Z | [20314] Firing 1 - The sidekiq service (main stage) has a apdex score (latency) below SLO |
| 2020-04-30T21:44:28Z | [20315] Need to clear out password expiries for ~4k users |
| 2020-05-01T00:06:58Z | [20317] Firing 1 - The patroni service (main stage) has a apdex score (latency) below SLO |
| 2020-05-01T13:09:58Z | [20330] Firing 1 - Last WALE backup was seen 20m 4s ago. |
| 2020-05-01T16:18:58Z | [20332] Firing 1 - The patroni service (main stage) has a apdex score (latency) below SLO |
| 2020-05-01T17:01:23Z | [20333] Firing 1 - 5% disk space left |
| 2020-05-01T17:32:26Z | [20335] Firing 1 - 5% disk space left |
| 2020-05-01T20:56:50Z | [20338] Firing 1 - Gitaly latency on file-praefect-02-stor-gprd.c.gitlab-production.internal has been over 1m during the last 5m |
| 2020-05-01T21:30:51Z | [20339] Firing 1 - Gitaly latency on file-praefect-02-stor-gprd.c.gitlab-production.internal has been over 1m during the last 5m |
| 2020-05-01T22:51:50Z | [20340] Firing 1 - Gitaly latency on file-praefect-02-stor-gprd.c.gitlab-production.internal has been over 1m during the last 5m |
| 2020-05-03T00:30:31Z | [20361] Firing 1 - SSL certificate for https://githost.io expires in 23h 29m 58s |
| 2020-05-04T03:51:58Z | [20384] Firing 1 - The patroni service (main stage) has a apdex score (latency) below SLO |
| 2020-05-04T16:41:34Z | [20401] Firing 1 - staging.GitLab.com is down for 30 minutes |
| 2020-05-04T16:41:34Z | [20402] Firing 1 - staging.GitLab.com is down for 30 minutes |
| 2020-05-04T17:56:04Z | [20403] Firing 1 - staging.GitLab.com is down for 30 minutes |
| 2020-05-04T17:56:05Z | [20404] Firing 1 - staging.GitLab.com is down for 30 minutes |
| 2020-05-04T19:03:58Z | [20406] Firing 1 - HPA unable to scale up |
| 2020-05-04T19:22:45Z | [20407] Accidentally deleted user |
| 2020-05-04T21:26:29Z | [20410] Firing 1 - HPA unable to scale up |
| 2020-05-04T23:43:38Z | [20413] Firing 1 - 5% disk space left |
| 2020-05-04T23:52:29Z | [20414] Firing 1 - HPA unable to scale up |

7 Day Issue Stats

* Oncall issues: 4
* Access Request: 0
* Change Issues: 1
* Incident Issues: 42
* CorrectiveAction Issues: 0

Change Issues

Incident Issues

CorrectiveAction Issues

Open Issue Stats

Open Change Issues

| Created | Assignee | Summary |
| --- | --- | --- |
| 2020-04-14T20:25:05Z | cindy | Migrate large projects off file-25-stor-gprd to file-01-stor-gprd |

Open Incident Issues

| Created | Assignee | Summary |
| --- | --- | --- |
| 2020-05-04T10:31:56Z | ahmadsherif | 2020-05-04: possible data loss via external diffs migration |

Open Oncall Issues

| Created | Assignee | Summary |
| --- | --- | --- |
| 2020-05-01T16:01:36Z | nnelson | Import request (for alex-solutions/core): alex-app |
| 2020-04-23T12:53:53Z | unassigned | Set GITLAB_QA_FORMLESS_LOGIN_TOKEN variable on /etc/gitlab/gitlab.rb on live environments |
| 2020-03-30T13:38:11Z | brentnewton | jobs.gitlab.com cert expired unnoticed on 2020-03-28 |
| 2020-03-23T23:43:57Z | ggillies | Manually remove project |
| 2019-10-23T13:05:14Z | cmcfarland | cleanup registered nodes in chef |

This issue was automatically generated using oncall-robot-assistant