OnCall report for period: 2018-03-27 - 2018-04-03
OnCall during this period
Schedule | Username |
---|---|
AMA | John Northrup |
AMA | Alejandro Rodriguez |
AMA | Alex Hanselka |
EU | Jason Tevnan |
EU | John Jarvis |
PagerDuty Incidents
- Number of incidents: 25
Created | Summary |
---|---|
2018-03-27T15:26:36Z | [1428] Firing 1 - Git NFS Server nfs-file-02.stor.gitlab.com:9100 is down |
2018-03-28T07:30:40Z | [1429] Pingdom check Dev.gitlab.org issue is down |
2018-03-28T11:43:05Z | [1430] Pingdom check GitLab.com Pages is down |
2018-03-28T11:50:43Z | [1431] Pingdom check Dev.gitlab.org issue is down |
2018-03-28T14:35:42Z | [1432] Pingdom check Dev.gitlab.org issue is down |
2018-03-29T09:03:15Z | [1434] Pingdom check GitLab.com master branch is down |
2018-03-29T09:36:58Z | [1435] Pingdom check GitLab.com master branch is down |
2018-03-29T10:53:05Z | [1436] Message from kamil in Slack room production |
2018-03-29T12:57:55Z | [1437] Pingdom check GitLab.com master branch is down |
2018-03-29T13:39:42Z | [1438] Firing 1 - prometheus is backlogging on the notifications queue |
2018-03-29T13:57:54Z | [1439] Pingdom check GitLab.com master branch is down |
2018-03-29T15:39:05Z | [1440] Pingdom check GitLab.com Pages is down |
2018-03-29T18:46:58Z | [1441] Pingdom check GitLab.com Pages is down |
2018-03-29T21:39:14Z | [1442] Pingdom check GitLab.com master branch is down |
2018-03-29T21:39:18Z | [1443] Pingdom check GitLab.com new repo is down |
2018-03-29T21:39:35Z | [1444] Pingdom check GitLab.com issue is down |
2018-03-29T21:39:43Z | [1445] Pingdom check GitLab.com public check is down |
2018-03-29T21:40:55Z | [1446] Firing 1 - High Error Rate on Front End Web |
2018-03-29T21:41:16Z | [1447] Firing 2 - Postgres seems to be processing very few transactions |
2018-03-29T21:45:44Z | [1448] Pingdom check GitLab.com issue is down |
2018-03-30T00:58:12Z | [1449] Firing 2 - prometheus is backlogging on the notifications queue |
2018-03-30T02:30:53Z | [1450] Pingdom check GitLab.com master branch is down |
2018-03-30T06:06:23Z | [1451] Firing 1 - prometheus is backlogging on the notifications queue |
2018-03-30T07:45:41Z | [1452] Pingdom check Dev.gitlab.org issue is down |
2018-03-30T08:23:32Z | [1453] Pingdom check GitLab.com public check is down |
Issues
7 Day OnCall Issue Stats
- OnCall issues: 17
- Access Request: 1
- Critical: 1
- Outage: 0
- Corrective Action: 0
Open OnCall Issue Stats
- OnCall issues: 20
- Access Request: 3
- Critical: 1
- Outage: 0
- Corrective Action: 18
Open OnCall Issues
Created | Assignee | Summary |
---|---|---|
03 Apr 18 10:24 UTC | jtevnan | page about 1% disk space |
02 Apr 18 18:26 UTC | unassigned | Upgrade internal Sentry |
30 Mar 18 23:34 UTC | unassigned | AWS Relocation of host used for Gitter.IM |
30 Mar 18 08:00 UTC | jtevnan | web server mass computicide |
30 Mar 18 06:35 UTC | unassigned | Large GitLab Pages repositories clogging the Sidekiq pipeline |
30 Mar 18 05:02 UTC | unassigned | Outage of GitLab.com due to Database Host Restart |
29 Mar 18 17:03 UTC | unassigned | CPU high for GitLab Pages Sidekiq workers |
29 Mar 18 09:06 UTC | jtevnan | page - master branch is down |
28 Mar 18 15:14 UTC | unassigned | Admin access for Security Team |
27 Mar 18 11:56 UTC | ilyaf | Structured logging for gitlab-shell (coming in GitLab 10.7) |
26 Mar 18 09:30 UTC | unassigned | Chef run errors on production environment related to systemd timeout |
24 Mar 18 21:20 UTC | unassigned | fluentd parse error on nfs-09 for gitaly logs |
21 Mar 18 16:11 UTC | unassigned | Configure the Container registry for canary |
20 Mar 18 12:12 UTC | unassigned | gitlab-sidekiq alerting generating regular alerts |
16 Mar 18 15:22 UTC | unassigned | Turn on repository verification checksum feature |
16 Mar 18 11:24 UTC | unassigned | nfs-file-07 load spiked up to ~150 |
15 Mar 18 23:47 UTC | nolith | March 15th dev.gitlab.org outage |
15 Mar 18 12:39 UTC | unassigned | Transfer of Gemnasium domains |
12 Feb 18 08:51 UTC | bjk-gitlab | Re-enable NFS metrics collection in node_exporter |
05 Feb 18 09:18 UTC | unassigned | VPN access for Valery |
Weekly Ops
Graph panels (images not reproduced):
- p50 Web latency for 200s
- p95 Web latency for 200s
- p50 API latency for 200s
- p95 API latency for 200s
- p50 Git latency for 200s
- p95 Git latency for 200s
- Gitaly p95 latency
- Web CPU
- API CPU
- Git CPU
- Sidekiq CPU
- NFS timeouts
This issue was automatically generated using oncall-robot-assistant