OnCall report for period: 2018-03-06 - 2018-03-13
Oncall during this period
Schedule | Username |
---|---|
AMA | John Northrup |
AMA | Jason Tevnan |
EU | Ahmad Sherif |
EU | Jason Tevnan |
EU | John Jarvis |
PagerDuty Incidents
- Number of incidents: 25
Created | Summary |
---|---|
2018-03-08T19:39:35Z | [1328] Message from vhernandez in Slack room directmessage |
2018-03-08T20:10:34Z | [1329] this DM is to slackbot |
2018-03-08T22:52:01Z | [1330] Pingdom check GitLab.com Pages is down |
2018-03-08T22:55:57Z | [1331] Pingdom check GitLab.com Pages is down |
2018-03-08T23:26:05Z | [1332] Pingdom check GitLab.com Pages is down |
2018-03-08T23:35:58Z | [1333] Pingdom check GitLab.com Pages is down |
2018-03-08T23:41:57Z | [1334] Pingdom check GitLab.com Pages is down |
2018-03-08T23:47:20Z | [1335] Pingdom check GitLab.com Pages is down |
2018-03-08T23:52:01Z | [1336] Pingdom check GitLab.com Pages is down |
2018-03-08T23:55:56Z | [1337] Pingdom check GitLab.com Pages is down |
2018-03-09T00:44:39Z | [1338] Firing 1 - CPU use percent is extremely high on nfs-file-13.stor.gitlab.com for the past 2 hours. |
2018-03-09T05:00:55Z | [1339] SSH is down (at least on git-12) |
2018-03-09T12:16:37Z | [1340] Pingdom check GitLab.com issue is down |
2018-03-09T12:17:25Z | [1341] Pingdom check GitLab.com new repo is down |
2018-03-09T12:19:05Z | [1342] Pingdom check GitLab.com issue is down |
2018-03-09T12:19:44Z | [1343] Pingdom check GitLab.com public check is down |
2018-03-09T12:20:53Z | [1344] Pingdom check GitLab.com master branch is down |
2018-03-09T12:21:33Z | [1345] Firing 1 - High Error Rate on Front End Web |
2018-03-09T12:26:01Z | [1346] Pingdom check GitLab.com Pages is down |
2018-03-09T22:00:35Z | [1347] Pingdom check GitLab.com master branch is down |
2018-03-10T07:29:51Z | [1348] Firing 1 - CPU use percent is extremely high on nfs-file-11.stor.gitlab.com for the past 2 hours. |
2018-03-10T10:21:52Z | [1349] Firing 1 - CPU use percent is extremely high on nfs-file-05.stor.gitlab.com for the past 2 hours. |
2018-03-10T10:56:53Z | [1350] Firing 2 - |
2018-03-10T11:06:52Z | [1351] Firing 1 - |
2018-03-10T18:50:52Z | [1352] Firing 1 - CPU use percent is extremely high on nfs-file-11.stor.gitlab.com for the past 2 hours. |
Issues
7 Day OnCall Issue Stats
- Oncall issues : 18
- Access Request : 5
- Critical : 0
- Outage : 0
- Corrective Action : 1
Open OnCall Issue Stats
- Oncall issues : 31
- Access Request : 5
- Critical : 0
- Outage : 0
- Corrective Action : 15
Open Oncall Issues
Created | Assignee | Summary |
---|---|---|
13 Mar 18 10:15 UTC | unassigned | ossec-analysisd is using excessive CPU on nfs-file-05 and nfs-file-11 |
13 Mar 18 08:44 UTC | unassigned | Decrease redis maxmemory in a controlled way |
12 Mar 18 22:54 UTC | unassigned | Redis outage: 2018-03-15 22:40 - 22:43 UTC |
12 Mar 18 17:25 UTC | unassigned | Gitaly CPU Consumption since 8 March 2018 |
12 Mar 18 12:22 UTC | jtevnan | commenting is not possible on gitlab.com |
12 Mar 18 11:22 UTC | northrup | The new pricing changes redirects, "purged" the /gitlab-com/settings page |
10 Mar 18 07:52 UTC | jtevnan | nfs-file-11 high load page |
09 Mar 18 23:36 UTC | unassigned | Use /etc/gitlab/skip-auto-reconfigure and remove /etc/gitlab/skip-auto-migrations |
08 Mar 18 16:49 UTC | unassigned | Implement log rotation for license app |
08 Mar 18 08:22 UTC | unassigned | Request: staging rails console access |
07 Mar 18 12:44 UTC | unassigned | Chef and SSH access request for Filipa Lacerda |
06 Mar 18 17:51 UTC | unassigned | Chef and SSH access request for Mayra Cabrera |
06 Mar 18 17:14 UTC | unassigned | Reenable sticky sessions for prod |
06 Mar 18 16:35 UTC | unassigned | Production and GPRD access for Valery |
05 Mar 18 19:41 UTC | unassigned | Enable structured logging for Workhorse |
28 Feb 18 15:27 UTC | unassigned | Push version of gitlab-exporters for change to queries.yaml to production |
28 Feb 18 14:11 UTC | unassigned | Shared runners not connecting to docker daemon |
26 Feb 18 17:21 UTC | northrup | Configure review apps and domain for design.GitLab.com repository |
20 Feb 18 21:57 UTC | unassigned | Delays in nfs-08, possibly due to user hammering a repository |
19 Feb 18 10:08 UTC | unassigned | Create Gitter VPN accounts for all production engineers |
19 Feb 18 10:06 UTC | unassigned | Move the Gitter alerts to the GitLab Pagerduty account |
19 Feb 18 10:02 UTC | unassigned | Use personal ssh accounts instead of the deployer one |
19 Feb 18 09:52 UTC | unassigned | [META] Gitter infrastructure handover |
12 Feb 18 08:51 UTC | bjk-gitlab | Re-enable NFS metrics collection in node_exporter |
09 Feb 18 08:14 UTC | unassigned | Missing alert for inodes on Gitter nodes |
08 Feb 18 18:44 UTC | unassigned | Manage redis cache config via omnibus |
21 Jan 18 17:57 UTC | unassigned | XLOG generation peak |
18 Jan 18 11:49 UTC | unassigned | Failed ssh connection monitoring |
18 Jan 18 11:30 UTC | unassigned | Add alert for failure to start unicorn |
12 Jan 18 17:38 UTC | unassigned | Use Pages healthcheck |
21 Dec 17 13:59 UTC | _stark | Alert on errors in the pgbouncer log |
Weekly Ops
- p95 API latency for 200s
- p95 Git latency for 200s
- p50 Web latency for 200s
- p95 Web latency for 200s
- p50 API latency for 200s
- p50 Git latency for 200s
- Gitaly p95 latency
- Sidekiq CPU
- API CPU
- Git CPU
- Web CPU
- NFS timeouts
This issue was automatically generated using oncall-robot-assistant