OnCall report for period: 2018-02-06 - 2018-02-13
Oncall during this period
Schedule | Username |
---|---|
AMA | John Northrup |
EU | Daniele Valeriani |
EU | Victor Lopez |
EU | John Jarvis |
PagerDuty Incidents
- Number of incidents: 10
Created | Summary |
---|---|
2018-02-08T10:31:07Z | [1271] Firing 1 - 1% disk space left |
2018-02-08T14:34:48Z | [1272] Firing 1 - Gitaly latency on nfs-file-12.stor.gitlab.com has been over 1m during the last 5m |
2018-02-08T16:24:15Z | [1273] 500 gitlab |
2018-02-08T16:24:21Z | [1274] Gitlab is down |
2018-02-09T08:31:19Z | [1276] Pingdom check Version server is down |
2018-02-11T10:51:31Z | [1278] Firing 1 - Postgres is generating XLOG too fast, expect this to cause replication lag |
2018-02-12T02:18:06Z | [1279] Firing 1 - 1% disk space left |
2018-02-12T11:43:29Z | [1280] Firing 2 - Postgres Replication lag (in bytes) is high |
2018-02-12T13:43:21Z | [1281] Firing 1 - prometheus is unreachable |
2018-02-13T00:23:22Z | [1283] Firing 1 - prometheus is unreachable |
Issues
7 Day OnCall Issue Stats
- Oncall issues : 13
- Access Request : 2
- Critical : 0
- Outage : 0
- Corrective Action : 3
Open OnCall Issue Stats
- Oncall issues : 25
- Access Request : 2
- Critical : 0
- Outage : 0
- Corrective Action : 17
Open Oncall Issues
Created | Assignee | Summary |
---|---|---|
12 Feb 18 22:27 UTC | unassigned | 42 LFS Object Missing |
12 Feb 18 18:39 UTC | ahanselka | Migrate LFS objects from Object back to disk |
12 Feb 18 09:28 UTC | unassigned | offboarding Victor Lopez |
12 Feb 18 08:51 UTC | unassigned | Re-enable NFS metrics collection in node_exporter |
09 Feb 18 17:42 UTC | unassigned | Fix SSL certificate for registry.staging.gitlab.com |
08 Feb 18 18:44 UTC | unassigned | Manage redis cache config via omnibus |
08 Feb 18 11:42 UTC | unassigned | nfs-file-04 outage 2018-02-08 06h00 UTC |
08 Feb 18 05:11 UTC | unassigned | Increased incidents of "Source branch does not exist" (branch cache stale) |
07 Feb 18 17:58 UTC | ahanselka | Temporary production GCP access for Nick |
05 Feb 18 22:55 UTC | northrup | Disable custom domains in GitLab Pages |
05 Feb 18 09:29 UTC | jarv | VPN access for Brett |
05 Feb 18 09:25 UTC | jarv | offboarding pablo |
05 Feb 18 06:03 UTC | unassigned | Out of disk space on git-07 |
31 Jan 18 19:45 UTC | victorcete | No logs on Staging |
26 Jan 18 13:47 UTC | northrup | Name resolution errors on nfs-file-XX machines |
21 Jan 18 17:57 UTC | unassigned | XLOG generation peak |
18 Jan 18 11:49 UTC | unassigned | Failed ssh connection monitoring |
18 Jan 18 11:30 UTC | unassigned | Add alert for failure to start unicorn |
12 Jan 18 17:38 UTC | unassigned | Use Pages healthcheck |
21 Dec 17 14:03 UTC | unassigned | Add alert for sequential reads |
21 Dec 17 13:59 UTC | _stark | Alert on errors in the pgbouncer log |
03 Nov 17 15:50 UTC | unassigned | Contain GC executions by limiting the resources allocation |
26 Sep 17 12:43 UTC | unassigned | Better optics or alarming for redis failovers |
25 Jul 17 10:09 UTC | unassigned | Add alert for when read/write drop to 0 for a while in the master database |
09 Jun 17 08:31 UTC | unassigned | Create a database map graph in the production architecture page |
Weekly Ops
- Web/Git/API p95 latency
- Gitaly p95 latency
- Sidekiq CPU
- API CPU
- Git CPU
- Web CPU
- NFS timeouts
This issue was automatically generated using oncall-robot-assistant