OnCall report for period: 2017-12-19 - 2017-12-26
Oncall during this period
| Schedule | Username |
|---|---|
| AMA | Ilya Frolov |
| AMA | Jason Tevnan |
| EU | Pablo Carranza |
| EU | John Northrup |
| EU | Ilya Frolov |
| EU | John Jarvis |
PagerDuty Incidents
- Number of incidents: 16
| Created | Summary |
|---|---|
| 2017-12-19T19:09:51Z | [#1167] Postgres Replication lag is over 200MB |
| 2017-12-20T09:47:11Z | [#1168 (closed)] Pingdom check GitLab.com master branch is down |
| 2017-12-20T10:21:01Z | [#1169 (closed)] PostgreSQL replication slot with an stale xmin which can cause bloat on the primary |
| 2017-12-20T10:56:21Z | [#1170 (closed)] PostgreSQL replication slot with an stale xmin which can cause bloat on the primary |
| 2017-12-20T11:17:08Z | [#1171] PostgreSQL replication slot with an stale xmin which can cause bloat on the primary |
| 2017-12-20T19:16:55Z | [#1172] No disk space left on / on prometheus-01.us-east1-d.gce.gitlab-runners.gitlab.net: 0% |
| 2017-12-20T19:17:25Z | [#1173] No disk space left on / on prometheus-01.us-east1-c.gce.gitlab-runners.gitlab.net: 0% |
| 2017-12-20T19:46:10Z | [#1174] No disk space left on / on prometheus-01.nyc1.do.gitlab-runners.gitlab.net: 0% |
| 2017-12-20T22:58:40Z | [#1175] No disk space left on / on prometheus-01.us-east1-d.gce.gitlab-runners.gitlab.net: 0% |
| 2017-12-20T22:59:10Z | [#1176 (closed)] No disk space left on / on prometheus-01.us-east1-c.gce.gitlab-runners.gitlab.net: 0% |
| 2017-12-22T10:35:31Z | [#1177 (closed)] Gitaly latency on nfs-file-11.stor.gitlab.com has been over 1m during the last 5m |
| 2017-12-22T12:06:36Z | [#1178 (closed)] PostgreSQL replication slot with an stale xmin which can cause bloat on the primary |
| 2017-12-22T13:41:55Z | [#1179 (closed)] No disk space left on /opt/gitlab on runners-cache-5.gitlab.com: 997.3m% |
| 2017-12-24T11:05:40Z | [#1180 (closed)] CPU use percent is extremely high on db3.cluster.gitlab.com for the past 2 hours. |
| 2017-12-25T07:31:28Z | [#1181 (closed)] Gitaly latency on nfs-file-04.stor.gitlab.com has been over 1m during the last 5m |
| 2017-12-25T07:54:03Z | [#1182 (closed)] Gitaly latency on nfs-file-04.stor.gitlab.com has been over 1m during the last 5m |
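The incident table above can be grouped by alert type to see which alerts dominated the period. The following is a minimal sketch, not part of oncall-robot-assistant: the `CATEGORIES` mapping, the `categorize` and `tally` helpers, and the category labels are all hypothetical names chosen for illustration, and the sample contains only four of the rows above.

```python
import re
from collections import Counter

# Hypothetical keyword -> label mapping (an assumption, not taken from the report tooling).
CATEGORIES = [
    ("replication slot", "PostgreSQL replication slot"),
    ("replication lag", "Postgres replication lag"),
    ("no disk space", "Disk space"),
    ("gitaly latency", "Gitaly latency"),
    ("cpu use", "CPU"),
    ("pingdom", "Pingdom"),
]


def categorize(summary: str) -> str:
    """Return the first matching category label, or "Other"."""
    lowered = summary.lower()
    for keyword, label in CATEGORIES:
        if keyword in lowered:
            return label
    return "Other"


def tally(table: str) -> Counter:
    """Count incidents per category in a two-column Markdown table."""
    counts = Counter()
    for line in table.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Header and separator rows are skipped: data rows start with an ISO timestamp.
        if len(cells) == 2 and re.match(r"\d{4}-\d{2}-\d{2}T", cells[0]):
            counts[categorize(cells[1])] += 1
    return counts


# Four rows copied verbatim from the table above, as sample input.
SAMPLE = """\
| Created | Summary |
|---|---|
| 2017-12-19T19:09:51Z | [#1167] Postgres Replication lag is over 200MB |
| 2017-12-20T10:21:01Z | [#1169 (closed)] PostgreSQL replication slot with an stale xmin which can cause bloat on the primary |
| 2017-12-20T19:16:55Z | [#1172] No disk space left on / on prometheus-01.us-east1-d.gce.gitlab-runners.gitlab.net: 0% |
| 2017-12-22T10:35:31Z | [#1177 (closed)] Gitaly latency on nfs-file-11.stor.gitlab.com has been over 1m during the last 5m |
"""

if __name__ == "__main__":
    for label, count in tally(SAMPLE).most_common():
        print(f"{label}: {count}")
```

Counting the 16 rows of the full table by hand under the same grouping gives 6 disk space alerts, 4 replication slot alerts, 3 Gitaly latency alerts, and 1 each for replication lag, Pingdom, and CPU.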
Issues
Stats for the last oncall period
- Total number of oncall issues opened in the last on call shift: 10
  - Access Request: 0
  - Critical: 1
- Total number of oncall issues closed in this milestone: 0
  - Access Request: 0
  - Critical: 0
Open OnCall Issues
- Total number of open oncall issues: 11
  - Access Request: 0
  - Critical: 1
| Created | Assignee | Summary |
|---|---|---|
| 24 Dec 17 12:33 UTC | unassigned | High database load on primary database |
| 21 Dec 17 16:51 UTC | unassigned | Database specialists should be on-call for database related problems |
| 21 Dec 17 14:03 UTC | unassigned | Add alert for sequential reads |
| 21 Dec 17 13:59 UTC | unassigned | Alert on errors in the pgbouncer log |
| 21 Dec 17 11:33 UTC | tmaczukin | Add cleaning mechanism for runners-cache-X machines |
| 21 Dec 17 09:25 UTC | unassigned | validate end-to-end artifact uploading on staging and enable on production |
| 20 Dec 17 12:06 UTC | unassigned | webhooks broken after ssl update |
| 12 Dec 17 18:54 UTC | ahanselka | Need account/access to OpenVAS security scanner |
| 23 Nov 17 10:37 UTC | unassigned | Cleanup SSL certificates |
| 15 Nov 17 10:34 UTC | unassigned | Detect and alarm on long-running orphan processes on sidekiq |
| 06 Nov 17 14:30 UTC | unassigned | Alarms should go off when we fail to create azure snapshots |
Weekly Ops
- Web/Git/API p95 latency
- Gitaly p95 latency
- NFS timeouts
- Sidekiq CPU
- API CPU
- Git CPU
- Web CPU
This issue was automatically generated using oncall-robot-assistant