OnCall report for period: 2018-04-24 - 2018-05-01
OnCall during this period
Schedule | Username |
---|---|
AMA | Alex Hanselka |
AMA | Ilya Frolov |
EU | Ahmad Sherif |
EU | |
PagerDuty Incidents
- Number of incidents: 25
Created | Summary |
---|---|
2018-04-25T08:56:45Z | [1667] Firing 1 - Postgres transactions showing high rate of statement timeouts |
2018-04-25T10:12:59Z | [1669] Pingdom check GitLab.com Pages is down |
2018-04-25T11:57:17Z | [1670] Pingdom check GitLab.com new repo is down |
2018-04-25T11:57:43Z | [1671] Pingdom check GitLab.com public check is down |
2018-04-25T11:58:05Z | [1672] Pingdom check GitLab Infrastructure Master Branch is down |
2018-04-25T11:58:30Z | [1673] Pingdom check GitLab.com issue is down |
2018-04-25T12:01:03Z | [1674] Pingdom check GitLab.com issue is down |
2018-04-25T12:30:02Z | [1675] Pingdom check GitLab Infrastructure Master Branch is down |
2018-04-25T12:30:06Z | [1676] Pingdom check GitLab.com issue is down |
2018-04-25T12:30:16Z | [1677] Pingdom check GitLab.com public check is down |
2018-04-25T12:30:19Z | [1678] Pingdom check GitLab.com new repo is down |
2018-04-25T12:31:02Z | [1679] Firing 1 - High Error Rate on Front End Web |
2018-04-25T12:37:49Z | [1680] Firing 1 - High 4xx Error Rate on Front End Web |
2018-04-25T12:38:50Z | [1681] Firing 1 - High 4xx Error Rate on Front End Web on backend api_rate_limit |
2018-04-25T15:39:43Z | [1682] Pingdom check GitLab.com public check is down |
2018-04-25T15:40:03Z | [1683] Pingdom check GitLab Infrastructure Master Branch is down |
2018-04-25T15:40:19Z | [1684] Pingdom check GitLab.com new repo is down |
2018-04-25T15:52:29Z | [1685] Firing 1 - Replicas have different upstream primary databases |
2018-04-25T15:53:46Z | [1686] Firing 1 - Postgres seems to be processing very few transactions |
2018-04-25T15:56:30Z | [1687] Firing 1 - Postgres Replication lag is over 2 minutes |
2018-04-25T15:56:30Z | [1688] Firing 1 - Postgres Replication lag (in bytes) is high |
2018-04-25T16:44:15Z | [1689] Firing 1 - Last WALE backup (from postgres-01 to S3) was seen 1h 5m 35s ago. |
2018-04-25T23:53:45Z | [1690] Firing 1 - Postgres seems to be processing very few transactions |
2018-04-26T01:24:03Z | [1691] Pingdom check GitLab Infrastructure Master Branch is down |
2018-04-26T01:24:18Z | [1692] Pingdom check GitLab.com new repo is down |
Issues
7 Day OnCall Issue Stats
- OnCall issues : 20
- Access Request : 5
- Critical : 0
- Outage : 0
- Corrective Action : 2
Open OnCall Issue Stats
- OnCall issues : 19
- Access Request : 4
- Critical : 1
- Outage : 0
- Corrective Action : 15
Open OnCall Issues
Created | Assignee | Summary |
---|---|---|
01 May 18 04:47 UTC | unassigned | GitHost - backups failing on ~90 instances (possibly more) |
01 May 18 03:58 UTC | unassigned | Install new *.githost.io SSL on GitHost instances |
01 May 18 00:18 UTC | unassigned | log.gitlap.com stopped getting data |
01 May 18 00:09 UTC | unassigned | Access to VPN please |
30 Apr 18 18:01 UTC | unassigned | node_exporter lock up on nfs-file-06 |
30 Apr 18 16:50 UTC | unassigned | Increased replication lag leading to authentication errors in pipelines |
30 Apr 18 15:11 UTC | unassigned | GCP staging access for Valery |
30 Apr 18 12:22 UTC | unassigned | Prometheus stopped collecting metrics for an hour |
27 Apr 18 19:21 UTC | unassigned | Valery: Access to Azure |
26 Apr 18 07:55 UTC | unassigned | Enable Object Storage on Dev |
26 Apr 18 03:19 UTC | unassigned | Severe site degradation due to database load |
24 Apr 18 12:01 UTC | jtevnan | multiple pings about prometheus |
20 Apr 18 20:26 UTC | unassigned | staging disks are all full |
20 Apr 18 09:16 UTC | unassigned | nfs-15 reporting ResourceExhausted Gitaly/gRPC errors |
19 Apr 18 16:54 UTC | unassigned | Gradual rollout of git 2.16? |
16 Apr 18 22:55 UTC | unassigned | GitLab CI jobs fail randomly in git heavy workflow |
16 Apr 18 17:24 UTC | unassigned | Users unable to access vetsens/pi-install |
09 Apr 18 14:04 UTC | ahmadsherif | knife ssh hangs on commands that spawn pagers |
06 Apr 18 01:46 UTC | unassigned | On-board packagecloud.io to the Production team |
Weekly Ops
- Web/Git/API p95 latency
- Gitaly p95 latency
- API CPU
- Sidekiq CPU
- NFS timeouts
- Git CPU
- Web CPU
- p95 API latency for 200s
- p95 Git latency for 200s
- p50 Web latency for 200s
- p50 API latency for 200s
- p50 Git latency for 200s
- p95 Web latency for 200s
This issue was automatically generated using oncall-robot-assistant