OnCall report for period: 2018-08-28 - 2018-09-04
Oncall during this period
| Schedule | Username |
|---|---|
| AMA | Alejandro Rodriguez |
| AMA | Alex Hanselka |
| AMA | John Skarbek |
| EU | John Northrup |
| EU | Ahmad Sherif |
| EU | Jarv Jarvis |
PagerDuty Incidents
- Number of incidents: 61
| Created | Summary |
|---|---|
| 2018-08-22T10:36:50Z | [2720] Firing 1 - Postgres Replication lag (in bytes) is high |
| 2018-08-22T10:38:13Z | [2721] Firing 1 - Postgres seems to be processing very few transactions |
| 2018-08-22T10:38:31Z | [2722] Firing 1 - Postgres Replication lag is over 2 minutes |
| 2018-08-22T11:48:09Z | [2723] Firing 1 - Postgres seems to be processing very few transactions |
| 2018-08-22T11:53:37Z | [2724] Firing 1 - Postgres Replication lag is over 2 minutes |
| 2018-08-22T11:53:37Z | [2725] Firing 1 - Postgres seems to be processing very few transactions |
| 2018-08-22T11:56:48Z | [2726] Firing 1 - Postgres Replication lag (in bytes) is high |
| 2018-08-23T14:36:32Z | [2727] Firing 2 - HighRailsErrorRate |
| 2018-08-23T17:43:27Z | [2728] Pingdom check GitLab Infrastructure Master Branch is down |
| 2018-08-24T12:49:21Z | [2730] Firing 1 - High Rails Error Rate on Front End |
| 2018-08-24T17:54:30Z | [2731] Firing 1 - High Rails Error Rate on Front End |
| 2018-08-24T23:09:24Z | [2732] Firing 1 - High Rails Error Rate on Front End |
| 2018-08-25T02:16:03Z | [2733] Firing 1 - High Rails Error Rate on Front End |
| 2018-08-26T20:13:04Z | [2734] Firing 2 - HighRailsErrorRate |
| 2018-08-26T20:59:46Z | [2735] Firing 1 - High Rails Error Rate on Front End |
| 2018-08-27T01:23:31Z | [2736] Firing 2 - HighRailsErrorRate |
| 2018-08-27T05:07:22Z | [2737] Firing 2 - HighRailsErrorRate |
| 2018-08-27T07:34:21Z | [2738] Firing 2 - HighRailsErrorRate |
| 2018-08-27T07:41:10Z | [2739] Firing 2 - NoDiskSpace |
| 2018-08-27T09:38:47Z | [2740] Firing 4 - HighRailsErrorRate |
| 2018-08-27T10:37:20Z | [2741] Firing 4 - HighRailsErrorRate |
| 2018-08-27T10:37:26Z | [2742] Pingdom check GitLab.com new repo is down |
| 2018-08-27T10:37:31Z | [2743] Pingdom check GitLab Infrastructure Master Branch is down |
| 2018-08-27T10:37:37Z | [2744] Pingdom check GitLab.com issue is down |
| 2018-08-27T10:38:20Z | [2745] Firing 1 - High Error Rate on Front End Web |
| 2018-08-27T10:54:44Z | [2746] Firing 2 - PullMirrorsOverdueQueueTooLarge |
| 2018-08-27T13:34:03Z | [2747] Firing 2 - HighRailsErrorRate |
| 2018-08-27T14:21:31Z | [2748] Firing 3 - HighRailsErrorRate |
| 2018-08-27T14:30:53Z | [2749] Firing 5 - PrometheusUnreachable |
| 2018-08-27T14:41:24Z | [2750] Firing 1 - prometheus is backlogging on the notifications queue |
| 2018-08-27T14:51:28Z | [2751] Firing 2 - PrometheusNotificationsBacklog |
| 2018-08-27T16:48:02Z | [2752] Firing 2 - HighRailsErrorRate |
| 2018-08-27T17:08:54Z | [2753] Firing 1 - prometheus is backlogging on the notifications queue |
| 2018-08-27T19:28:16Z | [2754] Firing 1 - High Rails Error Rate on Front End |
| 2018-08-27T20:01:02Z | [2755] Firing 2 - HighRailsErrorRate |
| 2018-08-27T20:29:15Z | [2756] Firing 2 - PullMirrorsOverdueQueueTooLarge |
| 2018-08-27T23:20:32Z | [2757] Firing 3 - HighRailsErrorRate |
| 2018-08-27T23:57:29Z | [2758] Firing 2 - PullMirrorsOverdueQueueTooLarge |
| 2018-08-28T01:04:22Z | [2759] Firing 2 - HighRailsErrorRate |
| 2018-08-28T01:04:32Z | [2760] Firing 2 - HighRailsErrorRate |
| 2018-08-28T04:14:05Z | [2761] Firing 2 - HighRailsErrorRate |
| 2018-08-28T04:41:03Z | [2762] Firing 2 - HighRailsErrorRate |
| 2018-08-28T09:23:30Z | [2763] Pingdom check GitLab.com issue is down |
| 2018-08-28T09:23:34Z | [2764] Pingdom check GitLab Infrastructure Master Branch is down |
| 2018-08-28T09:23:43Z | [2765] Pingdom check GitLab.com public check is down |
| 2018-08-28T09:23:49Z | [2766] Pingdom check GitLab.com new repo is down |
| 2018-08-28T09:24:22Z | [2767] Firing 1 - High Error Rate on Front End Web |
| 2018-08-28T09:24:26Z | [2768] Firing 1 - The alert test file is missing! |
| 2018-08-28T16:17:32Z | [2770] Firing 1 - High Rails Error Rate on Front End |
| 2018-08-29T13:05:24Z | [2774] Firing 1 - prometheus is backlogging on the notifications queue |
| 2018-08-30T16:56:00Z | [2776] Firing 2 - PostgreSQL_XLOGConsumptionTooLow |
| 2018-08-31T18:30:41Z | [2780] Firing 2 - FeLoadBalancerLossOfRedundancy |
| 2018-08-31T18:55:40Z | [2781] Firing 1 - Loss of Redundancy |
| 2018-09-01T08:57:17Z | [2782] Firing 4 - HighRailsErrorRate |
| 2018-09-03T13:14:32Z | [2783] Firing 4 - HighRailsErrorRate |
| 2018-09-03T13:14:50Z | [2784] Pingdom check GitLab.com new repo is down |
| 2018-09-03T16:05:53Z | [2785] Firing 6 - PrometheusUnreachable |
| 2018-09-04T07:22:32Z | [2786] Firing 4 - HighRailsErrorRate |
| 2018-09-04T07:37:43Z | [2787] Firing 1 - High Rails Error Rate on Front End |
| 2018-09-04T09:04:02Z | [2788] Firing 4 - HighRailsErrorRate |
| 2018-09-04T09:15:19Z | [2789] Firing 3 - HighRailsErrorRate |
Open Issue Stats
- Oncall issues: 19
- Change issues: 5
- Incident issues: 6
- Access Request: 9
Open Change Issues
| Created | Assignee | Summary |
|---|---|---|
| 2018-09-04T12:12:20Z | jarv | Remove file NFS mounts from gitlab.com |
| 2018-08-28T15:13:27Z | ahmadsherif | Switch to Sentry on GCP |
| 2018-08-24T05:23:48Z | unassigned | Postgres: Decrease log_min_duration_statement to 500ms |
| 2018-08-20T15:58:02Z | unassigned | Raise alert threshold for wale-replica |
| 2018-08-11T12:48:09Z | abrandl | Increase max_replication_slots |
Open Incident Issues
| Created | Assignee | Summary |
|---|---|---|
| 2018-09-04T12:24:44Z | unassigned | Sometimes GCS returns 5XX (Server error) status code |
| 2018-08-28T10:18:38Z | unassigned | Increased error rate on GitLab https due to health check failing on the web fleet |
| 2018-08-20T14:01:24Z | unassigned | Diminishing of logging visibility over the weekend (18 & 19 Aug) |
| 2018-08-17T17:16:59Z | unassigned | Incident Working doc: 2018-08-17 |
| 2018-08-13T10:05:43Z | unassigned | 2018-08-13: Failing API Health Checks |
| 2018-08-10T21:34:43Z | unassigned | 2018-08-10: High rate of 500 errors on API nodes |
Open Oncall Issues
| Created | Assignee | Summary |
|---|---|---|
| 2018-09-04T13:51:07Z | unassigned | Add haproxy rules to deny abusive paths |
| 2018-08-25T17:58:06Z | unassigned | Import Failures - amp-robotics/migration-testing/axon and ledgerx/core-issues-archive/ |
| 2018-08-25T04:53:02Z | unassigned | Runner docker-auto-scale 72989761 failing |
| 2018-08-17T17:14:45Z | unassigned | Remove broken-ci dashboard from dashboards.gitlab.net |
| 2018-08-14T12:20:08Z | jarv | page the oncall when redis fails over |
| 2018-08-09T16:59:34Z | northrup | Request: Access to security monkey instance for security team |
| 2018-08-03T16:44:51Z | ctbarrett | create alert and dashboard for sidekiq exceptions |
| 2018-07-11T13:25:32Z | unassigned | Configure and run the gitlab pseudonymizer |
| 2018-07-09T06:39:47Z | unassigned | Gitaly p95 latency for nfs-02 is high |
| 2018-07-04T01:27:29Z | unassigned | 2018-07-04 -- High CPU on nfs-file-07 |
| 2018-07-03T05:37:31Z | northrup | Incorrect Storage Shard Key on Projects on GitLab.com |
| 2018-06-27T22:20:51Z | northrup | Read-only elastic cloud user for automated tasks |
| 2018-06-21T23:12:33Z | unassigned | Environments not loading (502) |
| 2018-06-15T11:24:26Z | unassigned | Remove Consul cluster for CI Monitoring from GCP part of the CI infrastructure |
| 2018-06-13T02:38:44Z | unassigned | mtail fsnotify error |
| 2018-06-07T03:33:19Z | unassigned | Issues CSV doesn't export |
| 2018-05-30T13:13:13Z | unassigned | Invalid DKIM for failed pipeline email |
| 2018-04-26T00:16:26Z | unassigned | PostgreSQL_CommitRateTooLow Alert |
| 2018-01-05T15:48:55Z | unassigned | Mailgun Security Breach |
7 Day Issue Stats
- Oncall issues: 7
- Access Request: 2
- Change Issues: 5
- Incident Issues: 6
Change Issues
- 2018-09-04T13:51:07Z - Add haproxy rules to deny abusive paths - unassigned
- 2018-09-04T12:12:20Z - Remove file NFS mounts from gitlab.com - jarv
- 2018-08-28T15:13:27Z - Switch to Sentry on GCP - ahmadsherif
- 2018-08-27T10:22:18Z - Restart the production redis fleet - jarv
- 2018-08-24T05:23:48Z - Postgres: Decrease log_min_duration_statement to 500ms - unassigned
Incident Issues
- 2018-09-04T12:24:44Z - Sometimes GCS returns 5XX (Server error) status code - unassigned | ~S4 | ~"Service:GCP"
- 2018-08-28T17:21:11Z - The URL www.gitlab.com does not work any more - skarbek | ~S3 | ~"Service:Infrastructure"
- 2018-08-28T12:42:58Z - Code diff formatting broken after v11.2.2 deployment on GitLab.com - unassigned | ~S3 | ~"Service:GitLab Rails"
- 2018-08-28T10:18:38Z - Increased error rate on GitLab https due to health check failing on the web fleet - unassigned | ~S3 | ~"Service:GitLab Rails"
- 2018-08-27T15:06:10Z - Loss of public IP for pages will cause an issue for users who have custom domains for pages and an A record - unassigned | ~S3 | ~"Service:Infrastructure"
- 2018-08-21T15:39:01Z - Redis switchover causing increased rate of sidekiq errors - skarbek | ~S4 | ~"Service:Redis" ~"Service:Sidekiq"
This issue was automatically generated using oncall-robot-assistant