Oncall report for period: 2018-08-28 to 2018-09-04

Oncall during this period

Schedule Username
AMA Alejandro Rodriguez
AMA Alex Hanselka
AMA John Skarbek
EU John Northrup
EU Ahmad Sherif
EU Jarv Jarvis

PagerDuty Incidents

  • Number of incidents: 61
Created Summary
2018-08-22T10:36:50Z [2720] Firing 1 - Postgres Replication lag (in bytes) is high
2018-08-22T10:38:13Z [2721] Firing 1 - Postgres seems to be processing very few transactions
2018-08-22T10:38:31Z [2722] Firing 1 - Postgres Replication lag is over 2 minutes
2018-08-22T11:48:09Z [2723] Firing 1 - Postgres seems to be processing very few transactions
2018-08-22T11:53:37Z [2724] Firing 1 - Postgres Replication lag is over 2 minutes
2018-08-22T11:53:37Z [2725] Firing 1 - Postgres seems to be processing very few transactions
2018-08-22T11:56:48Z [2726] Firing 1 - Postgres Replication lag (in bytes) is high
2018-08-23T14:36:32Z [2727] Firing 2 - HighRailsErrorRate
2018-08-23T17:43:27Z [2728] Pingdom check GitLab Infrastructure Master Branch is down
2018-08-24T12:49:21Z [2730] Firing 1 - High Rails Error Rate on Front End
2018-08-24T17:54:30Z [2731] Firing 1 - High Rails Error Rate on Front End
2018-08-24T23:09:24Z [2732] Firing 1 - High Rails Error Rate on Front End
2018-08-25T02:16:03Z [2733] Firing 1 - High Rails Error Rate on Front End
2018-08-26T20:13:04Z [2734] Firing 2 - HighRailsErrorRate
2018-08-26T20:59:46Z [2735] Firing 1 - High Rails Error Rate on Front End
2018-08-27T01:23:31Z [2736] Firing 2 - HighRailsErrorRate
2018-08-27T05:07:22Z [2737] Firing 2 - HighRailsErrorRate
2018-08-27T07:34:21Z [2738] Firing 2 - HighRailsErrorRate
2018-08-27T07:41:10Z [2739] Firing 2 - NoDiskSpace
2018-08-27T09:38:47Z [2740] Firing 4 - HighRailsErrorRate
2018-08-27T10:37:20Z [2741] Firing 4 - HighRailsErrorRate
2018-08-27T10:37:26Z [2742] Pingdom check GitLab.com new repo is down
2018-08-27T10:37:31Z [2743] Pingdom check GitLab Infrastructure Master Branch is down
2018-08-27T10:37:37Z [2744] Pingdom check GitLab.com issue is down
2018-08-27T10:38:20Z [2745] Firing 1 - High Error Rate on Front End Web
2018-08-27T10:54:44Z [2746] Firing 2 - PullMirrorsOverdueQueueTooLarge
2018-08-27T13:34:03Z [2747] Firing 2 - HighRailsErrorRate
2018-08-27T14:21:31Z [2748] Firing 3 - HighRailsErrorRate
2018-08-27T14:30:53Z [2749] Firing 5 - PrometheusUnreachable
2018-08-27T14:41:24Z [2750] Firing 1 - prometheus is backlogging on the notifications queue
2018-08-27T14:51:28Z [2751] Firing 2 - PrometheusNotificationsBacklog
2018-08-27T16:48:02Z [2752] Firing 2 - HighRailsErrorRate
2018-08-27T17:08:54Z [2753] Firing 1 - prometheus is backlogging on the notifications queue
2018-08-27T19:28:16Z [2754] Firing 1 - High Rails Error Rate on Front End
2018-08-27T20:01:02Z [2755] Firing 2 - HighRailsErrorRate
2018-08-27T20:29:15Z [2756] Firing 2 - PullMirrorsOverdueQueueTooLarge
2018-08-27T23:20:32Z [2757] Firing 3 - HighRailsErrorRate
2018-08-27T23:57:29Z [2758] Firing 2 - PullMirrorsOverdueQueueTooLarge
2018-08-28T01:04:22Z [2759] Firing 2 - HighRailsErrorRate
2018-08-28T01:04:32Z [2760] Firing 2 - HighRailsErrorRate
2018-08-28T04:14:05Z [2761] Firing 2 - HighRailsErrorRate
2018-08-28T04:41:03Z [2762] Firing 2 - HighRailsErrorRate
2018-08-28T09:23:30Z [2763] Pingdom check GitLab.com issue is down
2018-08-28T09:23:34Z [2764] Pingdom check GitLab Infrastructure Master Branch is down
2018-08-28T09:23:43Z [2765] Pingdom check GitLab.com public check is down
2018-08-28T09:23:49Z [2766] Pingdom check GitLab.com new repo is down
2018-08-28T09:24:22Z [2767] Firing 1 - High Error Rate on Front End Web
2018-08-28T09:24:26Z [2768] Firing 1 - The alert test file is missing!
2018-08-28T16:17:32Z [2770] Firing 1 - High Rails Error Rate on Front End
2018-08-29T13:05:24Z [2774] Firing 1 - prometheus is backlogging on the notifications queue
2018-08-30T16:56:00Z [2776] Firing 2 - PostgreSQL_XLOGConsumptionTooLow
2018-08-31T18:30:41Z [2780] Firing 2 - FeLoadBalancerLossOfRedundancy
2018-08-31T18:55:40Z [2781] Firing 1 - Loss of Redundancy
2018-09-01T08:57:17Z [2782] Firing 4 - HighRailsErrorRate
2018-09-03T13:14:32Z [2783] Firing 4 - HighRailsErrorRate
2018-09-03T13:14:50Z [2784] Pingdom check GitLab.com new repo is down
2018-09-03T16:05:53Z [2785] Firing 6 - PrometheusUnreachable
2018-09-04T07:22:32Z [2786] Firing 4 - HighRailsErrorRate
2018-09-04T07:37:43Z [2787] Firing 1 - High Rails Error Rate on Front End
2018-09-04T09:04:02Z [2788] Firing 4 - HighRailsErrorRate
2018-09-04T09:15:19Z [2789] Firing 3 - HighRailsErrorRate
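
The incident list above is easier to scan when grouped by alert. The following is a minimal sketch, not part of oncall-robot-assistant, that assumes each row has the form `<timestamp> [<id>] <summary>` as shown in the list. The names `parse_incident` and `count_by_alert` are hypothetical helpers introduced here for illustration only.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Assumed row shape, taken from the list above, e.g.:
#   2018-08-27T07:41:10Z [2739] Firing 2 - NoDiskSpace
LINE_RE = re.compile(r"^(\S+Z) \[(\d+)\] (.+)$")
# Drops the "Firing N - " prefix so that the same alert groups together
# regardless of how many instances were firing.
FIRING_RE = re.compile(r"^Firing \d+ - ")


def parse_incident(line):
    """Return (created, incident_id, alert_name), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    created = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%SZ")
    created = created.replace(tzinfo=timezone.utc)
    alert_name = FIRING_RE.sub("", match.group(3))
    return created, int(match.group(2)), alert_name


def count_by_alert(lines):
    """Count incidents per alert name, skipping lines that are not incident rows."""
    counter = Counter()
    for line in lines:
        parsed = parse_incident(line)
        if parsed is not None:
            counter[parsed[2]] += 1
    return counter


if __name__ == "__main__":
    # A few rows copied from the list above.
    sample = [
        "2018-08-27T07:41:10Z [2739] Firing 2 - NoDiskSpace",
        "2018-08-27T09:38:47Z [2740] Firing 4 - HighRailsErrorRate",
        "2018-08-27T10:37:20Z [2741] Firing 4 - HighRailsErrorRate",
        "2018-08-27T10:37:26Z [2742] Pingdom check GitLab.com new repo is down",
    ]
    for name, count in count_by_alert(sample).most_common():
        print(f"{count:3d}  {name}")
```

Because `parse_incident` also returns the creation time, the same helper could be used to restrict the tally to the report period before counting.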

Open Issue Stats

Open Change Issues

Created Assignee Summary
2018-09-04T12:12:20Z jarv Remove file NFS mounts from gitlab.com
2018-08-28T15:13:27Z ahmadsherif Switch to Sentry on GCP
2018-08-24T05:23:48Z unassigned Postgres: Decrease log_min_duration_statement to 500ms
2018-08-20T15:58:02Z unassigned Raise alert threshold for wale-replica
2018-08-11T12:48:09Z abrandl Increase max_replication_slots

Open Incident Issues

Created Assignee Summary
2018-09-04T12:24:44Z unassigned Sometimes GCS returns 5XX (Server error) status code
2018-08-28T10:18:38Z unassigned Increased error rate on GitLab https due to health check failing on the web fleet
2018-08-20T14:01:24Z unassigned Diminishing of logging visibility over the weekend (18 & 19 Aug)
2018-08-17T17:16:59Z unassigned Incident Working doc: 2018-08-17
2018-08-13T10:05:43Z unassigned 2018-08-13: Failing API Health Checks
2018-08-10T21:34:43Z unassigned 2018-08-10: High rate of 500 errors on API nodes

Open Oncall Issues

Created Assignee Summary
2018-09-04T13:51:07Z unassigned Add haproxy rules to deny abusive paths
2018-08-25T17:58:06Z unassigned Import Failures - amp-robotics/migration-testing/axon and ledgerx/core-issues-archive/
2018-08-25T04:53:02Z unassigned Runner docker-auto-scale 72989761 failing
2018-08-17T17:14:45Z unassigned Remove broken-ci dashboard from dashboards.gitlab.net
2018-08-14T12:20:08Z jarv page the oncall when redis fails over
2018-08-09T16:59:34Z northrup Request: Access to security monkey instance for security team
2018-08-03T16:44:51Z ctbarrett create alert and dashboard for sidekiq exceptions
2018-07-11T13:25:32Z unassigned Configure and run the gitlab pseudonymizer
2018-07-09T06:39:47Z unassigned Gitaly p95 latency for nfs-02 is high
2018-07-04T01:27:29Z unassigned 2018-07-04 -- High CPU on nfs-file-07
2018-07-03T05:37:31Z northrup Incorrect Storage Shard Key on Projects on GitLab.com
2018-06-27T22:20:51Z northrup Read-only elastic cloud user for automated tasks
2018-06-21T23:12:33Z unassigned Environments not loading (502)
2018-06-15T11:24:26Z unassigned Remove Consul cluster for CI Monitoring from GCP part of the CI infrastructure
2018-06-13T02:38:44Z unassigned mtail fsnotify error
2018-06-07T03:33:19Z unassigned Issues CSV doesn't export
2018-05-30T13:13:13Z unassigned Invalid DKIM for failed pipeline email
2018-04-26T00:16:26Z unassigned PostgreSQL_CommitRateTooLow Alert
2018-01-05T15:48:55Z unassigned Mailgun Security Breach

7 Day Issue Stats

  • Oncall issues: 7
  • Access Request: 2
  • Change Issues: 5
  • Incident Issues: 6

Change Issues

Incident Issues

This issue was automatically generated using oncall-robot-assistant