OnCall report for period: 2019-05-21 - 2019-05-28

Oncall during this period

Schedule Username
SRE Alejandro Rodriguez
SRE Amar Amarsanaa
SRE Hendrik Meyer
SRE Henri Philipps

PagerDuty Incidents

  • Number of incidents: 36
Created Summary
2019-05-21T12:54:37Z [8770] Firing 2 - High4xxRateForRegistry
2019-05-21T16:25:51Z [8783] Firing 6 - AlertmanagerNotificationsFailing
2019-05-22T14:15:31Z [8903] SSO is broken on GitLab.com
2019-05-23T22:42:57Z [9030] Firing 10 - PostgreSQL_ReplicaStaleXmin
2019-05-24T01:28:30Z [9040] Firing 10 - PostgreSQL_ReplicaStaleXmin
2019-05-24T17:00:30Z [9043] Firing 10 - PostgreSQL_ReplicaStaleXmin
2019-05-24T22:52:28Z [9046] Firing 10 - PostgreSQL_ReplicaStaleXmin
2019-05-25T01:03:16Z [9048] Firing 10 - PostgreSQL_ReplicaStaleXmin
2019-05-25T02:52:41Z [9050] Firing 10 - PostgreSQL_ReplicaStaleXmin
2019-05-25T08:14:27Z [9053] Firing 1 -
2019-05-25T20:34:41Z [9080] Firing 4 - PostgreSQL_ReplicaStaleXmin
2019-05-26T00:30:33Z [9092] Firing 2 - SSLCertExpiresSoon
2019-05-26T03:04:09Z [9108] Firing 4 - IncreasedBackendConnectionErrors
2019-05-26T03:04:09Z [9107] Firing 4 - IncreasedServerConnectionErrors
2019-05-26T03:04:23Z [9109] Firing 6 - IncreasedServerResponseErrors
2019-05-26T04:42:11Z [9117] Pingdom check check:https://version.gitlab.com/ is down
2019-05-26T04:52:09Z [9119] Pingdom check check:https://version.gitlab.com/ is down
2019-05-26T05:02:11Z [9120] Pingdom check check:https://version.gitlab.com/ is down
2019-05-26T05:12:10Z [9122] Pingdom check check:https://version.gitlab.com/ is down
2019-05-26T05:42:10Z [9126] Pingdom check check:https://version.gitlab.com/ is down
2019-05-26T05:52:12Z [9128] Pingdom check check:https://version.gitlab.com/ is down
2019-05-26T06:12:13Z [9130] Pingdom check check:https://version.gitlab.com/ is down
2019-05-26T06:32:13Z [9134] Pingdom check check:https://version.gitlab.com/ is down
2019-05-26T10:52:12Z [9154] Pingdom check check:https://version.gitlab.com/ is down
2019-05-26T11:39:25Z [9157] Firing 1 - Alertmanager is failing sending notifications
2019-05-26T19:05:29Z [9199] Firing 10 - PostgreSQL_ReplicaStaleXmin
2019-05-26T21:38:51Z [9212] Firing 4 - IncreasedBackendConnectionErrors
2019-05-26T21:38:51Z [9211] Firing 8 - IncreasedServerConnectionErrors
2019-05-26T21:39:06Z [9213] Firing 8 - IncreasedServerResponseErrors
2019-05-27T02:53:27Z [9240] Firing 10 - PostgreSQL_ReplicaStaleXmin
2019-05-27T13:23:35Z [9288] Firing 1 - prometheus is unreachable
2019-05-27T14:10:53Z [9293] Firing 1 - 1% disk space left
2019-05-27T20:17:36Z [9325] Firing 4 - IncreasedServerResponseErrors
2019-05-27T20:17:36Z [9326] Firing 4 - IncreasedBackendConnectionErrors
2019-05-27T20:17:36Z [9324] Firing 8 - IncreasedServerConnectionErrors
2019-05-28T06:30:41Z [9377] Firing 10 - PostgreSQL_ReplicaStaleXmin
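
To see which alerts dominate a list like the one above, the rows can be grouped by alert name. The following is a minimal sketch, not part of oncall-robot-assistant: the `tally` function, the regular expressions, and the `(no alert name)` label are assumptions made for illustration, and the sample rows are copied from the table above.

```python
import re
from collections import Counter

# One incident row: ISO timestamp, PagerDuty id in brackets, free-text summary.
ROW = re.compile(r"^(\S+Z) \[(\d+)\] (.*)$")
# Alertmanager summaries have the shape "Firing <count> - <alert name>".
FIRING = re.compile(r"^Firing \d+ - ?(.*)$")


def tally(rows):
    """Count incidents per alert name; rows that do not parse are skipped."""
    counts = Counter()
    for row in rows:
        match = ROW.match(row.strip())
        if not match:
            continue
        summary = match.group(3)
        firing = FIRING.match(summary)
        # Non-Alertmanager rows (e.g. Pingdom) are grouped by their full summary.
        name = firing.group(1).strip() if firing else summary
        counts[name or "(no alert name)"] += 1
    return counts


# Sample rows copied from the incident table in this report.
sample = [
    "2019-05-21T12:54:37Z [8770] Firing 2 - High4xxRateForRegistry",
    "2019-05-23T22:42:57Z [9030] Firing 10 - PostgreSQL_ReplicaStaleXmin",
    "2019-05-24T01:28:30Z [9040] Firing 10 - PostgreSQL_ReplicaStaleXmin",
    "2019-05-25T08:14:27Z [9053] Firing 1 -",
    "2019-05-26T04:42:11Z [9117] Pingdom check check:https://version.gitlab.com/ is down",
]

if __name__ == "__main__":
    for name, count in tally(sample).most_common():
        print(count, name)
```

Passing the full table to `tally` instead of `sample` would give per-alert totals for the whole period; rows with an empty alert name are kept under a placeholder label rather than dropped, so the totals still add up to the incident count.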

7 Day Issue Stats

  • Oncall issues: 6
  • Access Request: 0
  • Change Issues: 8
  • Incident Issues: 5
  • CorrectiveAction Issues: 0

Change Issues

Incident Issues

CorrectiveAction Issues

Open Issue Stats

Open Change Issues

Created Assignee Summary
2019-05-27T10:27:01Z abrandl Enable feature update_all_mirrors_worker_rescheduling
2019-05-23T22:46:06Z craig Apply pending terraform changes
2019-05-17T19:42:27Z cmcfarland Enable cron job on NFS server to remove old snippet uploads
2019-04-30T20:52:53Z cmcfarland Re-balance file-28,26,23,27 gitaly node repositories
2019-04-26T12:09:48Z abrandl Increase PostgreSQL work_mem
2019-04-23T19:56:48Z mwasilewski-gitlab enable elasticsearch integration on gitlab.com on gitlab-org namespace
2019-04-10T20:44:18Z dawsmith GCP Sizing recommendations testing April 2019
2019-04-01T14:30:13Z cshobe Make PostgreSQL autovacuum settings less aggressive
2019-03-19T17:32:50Z Finotto Convert PK/FK from int4 to int8: events.id, push_event_payloads.event_id, and ci_build_trace_sections.id. Stage 1 of 2.

Open Incident Issues

Created Assignee Summary
2019-05-26T12:06:10Z unassigned New messages do not show up in current room
2019-05-23T16:24:38Z unassigned 2019-05-23 git-over-SSH errors
2019-04-26T07:35:21Z dosuken123 Scheduled jobs not triggering

Open Oncall Issues

Created Assignee Summary
2019-05-23T20:47:23Z unassigned Import request (for talentrydev): talentry1 (attempt 2)
2019-05-09T02:42:21Z unassigned Update Salesforce's sandbox credentials for customers.stg.gitlab.com
2019-04-16T14:14:27Z unassigned Unable to deploy https://customers.stg.gitlab.com
2019-03-12T17:54:13Z unassigned Trigger happy alerts are contributing to alert fatigue
2019-02-19T05:48:37Z unassigned Registry Node Didn't Get Drained prior deployment
2019-01-29T14:08:24Z unassigned 2019-01-29 PullMirrorsOverdueQueueTooLarge
2019-01-25T03:57:42Z unassigned Revive https://customers.stg.gitlab.com
2019-01-23T05:05:06Z unassigned When 1 server in a fleet of many goes down, multiple alerts fire
2019-01-21T22:54:15Z unassigned High4xxRateLimit in Staging
2018-12-27T04:28:28Z unassigned prometheus servers in staging are marked as production
2018-12-04T18:06:34Z unassigned Adjust RUBY_GLOBAL_METHOD_CACHE_SIZE on the web fleet
2018-12-04T15:38:45Z cmiskell Add user tracking metrics to dashboards
2018-11-26T04:18:27Z northrup Migrate GitLab.com OAuth2 credentials to gitlab-production project

This issue was automatically generated using oncall-robot-assistant