OnCall report for period: 2019-05-21 - 2019-05-28
Oncall during this period
Schedule | Username |
---|---|
SRE | Alejandro Rodriguez |
SRE | Amar Amarsanaa |
SRE | Hendrik Meyer |
SRE | Henri Philipps |
PagerDuty Incidents
- Number of incidents: 36
Created | Summary |
---|---|
2019-05-21T12:54:37Z | [8770] Firing 2 - High4xxRateForRegistry |
2019-05-21T16:25:51Z | [8783] Firing 6 - AlertmanagerNotificationsFailing |
2019-05-22T14:15:31Z | [8903] SSO is broken on GitLab.com |
2019-05-23T22:42:57Z | [9030] Firing 10 - PostgreSQL_ReplicaStaleXmin |
2019-05-24T01:28:30Z | [9040] Firing 10 - PostgreSQL_ReplicaStaleXmin |
2019-05-24T17:00:30Z | [9043] Firing 10 - PostgreSQL_ReplicaStaleXmin |
2019-05-24T22:52:28Z | [9046] Firing 10 - PostgreSQL_ReplicaStaleXmin |
2019-05-25T01:03:16Z | [9048] Firing 10 - PostgreSQL_ReplicaStaleXmin |
2019-05-25T02:52:41Z | [9050] Firing 10 - PostgreSQL_ReplicaStaleXmin |
2019-05-25T08:14:27Z | [9053] Firing 1 - |
2019-05-25T20:34:41Z | [9080] Firing 4 - PostgreSQL_ReplicaStaleXmin |
2019-05-26T00:30:33Z | [9092] Firing 2 - SSLCertExpiresSoon |
2019-05-26T03:04:09Z | [9108] Firing 4 - IncreasedBackendConnectionErrors |
2019-05-26T03:04:09Z | [9107] Firing 4 - IncreasedServerConnectionErrors |
2019-05-26T03:04:23Z | [9109] Firing 6 - IncreasedServerResponseErrors |
2019-05-26T04:42:11Z | [9117] Pingdom check check:https://version.gitlab.com/ is down |
2019-05-26T04:52:09Z | [9119] Pingdom check check:https://version.gitlab.com/ is down |
2019-05-26T05:02:11Z | [9120] Pingdom check check:https://version.gitlab.com/ is down |
2019-05-26T05:12:10Z | [9122] Pingdom check check:https://version.gitlab.com/ is down |
2019-05-26T05:42:10Z | [9126] Pingdom check check:https://version.gitlab.com/ is down |
2019-05-26T05:52:12Z | [9128] Pingdom check check:https://version.gitlab.com/ is down |
2019-05-26T06:12:13Z | [9130] Pingdom check check:https://version.gitlab.com/ is down |
2019-05-26T06:32:13Z | [9134] Pingdom check check:https://version.gitlab.com/ is down |
2019-05-26T10:52:12Z | [9154] Pingdom check check:https://version.gitlab.com/ is down |
2019-05-26T11:39:25Z | [9157] Firing 1 - Alertmanager is failing sending notifications |
2019-05-26T19:05:29Z | [9199] Firing 10 - PostgreSQL_ReplicaStaleXmin |
2019-05-26T21:38:51Z | [9212] Firing 4 - IncreasedBackendConnectionErrors |
2019-05-26T21:38:51Z | [9211] Firing 8 - IncreasedServerConnectionErrors |
2019-05-26T21:39:06Z | [9213] Firing 8 - IncreasedServerResponseErrors |
2019-05-27T02:53:27Z | [9240] Firing 10 - PostgreSQL_ReplicaStaleXmin |
2019-05-27T13:23:35Z | [9288] Firing 1 - prometheus is unreachable |
2019-05-27T14:10:53Z | [9293] Firing 1 - 1% disk space left |
2019-05-27T20:17:36Z | [9325] Firing 4 - IncreasedServerResponseErrors |
2019-05-27T20:17:36Z | [9326] Firing 4 - IncreasedBackendConnectionErrors |
2019-05-27T20:17:36Z | [9324] Firing 8 - IncreasedServerConnectionErrors |
2019-05-28T06:30:41Z | [9377] Firing 10 - PostgreSQL_ReplicaStaleXmin |
7 Day Issue Stats
- Oncall issues : 6
- Access Request : 0
- Change Issues : 8
- Incident Issues : 5
- CorrectiveAction Issues : 0
Change Issues
- 2019-05-27T14:23:52Z - Extend disk /dev/sdb on dashboards-com-01-inf-ops to 150G - T4cC0re
- 2019-05-27T10:27:01Z - Enable feature update_all_mirrors_worker_rescheduling - abrandl
- 2019-05-24T13:00:44Z - Downsize the rest of Sidekiq nodes - ahmadsherif
- 2019-05-24T12:37:40Z - Disable file-25,26,27,28 from being allocated new repos - cmcfarland
- 2019-05-23T22:46:06Z - Apply pending terraform changes - craig
- 2019-05-23T13:38:41Z - Downsize 1 node of each sidekiq type, take 4 - ahmadsherif
- 2019-05-22T21:17:02Z - Implement Salesforce Omniauth integration in gstg and gprd - cmiskell
- 2019-05-22T14:19:03Z - Downsize 1 node of each sidekiq type, take 3 - ahmadsherif
Incident Issues
- 2019-05-27T13:34:10Z - 2019-05-27 Metrics unavailable - unassigned | ~S3 | ~"Service:Prometheus" |
https://gitlab.com/gitlab-com/gl-infra/production/issues/849
- 2019-05-26T12:06:10Z - New messages do not show up in current room - unassigned | ~S2 | ~"Service:Gitter" |
https://gitlab.com/gitlab-com/gl-infra/production/issues/851
- 2019-05-23T16:24:38Z - 2019-05-23 git-over-SSH errors - unassigned | ~S4 | ~"Service:Gitaly" |
https://gitlab.com/gitlab-com/gl-infra/production/issues/844
- 2019-05-23T16:13:24Z - 2019-05-23 SAST jobs failing due to "exec format error" - unassigned | ~S2 | |
https://gitlab.com/gitlab-com/gl-infra/production/issues/843
- 2019-05-22T14:21:15Z - SSO not working - unassigned | ~S2 | |
https://gitlab.com/gitlab-com/gl-infra/production/issues/840
CorrectiveAction Issues
- 2019-05-24T09:15:43Z - improve incident management tooling - unassigned
- 2019-05-24T08:59:25Z - consolidate incident management documentation - unassigned
Open Issue Stats
- Oncall issues : 13
- Change issues : 9
- Incident issues : 3
- Access Request : 5
- CorrectiveAction : 73
Open Change Issues
Created | Assignee | Summary |
---|---|---|
2019-05-27T10:27:01Z | abrandl | Enable feature update_all_mirrors_worker_rescheduling |
2019-05-23T22:46:06Z | craig | Apply pending terraform changes |
2019-05-17T19:42:27Z | cmcfarland | Enable cron job on NFS server to remove old snippet uploads |
2019-04-30T20:52:53Z | cmcfarland | Re-balance file-28,26,23,27 gitaly node repositories |
2019-04-26T12:09:48Z | abrandl | Increase PostgreSQL work_mem |
2019-04-23T19:56:48Z | mwasilewski-gitlab | enable elasticsearch integration on gitlab.com on gitlab-org namespace |
2019-04-10T20:44:18Z | dawsmith | GCP Sizing recommendations testing April 2019 |
2019-04-01T14:30:13Z | cshobe | Make PostgreSQL autovacuum settings less aggressive |
2019-03-19T17:32:50Z | Finotto | Convert PK/FK from int4 to int8: events.id, push_event_payloads.event_id, and ci_build_trace_sections.id. Stage 1 of 2. |
Open Incident Issues
Created | Assignee | Summary |
---|---|---|
2019-05-26T12:06:10Z | unassigned | New messages do not show up in current room |
2019-05-23T16:24:38Z | unassigned | 2019-05-23 git-over-SSH errors |
2019-04-26T07:35:21Z | dosuken123 | Scheduled jobs not triggering |
Open Oncall Issues
Created | Assignee | Summary |
---|---|---|
2019-05-23T20:47:23Z | unassigned | Import request (for talentrydev): talentry1 (attempt 2) |
2019-05-09T02:42:21Z | unassigned | Update Salesforce's sandbox credentials for customers.stg.gitlab.com |
2019-04-16T14:14:27Z | unassigned | Unable to deploy https://customers.stg.gitlab.com |
2019-03-12T17:54:13Z | unassigned | Trigger happy alerts are contributing to alert fatigue |
2019-02-19T05:48:37Z | unassigned | Registry Node Didn't Get Drained prior deployment |
2019-01-29T14:08:24Z | unassigned | 2019-01-29 PullMirrorsOverdueQueueTooLarge |
2019-01-25T03:57:42Z | unassigned | Revive https://customers.stg.gitlab.com |
2019-01-23T05:05:06Z | unassigned | When 1 server in a fleet of many goes down, multiple alerts fire |
2019-01-21T22:54:15Z | unassigned | High4xxRateLimit in Staging |
2018-12-27T04:28:28Z | unassigned | prometheus servers in staging are marked as production |
2018-12-04T18:06:34Z | unassigned | Adjust RUBY_GLOBAL_METHOD_CACHE_SIZE on the web fleet |
2018-12-04T15:38:45Z | cmiskell | Add user tracking metrics to dashboards |
2018-11-26T04:18:27Z | northrup | Migrate GitLab.com OAuth2 credentials to gitlab-production project |
This issue was automatically generated using oncall-robot-assistant