OnCall report for period: 2019-04-02 - 2019-04-09
Oncall during this period
Schedule | Username |
---|---|
SRE | Ahmad Sherif |
SRE | Devin Sylva |
SRE | Hendrik Meyer |
SRE | Cameron McFarland |
PagerDuty Incidents
- Number of incidents: 44
Created | Summary |
---|---|
2019-04-02T21:28:51Z | [6604] Firing 4 - IncreasedErrorRateOtherBackends |
2019-04-02T22:09:36Z | [6605] Firing 4 - IncreasedErrorRateOtherBackends |
2019-04-03T03:27:15Z | [6608] Firing 2 - PatroniIsDown |
2019-04-03T04:01:43Z | [6609] Firing 2 - PatroniIsDown |
2019-04-03T04:15:43Z | [6610] Firing 2 - WALEBackupDelayed |
2019-04-03T08:09:06Z | [6612] Firing 2 - GitalyErrorRateTooHigh |
2019-04-04T19:14:26Z | [6642] Firing 2 - PostgreSQL_TooManyDeadTuples |
2019-04-04T20:13:51Z | [6643] Firing 2 - Detecting5xxForRegistry |
2019-04-05T01:27:36Z | [6653] Firing 1 - Alertmanager is failing sending notications |
2019-04-05T01:29:35Z | [6654] Firing 2 - AlertmanagerNotificationsFailing |
2019-04-05T01:48:20Z | [6655] Firing 1 - Alertmanager is failing sending notications |
2019-04-05T01:57:21Z | [6656] Firing 2 - AlertmanagerNotificationsFailing |
2019-04-05T02:02:07Z | [6657] Firing 1 - Alertmanager is failing sending notications |
2019-04-05T02:02:35Z | [6658] Firing 1 - Alertmanager is failing sending notications |
2019-04-05T02:07:23Z | [6659] Firing 2 - AlertmanagerNotificationsFailing |
2019-04-05T02:17:06Z | [6661] Firing 1 - Alertmanager is failing sending notications |
2019-04-05T02:28:06Z | [6662] Firing 2 - AlertmanagerNotificationsFailing |
2019-04-05T03:19:50Z | [6665] Firing 2 - GitLabComLatencyWebCritical |
2019-04-05T08:01:10Z | [6669] Firing 32 - IncreasedServerResponseErrors |
2019-04-05T10:41:51Z | [6673] Firing 2 - GitLabComLatencyWebCritical |
2019-04-05T11:05:51Z | [6675] Firing 2 - GitLabComLatencyWebCritical |
2019-04-05T11:10:07Z | [6676] Firing 2 - Detecting5xxForRegistry |
2019-04-05T11:33:48Z | [6677] Firing 2 - walgBackupDelayed |
2019-04-05T13:16:07Z | [6680] Firing 2 - IncreasedErrorRateOtherBackends |
2019-04-05T13:27:35Z | [6681] Firing 2 - IncreasedErrorRateOtherBackends |
2019-04-05T13:40:19Z | [6682] Firing 2 - Detecting5xxForRegistry |
2019-04-05T13:44:50Z | [6683] Firing 2 - IncreasedErrorRateOtherBackends |
2019-04-05T13:56:36Z | [6684] Firing 2 - IncreasedErrorRateOtherBackends |
2019-04-05T13:56:51Z | [6685] Firing 2 - Detecting5xxForRegistry |
2019-04-05T14:07:50Z | [6686] Firing 2 - Detecting5xxForRegistry |
2019-04-05T14:15:52Z | [6687] Pingdom check check:https://forum.gitlab.com/ is down |
2019-04-05T17:49:20Z | [6690] Firing 2 - Detecting5xxForRegistry |
2019-04-06T08:05:58Z | [6692] Approval for ES cluster expansion |
2019-04-07T06:17:22Z | [6697] Firing 6 - IncreasedServerResponseErrors |
2019-04-07T11:06:20Z | [6699] Firing 4 - HAProxyHighCPU |
2019-04-08T04:54:50Z | [6703] Firing 2 - IncreasedServerResponseErrors |
2019-04-08T09:38:51Z | [6705] Firing 2 - IncreasedErrorRateOtherBackends |
2019-04-08T09:43:51Z | [6706] Firing 2 - IncreasedErrorRateOtherBackends |
2019-04-08T09:48:50Z | [6707] Firing 2 - IncreasedErrorRateOtherBackends |
2019-04-08T09:49:58Z | [6708] Pingdom check check:https://gitlab.com/gitlab-org/gitlab-ce/tree/master is down |
2019-04-08T09:54:22Z | [6709] Firing 2 - IncreasedErrorRateOtherBackends |
2019-04-08T10:03:52Z | [6710] Firing 2 - IncreasedErrorRateOtherBackends |
2019-04-08T12:30:14Z | [6711] Firing 2 - ChefClientErrorCritical |
2019-04-09T04:45:06Z | [6719] Firing 2 - IncreasedServerResponseErrors |
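
Several alert names repeat across the incident table above, so a per-alert tally makes the noisiest alerts easier to spot. The following is a minimal sketch, assuming rows keep the `Created | [id] Summary |` layout shown above. The names `tally_alerts`, `ROW`, and `FIRING` are hypothetical helpers for illustration and are not part of oncall-robot-assistant.

```python
import re
from collections import Counter

# Matches data rows such as:
#   2019-04-05T13:40:19Z | [6682] Firing 2 - Detecting5xxForRegistry |
ROW = re.compile(r"^(?P<created>\S+Z) \| \[(?P<id>\d+)\] (?P<summary>.*?) \|$")
# Matches Alertmanager-style summaries: "Firing <count> - <alert name>"
FIRING = re.compile(r"^Firing \d+ - (?P<alert>.+)$")


def tally_alerts(rows):
    """Count incidents per alert name.

    Summaries that do not start with "Firing N - " (for example Pingdom
    checks) are counted under their full summary text.
    """
    counts = Counter()
    for row in rows:
        match = ROW.match(row.strip())
        if not match:
            continue  # skip the header, the separator, and malformed lines
        summary = match.group("summary")
        firing = FIRING.match(summary)
        counts[firing.group("alert") if firing else summary] += 1
    return counts


# A few rows copied from the table above, including header and separator.
sample = [
    "Created | Summary |",
    "---|---|",
    "2019-04-05T13:40:19Z | [6682] Firing 2 - Detecting5xxForRegistry |",
    "2019-04-05T13:44:50Z | [6683] Firing 2 - IncreasedErrorRateOtherBackends |",
    "2019-04-05T13:56:51Z | [6685] Firing 2 - Detecting5xxForRegistry |",
    "2019-04-05T14:15:52Z | [6687] Pingdom check check:https://forum.gitlab.com/ is down |",
]

for alert, count in tally_alerts(sample).most_common():
    print(count, alert)
```

Feeding the full table through `tally_alerts` would give one count per alert name. The number after `Firing` is discarded here, so the tally reflects how many pages were raised rather than how many alert instances were firing.
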
7 Day Issue Stats
- Oncall issues: 1
- Access Request: 0
- Change Issues: 6
- Incident Issues: 5
- CorrectiveAction Issues: 0
Change Issues
- 2019-04-08T21:22:47Z - Provision a DB replica backed by ZFS - ahmadsherif
- 2019-04-08T17:43:01Z - Database repacking - take 3 - abrandl
- 2019-04-08T11:19:16Z - Set group runner tokens to zero - hphilipps
- 2019-04-04T12:11:12Z - Database repacking - take 2 - abrandl
- 2019-04-03T16:42:32Z - update Fastly config and redirects scripts in www-gitlab-com to handle different types of targets - mwasilewski-gitlab
- 2019-04-02T22:23:17Z - Clear leftover data from upload migration from share-01 - ahanselka
Incident Issues
- 2019-04-06T01:02:43Z - 2019-04-06 Group runner tokens exposure - unassigned | ~S1 | |
https://gitlab.com/gitlab-com/gl-infra/production/issues/767
- 2019-04-05T17:59:00Z - 2019-04-05: constant Registry 5xx errors - ahmadsherif | ~S4 | ~"Service:Registry" |
https://gitlab.com/gitlab-com/gl-infra/production/issues/766
- 2019-04-05T17:19:12Z - Production Patch 11.9.6 - skarbek | ~S1 | |
https://gitlab.com/gitlab-com/gl-infra/production/issues/765
- 2019-04-02T21:56:23Z - 2019-04-02: api service error alert - dawsmith | ~S3 | ~"Service:API" |
https://gitlab.com/gitlab-com/gl-infra/production/issues/759
- 2019-04-02T16:31:32Z - Slow Pipelines due to abuse - unassigned | ~S3 | |
https://gitlab.com/gitlab-com/gl-infra/production/issues/757
CorrectiveAction Issues
- 2019-04-04T20:27:33Z - Add alerts for osqueryd metrics - unassigned
- 2019-04-04T20:22:07Z - Define a production-readiness process for services - unassigned
Open Issue Stats
- Oncall issues: 15
- Change issues: 5
- Incident issues: 3
- Access Request: 5
- CorrectiveAction: 66
Open Change Issues
Created | Assignee | Summary |
---|---|---|
2019-04-08T21:22:47Z | ahmadsherif | Provision a DB replica backed by ZFS |
2019-04-08T17:43:01Z | abrandl | Database repacking - take 3 |
2019-04-01T14:30:13Z | cshobe | Make PostgreSQL autovacuum settings less aggressive |
2019-03-21T08:55:22Z | ahanselka | Sanitize images on gitlab.com using a rake task |
2019-03-19T17:32:50Z | Finotto | Convert PK/FK from int4 to int8: events.id, push_event_payloads.event_id, and ci_build_trace_sections.id. Stage 1 of 2. |
Open Incident Issues
Created | Assignee | Summary |
---|---|---|
2019-04-06T01:02:43Z | unassigned | 2019-04-06 Group runner tokens exposure |
2019-04-05T17:59:00Z | ahmadsherif | 2019-04-05: constant Registry 5xx errors |
2019-04-02T16:31:32Z | unassigned | Slow Pipelines due to abuse |
Open Oncall Issues
Created | Assignee | Summary |
---|---|---|
2019-03-22T10:00:02Z | unassigned | redash.gitlab.com SSL certificate expired |
2019-03-21T22:36:56Z | unassigned | Slow rendering of pages for certain users |
2019-03-21T11:40:04Z | hphilipps | osquery is filling up the root fs |
2019-03-18T10:32:45Z | aamarsanaa | HAProxy alerts should fire based on error-rate rather than static value |
2019-03-14T17:43:33Z | unassigned | Certificate for redash.gitlab.com is expired |
2019-03-12T17:54:13Z | aamarsanaa | Trigger happy alerts are contributing to alert fatigue |
2019-02-19T05:48:37Z | unassigned | Registry Node Didn't Get Drained prior deployment |
2019-01-29T14:08:24Z | unassigned | 2019-01-29 PullMirrorsOverdueQueueTooLarge |
2019-01-25T03:57:42Z | unassigned | Revive https://customers.stg.gitlab.com |
2019-01-23T05:05:06Z | unassigned | When 1 server in a fleet of many goes down, multiple alerts fire |
2019-01-21T22:54:15Z | unassigned | High4xxRateLimit in Staging |
2018-12-27T04:28:28Z | unassigned | prometheus servers in staging are marked as production |
2018-12-04T18:06:34Z | unassigned | Adjust RUBY_GLOBAL_METHOD_CACHE_SIZE on the web fleet |
2018-12-04T15:38:45Z | unassigned | Add user tracking metrics to dashboards |
2018-11-26T04:18:27Z | northrup | Migrate GitLab.com OAuth2 credentials to gitlab-production project |
This issue was automatically generated using oncall-robot-assistant