Oncall report for period: 2018-09-04 to 2018-09-11
Oncall during this period
Schedule | Username |
---|---|
AMA | John Skarbek |
EU | Ahmad Sherif |
EU | Jarv Jarvis |
PagerDuty Incidents
- Number of incidents: 41
Created | Summary |
---|---|
2018-09-04T21:35:47Z | [2791] Firing 2 - HighRailsErrorRate |
2018-09-04T21:36:03Z | [2792] Firing 4 - HighRailsErrorRate |
2018-09-04T23:49:02Z | [2793] Firing 4 - HighRailsErrorRate |
2018-09-05T00:25:19Z | [2794] Firing 4 - HighRailsErrorRate |
2018-09-05T01:23:32Z | [2795] Firing 4 - HighRailsErrorRate |
2018-09-05T02:44:02Z | [2796] Firing 4 - HighRailsErrorRate |
2018-09-05T02:54:01Z | [2797] Firing 3 - HighRailsErrorRate |
2018-09-05T03:20:01Z | [2798] Firing 4 - HighRailsErrorRate |
2018-09-05T03:40:01Z | [2799] Firing 4 - HighRailsErrorRate |
2018-09-05T03:57:46Z | [2800] Firing 4 - HighRailsErrorRate |
2018-09-05T04:21:32Z | [2801] Firing 4 - HighRailsErrorRate |
2018-09-05T04:46:32Z | [2802] Firing 1 - High Rails Error Rate on Front End |
2018-09-05T05:03:01Z | [2803] Firing 4 - HighRailsErrorRate |
2018-09-05T05:28:02Z | [2804] Firing 3 - HighRailsErrorRate |
2018-09-05T08:44:59Z | [2805] Firing 2 - PostgreSQL_XLOGConsumptionTooHigh |
2018-09-05T09:06:59Z | [2806] Pingdom check GitLab.com Pages is down |
2018-09-05T10:00:45Z | [2808] Firing 1 - Postgres is generating XLOG too fast, expect this to cause replication lag |
2018-09-05T11:04:29Z | [2809] Firing 2 - PostgreSQL_XLOGConsumptionTooHigh |
2018-09-05T13:11:36Z | [2810] Firing 2 - PrometheusNotificationsBacklog |
2018-09-05T18:24:07Z | [2812] Firing 2 - RegistryDown |
2018-09-05T21:42:59Z | [2813] Firing 2 - PostgreSQL_ReplicationLagTooLarge_ArchiveReplica |
2018-09-07T01:40:12Z | [2815] Firing 1 - 1% disk space left |
2018-09-07T01:50:33Z | [2816] Firing 1 - High Rails Error Rate on Front End |
2018-09-07T02:36:18Z | [2817] Firing 4 - HighRailsErrorRate |
2018-09-07T07:08:00Z | [2818] Firing 2 - PostgreSQL_ReplicationLagTooLarge_ArchiveReplica |
2018-09-07T10:13:10Z | [2819] Firing 2 - PostgreSQL_UnusedReplicationSlot |
2018-09-07T10:23:59Z | [2820] Firing 2 - PostgreSQL_SplitBrain_Replicas |
2018-09-07T22:09:31Z | [2821] Firing 2 - HighRailsErrorRate |
2018-09-07T23:13:20Z | [2822] Firing 4 - HighRailsErrorRate |
2018-09-08T00:14:46Z | [2823] Firing 3 - HighRailsErrorRate |
2018-09-08T00:42:20Z | [2824] Firing 4 - HighRailsErrorRate |
2018-09-08T01:11:46Z | [2825] Firing 4 - HighRailsErrorRate |
2018-09-08T01:26:46Z | [2826] Firing 4 - HighRailsErrorRate |
2018-09-08T01:37:03Z | [2827] Firing 4 - HighRailsErrorRate |
2018-09-09T12:46:39Z | [2829] Firing 1 - 1% disk space left |
2018-09-10T14:47:45Z | [2830] Firing 2 - PullMirrorsOverdueQueueTooLarge |
2018-09-10T16:49:51Z | [2831] Firing 2 - PrometheusUnreachable |
2018-09-11T07:05:16Z | [2832] Firing 2 - PrometheusUnreachable |
2018-09-11T12:10:46Z | [2833] Pingdom check GitLab Forum is down |
2018-09-11T12:12:41Z | [2834] Firing 2 - PullMirrorsOverdueQueueTooLarge |
2018-09-11T13:49:24Z | [2835] Firing 4 - HighRailsErrorRate |
Open Issue Stats
- Oncall issues: 12
- Change issues: 3
- Incident issues: 7
- Access requests: 6
Open Change Issues
Created | Assignee | Summary |
---|---|---|
2018-08-28T15:13:27Z | ahmadsherif | Switch to Sentry on GCP |
2018-08-24T05:23:48Z | unassigned | Postgres: Decrease log_min_duration_statement to 500ms |
2018-08-11T12:48:09Z | unassigned | Increase max_replication_slots |
Open Incident Issues
Created | Assignee | Summary |
---|---|---|
2018-09-11T06:57:20Z | unassigned | API not available on git push for some users |
2018-09-10T15:21:54Z | unassigned | Large backlog of pullmirrors on GitLab.com 2018-09-10 and 2018-09-11 |
2018-09-06T19:48:27Z | skarbek | Outbound Email from GitLab.com was failing for approximately 30 minutes on GitLab.com |
2018-09-06T12:56:58Z | unassigned | Site degradation when deploying to gitlab.com (deploy tooling) |
2018-09-04T12:24:44Z | unassigned | Sometimes GCS returns 5XX (Server error) status code |
2018-08-28T10:18:38Z | unassigned | Increased error rate on GitLab https due to health check failing on the web fleet |
2018-08-10T21:34:43Z | unassigned | 2018-08-10: High rate of 500 errors on API nodes |
Open Oncall Issues
Created | Assignee | Summary |
---|---|---|
2018-09-10T12:02:13Z | unassigned | Rename group results project 404 |
2018-09-07T21:59:35Z | jarv | Marvin access for Tristan and Jerome |
2018-09-06T10:58:58Z | unassigned | Email user list for ending GitLab.com Early Adopter program |
2018-09-05T03:58:30Z | felipe_artur | Brute force scanners may be affecting error rates |
2018-08-25T17:58:06Z | unassigned | Import Failures - amp-robotics/migration-testing/axon and ledgerx/core-issues-archive/ |
2018-08-14T12:20:08Z | dsylva | page the oncall when redis fails over |
2018-08-03T16:44:51Z | ctbarrett | create alert and dashboard for sidekiq exceptions |
2018-07-11T13:25:32Z | unassigned | Configure and run the gitlab pseudonymizer |
2018-07-03T05:37:31Z | northrup | Incorrect Storage Shard Key on Projects on GitLab.com |
2018-06-27T22:20:51Z | northrup | Read-only elastic cloud user for automated tasks |
2018-06-21T23:12:33Z | skarbek | Environments not loading (502) |
2018-06-07T03:33:19Z | skarbek | Issues CSV doesn't export |
7 Day Issue Stats
- Oncall issues: 9
- Access requests: 8
- Change issues: 4
- Incident issues: 7
Change Issues
- 2018-09-07T08:01:19Z - Rebuild postgres-01 replica - abrandl
- 2018-09-05T21:43:36Z - Remove Legacy DNS Entries from gitlab.com. Zone - northrup
- 2018-09-05T18:01:11Z - Enable File 21,22,23,24 to be default storage - ahanselka
- 2018-09-05T16:30:45Z - Use a dedicated LB for registry - ahmadsherif
Incident Issues
- 2018-09-11T06:57:20Z - API not available on git push for some users - unassigned | ~S3 | |
https://gitlab.com/gitlab-com/gl-infra/production/issues/463
- 2018-09-10T15:21:54Z - Large backlog of pullmirrors on GitLab.com 2018-09-10 and 2018-09-11 - unassigned | ~S3 | ~"Service:Sidekiq" |
https://gitlab.com/gitlab-com/gl-infra/production/issues/462
- 2018-09-08T02:55:59Z - 2018-09-07 We let the docs.gitlab.com SSL certificate expire - unassigned | | |
https://gitlab.com/gitlab-com/gl-infra/production/issues/459
- 2018-09-06T19:48:27Z - Outbound Email from GitLab.com was failing for approximately 30 minutes on GitLab.com - skarbek | ~S3 | ~"Service:Infrastructure" |
https://gitlab.com/gitlab-com/gl-infra/production/issues/457
- 2018-09-06T12:56:58Z - Site degradation when deploying to gitlab.com (deploy tooling) - unassigned | | ~"Service:takeoff" |
https://gitlab.com/gitlab-com/gl-infra/production/issues/456
- 2018-09-05T09:36:17Z - High update rate on `projects` table causes replication lag - abrandl | ~S4 | ~"Service:Postgres" ~"Service:Sidekiq" |
https://gitlab.com/gitlab-com/gl-infra/production/issues/451
- 2018-09-05T09:27:31Z - 5 minute GitLab pages outage during deploy - unassigned | ~S3 | ~"Service:Pages" |
https://gitlab.com/gitlab-com/gl-infra/production/issues/450
This issue was automatically generated using oncall-robot-assistant