OnCall report for period: 2019-07-09 - 2019-07-16
Oncall during this period
Schedule | Username |
---|---|
SRE | Ahmad Sherif |
SRE | Alex Hanselka |
SRE | Craig Barrett |
SRE | Michal Wasilewski |
PagerDuty Incidents
- Number of incidents: 48
Created | Summary |
---|---|
2019-07-09T08:52:21Z | [12252] Firing 1 - Large number of overdue pull mirror jobs: 7787 |
2019-07-09T10:01:36Z | [12259] Firing 1 - Large number of overdue pull mirror jobs: 15487.5 |
2019-07-09T15:02:22Z | [12295] Firing 1 - Large number of overdue pull mirror jobs: 9160 |
2019-07-09T15:31:24Z | [12300] Firing 2 - ExtremelyLowDiskSpace |
2019-07-09T18:08:51Z | [12314] Firing 1 - 5xx Error Rate on Docker Registry Load Balancers |
2019-07-10T06:45:26Z | [12347] Firing 2 - IncreasedServerResponseErrors |
2019-07-10T14:42:07Z | [12391] Firing 1 - Large number of overdue pull mirror jobs: 8249.5 |
2019-07-10T15:43:37Z | [12401] Firing 1 - Large number of overdue pull mirror jobs: 9798.5 |
2019-07-10T16:14:09Z | [12411] Firing 1 - Large number of overdue pull mirror jobs: 5970 |
2019-07-10T16:29:09Z | [12416] Firing 1 - Large number of overdue pull mirror jobs: 11227.5 |
2019-07-10T23:12:13Z | [12435] Firing 1 - Increased Server Response Errors |
2019-07-11T11:10:31Z | [12479] Firing 1 - Postgres seems to be processing very few transactions |
2019-07-11T11:14:27Z | [12480] Firing 1 - patroni-06-db-gprd.c.gitlab-production.internal postgres service appears down |
2019-07-11T11:18:44Z | [12481] Firing 1 - Patroni is down |
2019-07-11T12:14:27Z | [12487] Firing 1 - patroni-07-db-gprd.c.gitlab-production.internal postgres service appears down |
2019-07-11T12:38:42Z | [12490] Firing 1 - Unused Replication Slots for patroni-04-db-gprd.c.gitlab-production.internal |
2019-07-11T13:02:51Z | [12494] Firing 1 - Large number of overdue pull mirror jobs: 7355 |
2019-07-11T13:13:29Z | [12498] Firing 1 - Postgres exporter is showing errors for the last hour |
2019-07-11T13:29:27Z | [12501] Firing 2 - PostgreSQL_ServiceDown |
2019-07-11T13:46:27Z | [12502] Firing 1 - Large number of overdue pull mirror jobs: 9320 |
2019-07-11T16:16:57Z | [12512] Firing 1 - Large number of overdue pull mirror jobs: 10561 |
2019-07-11T23:09:36Z | [12522] Pingdom check check:https://gitlab.com/gitlab-org/gitlab-ce/issues is down |
2019-07-11T23:09:41Z | [12523] Pingdom check check:https://gitlab.com/api/v4/projects/13083 is down |
2019-07-11T23:50:41Z | [12524] Firing 1 - PostgreSQL dead tuples is too large |
2019-07-12T04:15:38Z | [12528] Pingdom check check:https://license.gitlab.com/users/sign_in is down |
2019-07-12T11:53:06Z | [12536] Firing 3 - IncreasedServerResponseErrors |
2019-07-12T13:20:23Z | [12539] Firing 5 - IncreasedServerResponseErrors |
2019-07-12T13:33:43Z | [12541] Firing 1 - Patroni is down |
2019-07-12T13:34:27Z | [12542] Firing 1 - patroni-06-db-gprd.c.gitlab-production.internal postgres service appears down |
2019-07-12T14:47:24Z | [12550] Firing 1 - Large number of overdue pull mirror jobs: 7058 |
2019-07-14T04:02:15Z | [12746] Pingdom check check:https://version.gitlab.com/ is down |
2019-07-14T04:52:12Z | [12754] Pingdom check check:https://version.gitlab.com/ is down |
2019-07-14T09:52:15Z | [12778] Pingdom check check:https://version.gitlab.com/ is down |
2019-07-14T18:42:12Z | [12834] Pingdom check check:https://version.gitlab.com/ is down |
2019-07-14T19:02:13Z | [12838] Pingdom check check:https://version.gitlab.com/ is down |
2019-07-14T19:22:13Z | [12841] Pingdom check check:https://version.gitlab.com/ is down |
2019-07-14T19:32:15Z | [12842] Pingdom check check:https://version.gitlab.com/ is down |
2019-07-14T19:52:12Z | [12845] Pingdom check check:https://version.gitlab.com/ is down |
2019-07-14T20:12:19Z | [12848] Pingdom check check:https://version.gitlab.com/ is down |
2019-07-14T20:22:16Z | [12849] Pingdom check check:https://version.gitlab.com/ is down |
2019-07-14T20:42:10Z | [12853] Pingdom check check:https://version.gitlab.com/ is down |
2019-07-14T20:52:16Z | [12855] Pingdom check check:https://version.gitlab.com/ is down |
2019-07-14T21:12:09Z | [12856] Pingdom check check:https://version.gitlab.com/ is down |
2019-07-15T03:58:05Z | [12902] Firing 1 - Gitaly error rate is too high: 6.02 |
2019-07-15T13:06:51Z | [12969] Firing 1 - The haproxy service is less available than normal |
2019-07-15T20:41:07Z | [13020] Firing 1 - Gitaly error rate is too high: 6.82 |
2019-07-15T21:07:12Z | [13025] Firing 1 - The haproxy service is less available than normal |
2019-07-16T03:40:14Z | [13079] Firing 1 - 1% disk space left |
7 Day Issue Stats
- Oncall issues : 1
- Access Request : 0
- Change Issues : 9
- Incident Issues : 4
- CorrectiveAction Issues : 0
Change Issues
- 2019-07-16T07:50:32Z - Clean-up `*+moved*.git` repositories from file servers - aamarsanaa
- 2019-07-16T01:28:32Z - Build/cutover to new Redis-Sidekiq instances - craig
- 2019-07-15T23:06:50Z - Apply rate-limit sessions to haproxy ssh front-ends - cmiskell
- 2019-07-15T20:59:03Z - Upgrade ruby on license.gitlab.com - craig
- 2019-07-13T01:03:58Z - WIP: Prevent residual HAProxy processes by setting hard-stop-after - msmiley
- 2019-07-12T13:07:49Z - WIP: Minor upgrade for postgres from 9.6.11 to 9.6.14 - abrandl
- 2019-07-11T16:14:42Z - Modify capacity of web worker counts and web nodes - bjk-gitlab
- 2019-07-10T17:03:47Z - Re-route staging.gitlab.com traffic through CloudFlare's CDN - alejandro
- 2019-07-10T05:00:42Z - Apply MaxStartups change for ssh on git servers - cmiskell
Incident Issues
- 2019-07-15T15:18:28Z - saturation of pipeline sidekiq workers - unassigned | | |
https://gitlab.com/gitlab-com/gl-infra/production/issues/959
- 2019-07-12T15:52:54Z - license.gitlab.com was rebooted by AWS - unassigned | ~S4 | |
https://gitlab.com/gitlab-com/gl-infra/production/issues/955
- 2019-07-12T14:08:32Z - logs not available in ELK for a number of indexes - unassigned | ~S4 | |
https://gitlab.com/gitlab-com/gl-infra/production/issues/954
- 2019-07-11T05:31:21Z - Postgres instance seemingly corrupted - ahmadsherif | ~S4 | |
https://gitlab.com/gitlab-com/gl-infra/production/issues/948
CorrectiveAction Issues
- 2019-07-10T09:52:11Z - Refresh staging environment with production data volume - unassigned
- 2019-07-09T08:59:27Z - Split Redis-Sidekiq from Redis-Persistent - ansdval
Open Issue Stats
- Oncall issues : 18
- Change issues : 14
- Incident issues : 1
- Access Request : 4
- CorrectiveAction : 76
Open Change Issues
Created | Assignee | Summary |
---|---|---|
2019-07-16T07:50:32Z | aamarsanaa | Clean-up `*+moved*.git` repositories from file servers |
2019-07-16T01:28:32Z | craig | Build/cutover to new Redis-Sidekiq instances |
2019-07-15T20:59:03Z | craig | Upgrade ruby on license.gitlab.com |
2019-07-13T01:03:58Z | msmiley | WIP: Prevent residual HAProxy processes by setting hard-stop-after |
2019-07-12T13:07:49Z | abrandl | WIP: Minor upgrade for postgres from 9.6.11 to 9.6.14 |
2019-07-09T03:15:10Z | cmiskell | Enable camoproxy functionality |
2019-07-05T09:19:58Z | unassigned | Enable L1 caching on a Production canary instance for ~10 minutes |
2019-07-02T20:54:46Z | devin | Migrate to Hashed Storage from legacy project storage |
2019-06-21T12:55:43Z | unassigned | Tune down idle_in_transaction_session_timeout |
2019-06-19T06:59:56Z | unassigned | Implement pipeline quotas on GitLab.com |
2019-06-07T16:08:53Z | aamarsanaa | add more storage nodes and rebalance existing ones |
2019-06-05T12:26:50Z | unassigned | Cleanup unused postgres config files for patroni instances |
2019-04-10T20:44:18Z | dawsmith | GCP Sizing recommendations testing April 2019 |
2019-03-19T17:32:50Z | Finotto | Convert PK/FK from int4 to int8: events.id, push_event_payloads.event_id, and ci_build_trace_sections.id. Stage 1 of 2. |
Open Incident Issues
Created | Assignee | Summary |
---|---|---|
2019-07-01T21:39:08Z | unassigned | Cred stuffers are back since 2019-06-28 |
Open Oncall Issues
Created | Assignee | Summary |
---|---|---|
2019-07-02T16:30:54Z | hphilipps | RCA: Degraded performance because of Redis-cache overload. |
2019-07-01T16:03:52Z | unassigned | Git push is slow in gprd |
2019-07-01T09:58:51Z | ahanselka | Collect CI/CD Pipelines data from GitLab.com |
2019-06-18T09:15:35Z | unassigned | Create alert for registry latency or memory |
2019-06-13T13:41:47Z | unassigned | Problems with Git garbage collection on gitlab.com |
2019-06-11T21:05:13Z | unassigned | Elevated rates of internal API failing |
2019-06-11T15:01:42Z | ahanselka | Update build-runner s3 settings before they lose support in version 12.0 of GitLab runner. |
2019-06-11T11:30:31Z | unassigned | no alert for customer.gitlab.com being down |
2019-06-10T11:23:36Z | unassigned | investigate and potentially adjust prediction based alerting rules for filesystems being full soon (follow up on root filesystem being full on influxdb nodes) |
2019-06-05T10:28:40Z | mwasilewski-gitlab | RCA for 2019-06-05 increased response times from rails |
2019-04-16T14:14:27Z | unassigned | Unable to deploy https://customers.stg.gitlab.com |
2019-03-12T17:54:13Z | unassigned | Trigger happy alerts are contributing to alert fatigue |
2019-02-19T05:48:37Z | unassigned | Registry Node Didn't Get Drained prior deployment |
2019-01-29T14:08:24Z | unassigned | 2019-01-29 PullMirrorsOverdueQueueTooLarge |
2019-01-23T05:05:06Z | unassigned | When 1 server in a fleet of many goes down, multiple alerts fire |
2019-01-21T22:54:15Z | unassigned | High4xxRateLimit in Staging |
2018-12-04T18:06:34Z | unassigned | Adjust RUBY_GLOBAL_METHOD_CACHE_SIZE on the web fleet |
2018-11-26T04:18:27Z | alejandro | Migrate GitLab.com OAuth2 credentials to gitlab-production project |
This issue was automatically generated using oncall-robot-assistant
Edited by Dave Smith