On-call report for period: 2019-06-18 to 2019-06-25
On call during this period
Schedule | Username |
---|---|
SRE | Alex Hanselka |
SRE | Amar Amarsanaa |
SRE | Henri Philipps |
SRE | Cameron McFarland |
PagerDuty Incidents
- Number of incidents: 35
Created | Summary |
---|---|
2019-06-18T08:03:12Z | [11412] Firing 1 - Redis master link is not up. |
2019-06-18T09:10:43Z | [11413] Firing 1 - 5xx Error Rate on Docker Registry Load Balancers |
2019-06-18T09:46:35Z | [11415] Firing 1 - 5xx Error Rate on Docker Registry Load Balancers |
2019-06-18T09:57:20Z | [11416] Firing 1 - 5xx Error Rate on Docker Registry Load Balancers |
2019-06-18T15:03:21Z | [11418] Firing 1 - Large number of overdue pull mirror jobs: 8632 |
2019-06-19T04:14:21Z | [11422] Firing 1 - Increased Server Response Errors |
2019-06-19T04:24:20Z | [11423] Firing 2 - IncreasedServerResponseErrors |
2019-06-19T12:36:21Z | [11427] Firing 1 - Large number of overdue pull mirror jobs: 8274 |
2019-06-19T13:05:35Z | [11428] Firing 1 - 5xx Error Rate on Docker Registry Load Balancers |
2019-06-19T13:35:34Z | [11429] Firing 1 - 5xx Error Rate on Docker Registry Load Balancers |
2019-06-19T13:46:36Z | [11430] Firing 1 - Large number of overdue pull mirror jobs: 6310.5 |
2019-06-19T16:50:36Z | [11433] Firing 1 - 5xx Error Rate on Docker Registry Load Balancers |
2019-06-19T17:06:20Z | [11434] Firing 1 - 5xx Error Rate on Docker Registry Load Balancers |
2019-06-19T17:41:35Z | [11435] Firing 1 - 5xx Error Rate on Docker Registry Load Balancers |
2019-06-19T18:04:20Z | [11436] Firing 1 - 5xx Error Rate on Docker Registry Load Balancers |
2019-06-19T18:17:52Z | [11437] Firing 1 - 5xx Error Rate on Docker Registry Load Balancers |
2019-06-19T19:35:35Z | [11438] Firing 1 - 5xx Error Rate on Docker Registry Load Balancers |
2019-06-19T20:11:50Z | [11440] Firing 1 - 5xx Error Rate on Docker Registry Load Balancers |
2019-06-19T21:12:35Z | [11441] Firing 1 - 5xx Error Rate on Docker Registry Load Balancers |
2019-06-19T23:09:35Z | [11442] Firing 1 - 5xx Error Rate on Docker Registry Load Balancers |
2019-06-21T11:40:28Z | [11451] Firing 1 - Redis Switch Master |
2019-06-21T11:41:43Z | [11452] Firing 2 - RedisMasterLinkDown |
2019-06-21T11:41:47Z | [11453] Firing 1 - Connection of Redis replicas to the master is flapping! Look at redis-cache-03-db-gprd.c.gitlab-production.internal:9121 and its replicas. |
2019-06-21T12:06:28Z | [11454] Firing 2 - RedisMasterLinkDown |
2019-06-21T12:12:47Z | [11455] Firing 1 - Connection of Redis replicas to the master is flapping! Look at redis-cache-01-db-gprd.c.gitlab-production.internal:9121 and its replicas. |
2019-06-21T12:32:32Z | [11456] Firing 2 - RedisMasterLinkDown |
2019-06-22T07:33:52Z | [11463] Firing 2 - IncreasedServerConnectionErrors |
2019-06-22T07:33:52Z | [11462] Firing 2 - IncreasedBackendConnectionErrors |
2019-06-22T07:33:53Z | [11464] Firing 2 - IncreasedServerResponseErrors |
2019-06-24T13:54:36Z | [11479] Firing 1 - Large number of overdue pull mirror jobs: 10829 |
2019-06-24T17:20:16Z | [11485] 9 DB rows deleted |
2019-06-24T17:40:49Z | [11486] Firing 1 - 5xx Error Rate on Docker Registry Load Balancers |
2019-06-24T17:47:56Z | [11487] Firing 1 - postgres-dr-delayed-01-db-gprd.c.gitlab-production.internal postgres service appears down |
2019-06-24T18:03:20Z | [11488] Firing 1 - 5xx Error Rate on Docker Registry Load Balancers |
2019-06-25T00:45:32Z | [11490] Firing 1 - postgres-dr-delayed-01-db-gprd.c.gitlab-production.internal postgres service appears down |
7-Day Issue Stats
- Oncall issues: 3
- Access Request: 0
- Change Issues: 7
- Incident Issues: 4
- CorrectiveAction Issues: 0
Change Issues
- 2019-06-24T15:34:16Z - add web-32 to fleet - hphilipps
- 2019-06-24T02:04:51Z - Apply gitlab-mitigate-sackpanic cookbook to front-facing servers - cmiskell
- 2019-06-21T16:50:59Z - [PROD] Scale out web fleet (web-31) - cmcfarland
- 2019-06-21T15:42:43Z - Update and enable uptycs on canary hosts - hphilipps
- 2019-06-21T12:55:43Z - Tune down idle_in_transaction_session_timeout - unassigned
- 2019-06-20T11:12:47Z - add web-30 to fleet - hphilipps
- 2019-06-19T06:59:56Z - Implement pipeline quotas on GitLab.com - unassigned
Incident Issues
- 2019-06-24T17:19:33Z - 8 projects accidentally deleted from the `gitlab-org` group - unassigned | | |
https://gitlab.com/gitlab-com/gl-infra/production/issues/920
- 2019-06-21T12:37:29Z - redis process down on redis-02-db-gprd - hphilipps | ~S4 | ~"Service:Redis" |
https://gitlab.com/gitlab-com/gl-infra/production/issues/914
- 2019-06-20T18:03:41Z - Many suspicious queries coming from one IP - unassigned | ~S4 | |
https://gitlab.com/gitlab-com/gl-infra/production/issues/913
- 2019-06-20T15:54:06Z - 2019-06-20 Git push operations to the gitlab-org/gitlab-ce are slow - unassigned | | |
https://gitlab.com/gitlab-com/gl-infra/production/issues/912
CorrectiveAction Issues
- 2019-06-20T17:13:46Z - Investigate using a virtual IP for Postgres failovers - unassigned
Open Issue Stats
- Oncall issues: 16
- Change issues: 11
- Incident issues: 5
- Access Request: 5
- CorrectiveAction: 76
Open Change Issues
Created | Assignee | Summary |
---|---|---|
2019-06-24T02:04:51Z | cmiskell | Apply gitlab-mitigate-sackpanic cookbook to front-facing servers |
2019-06-21T15:42:43Z | hphilipps | Update and enable uptycs on canary hosts |
2019-06-21T12:55:43Z | unassigned | Tune down idle_in_transaction_session_timeout |
2019-06-19T06:59:56Z | unassigned | Implement pipeline quotas on GitLab.com |
2019-06-07T16:08:53Z | aamarsanaa | add more storage nodes and rebalance existing ones |
2019-06-06T10:58:56Z | unassigned | Enable Puma on GitLab.com |
2019-06-05T12:26:50Z | unassigned | Cleanup unused postgres config files for patroni instances |
2019-04-23T19:56:48Z | nick.thomas | enable elasticsearch integration on gitlab.com on gitlab-org namespace |
2019-04-10T20:44:18Z | dawsmith | GCP Sizing recommendations testing April 2019 |
2019-04-01T14:30:13Z | cshobe | Make PostgreSQL autovacuum settings less aggressive |
2019-03-19T17:32:50Z | Finotto | Convert PK/FK from int4 to int8: events.id, push_event_payloads.event_id, and ci_build_trace_sections.id. Stage 1 of 2. |
Open Incident Issues
Created | Assignee | Summary |
---|---|---|
2019-06-13T07:40:39Z | aamarsanaa | 2019-06-13: The web service (main stage) has an apdex score (latency) below SLO |
2019-06-09T05:23:28Z | unassigned | intermittent errors on version.gitlab.com over the weekends |
2019-05-29T23:52:14Z | ahmadsherif | Pages service interruption |
2019-05-27T13:34:10Z | unassigned | 2019-05-27 Metrics unavailable |
2019-05-23T16:24:38Z | unassigned | 2019-05-23 git-over-SSH errors |
Open Oncall Issues
Created | Assignee | Summary |
---|---|---|
2019-06-18T09:15:35Z | unassigned | Create alert for registry latency or memory |
2019-06-13T13:41:47Z | unassigned | Problems with Git garbage collection on gitlab.com |
2019-06-11T21:05:13Z | unassigned | Elevated rates of internal API failing |
2019-06-11T15:01:42Z | unassigned | Update build-runner s3 settings before they lose support in version 12.0 of GitLab runner. |
2019-06-10T11:23:36Z | unassigned | investigate and potentially adjust prediction based alerting rules for filesystems being full soon (follow up on root filesystem being full on influxdb nodes) |
2019-06-05T10:28:40Z | unassigned | RCA for 2019-06-05 increased response times from rails |
2019-04-16T14:14:27Z | unassigned | Unable to deploy https://customers.stg.gitlab.com |
2019-03-12T17:54:13Z | unassigned | Trigger happy alerts are contributing to alert fatigue |
2019-02-19T05:48:37Z | unassigned | Registry Node Didn't Get Drained prior to deployment |
2019-01-29T14:08:24Z | unassigned | 2019-01-29 PullMirrorsOverdueQueueTooLarge |
2019-01-25T03:57:42Z | unassigned | Revive https://customers.stg.gitlab.com |
2019-01-23T05:05:06Z | unassigned | When 1 server in a fleet of many goes down, multiple alerts fire |
2019-01-21T22:54:15Z | unassigned | High4xxRateLimit in Staging |
2018-12-27T04:28:28Z | unassigned | prometheus servers in staging are marked as production |
2018-12-04T18:06:34Z | unassigned | Adjust RUBY_GLOBAL_METHOD_CACHE_SIZE on the web fleet |
2018-11-26T04:18:27Z | alejandro | Migrate GitLab.com OAuth2 credentials to gitlab-production project |
This issue was automatically generated using oncall-robot-assistant