OnCall report for period: 2019-10-01 - 2019-10-08
Oncall during this period
Schedule | Username |
---|---|
SRE | Ahmad Sherif |
SRE | Craig Barrett |
SRE | Michal Wasilewski |
PagerDuty Incidents
* Number of incidents: **28**
Created | Summary |
---|---|
2019-10-01T10:05:28Z | [15215] Firing 1 - Large number of overdue pull mirror jobs: 7269.5 |
2019-10-01T11:08:57Z | [15216] Firing 1 - Postgres is generating XLOG too fast, expect this to cause replication lag |
2019-10-01T13:33:38Z | [15217] Firing 1 - Large number of overdue pull mirror jobs: 16899 |
2019-10-01T14:37:16Z | [15218] Firing 1 - |
2019-10-01T15:00:22Z | [15220] Firing 1 - Large number of overdue pull mirror jobs: 17030 |
2019-10-01T17:43:10Z | [15221] Pingdom check check:https://forum.gitlab.com/srv/status is down |
2019-10-01T23:04:48Z | [15231] Firing 2 - PostgreSQL_ServiceDown |
2019-10-02T09:18:22Z | [15232] Firing 1 - Large number of overdue pull mirror jobs: 9134.5 |
2019-10-02T15:53:06Z | [15233] Firing 1 - Large number of overdue pull mirror jobs: 12223.5 |
2019-10-02T23:09:12Z | [15234] Firing 1 - PostgreSQL dead tuples is too large |
2019-10-02T23:12:27Z | [15235] Firing 11 - PostgreSQL_ReplicaStaleXmin |
2019-10-03T11:20:42Z | [15236] Firing 1 - WAL-E replication has stopped |
2019-10-03T11:24:48Z | [15237] Firing 1 - Last WALE backup was seen 20m 0s ago. |
2019-10-04T10:02:59Z | [15241] Firing 1 - Chef client failures have reached critical levels |
2019-10-04T11:27:51Z | [15242] Firing 2 - IncreasedBackendConnectionErrors |
2019-10-04T14:06:13Z | [15243] Firing 1 - Gitaly is down on file-13-stor-gprd.c.gitlab-production.internal |
2019-10-04T14:19:52Z | [15244] Firing 1 - High Rails Error Rate on Front End |
2019-10-04T15:53:55Z | [15245] Firing 1 - Gitaly is down on file-41-stor-gprd.c.gitlab-production.internal |
2019-10-05T03:01:24Z | [15250] Pingdom check check:https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/ is down |
2019-10-07T03:26:11Z | [15251] Firing 1 - 1% disk space left |
2019-10-07T12:58:44Z | [15252] Firing 1 - customers.gitlab.com is down for 2 minutes |
2019-10-07T12:58:45Z | [15253] Firing 1 - customers.gitlab.com is not responding correctly for 2 minutes |
2019-10-08T00:36:34Z | [15257] Firing 1 - Chef client failures have reached critical levels |
2019-10-08T04:05:55Z | [15258] Firing 1 - The console-node service is less available than normal |
2019-10-08T04:12:57Z | [15259] Firing 1 - Postgres seems to be consuming XLOG very slowly |
2019-10-08T04:16:41Z | [15260] Firing 1 - Postgres Replication lag (in bytes) is high |
2019-10-08T04:17:11Z | [15261] Firing 1 - Postgres Replication lag is over 2 minutes |
2019-10-08T05:00:55Z | [15262] Firing 1 - The console-node service is less available than normal |
7 Day Issue Stats
- Oncall issues : 4
- Access Request : 0
- Change Issues : 7
- Incident Issues : 5
- CorrectiveAction Issues : 0
Change Issues
- 2019-10-07T17:05:40Z - Provision additional API nodes - craig
- 2019-10-04T15:14:34Z - Add Praefect to staging - alejandro
- 2019-10-04T12:51:36Z - Provision an extra realtime sidekiq node - unassigned
- 2019-10-04T12:23:25Z - Expand canary to include more paths - jarv
- 2019-10-04T11:11:48Z - Drain and restart haproxy on fe-11 and fe-16 - jarv
- 2019-10-04T02:20:35Z - Disable Elasticsearch search functionality for new S1/P1 mitigation - craig
- 2019-10-02T18:40:35Z - Migrate large projects off file-33-stor-gprd to file-40-stor-gprd - nnelson
Incident Issues
- 2019-10-07T13:21:57Z - customers.gitlab.com unavailable (returning 500 errors) - unassigned | | |
https://gitlab.com/gitlab-com/gl-infra/production/issues/1224
- 2019-10-04T14:18:54Z - Gitaly down on file-13 - unassigned | ~S2 | ~"Service:Gitaly" |
https://gitlab.com/gitlab-com/gl-infra/production/issues/1222
- 2019-10-01T18:07:30Z - forum.gitlab.com down - craig | | |
https://gitlab.com/gitlab-com/gl-infra/production/issues/1215
- 2019-10-01T15:35:07Z - 2019-10-01: Large number of overdue pull mirror - ahmadsherif | ~S3 | ~"Service:Sidekiq" |
https://gitlab.com/gitlab-com/gl-infra/production/issues/1214
- 2019-10-01T14:22:14Z - 2019-10-01: Intermittent SMTP errors on Sidekiq besteffort - ahmadsherif | ~S3 | ~"Service:Sidekiq" |
https://gitlab.com/gitlab-com/gl-infra/production/issues/1213
CorrectiveAction Issues
Open Issue Stats
- Oncall issues : 12
- Change issues : 17
- Incident issues : 0
- Access Request : 4
- CorrectiveAction : 69
Open Change Issues
Created | Assignee | Summary |
---|---|---|
2019-10-04T15:14:34Z | alejandro | Add Praefect to staging |
2019-10-04T12:51:36Z | unassigned | Provision an extra realtime sidekiq node |
2019-10-02T18:40:35Z | nnelson | Migrate large projects off file-33-stor-gprd to file-40-stor-gprd |
2019-09-27T10:54:08Z | hphilipps | Roll out Cloud NAT to CI shared runners |
2019-09-23T16:47:40Z | glopezfernandez | WIP: Rotate Gitaly Token |
2019-09-18T10:00:20Z | unassigned | Add two more nodes to the realtime fleet |
2019-09-18T01:05:09Z | msmiley | WIP: Increase Patroni's patience when talking with Consul |
2019-09-16T17:10:18Z | cmcfarland | Run chef with new us-central role in ops-us-central |
2019-09-12T08:00:43Z | andrewn | Add more hosts to the pipeline fleet |
2019-09-09T20:43:17Z | ahanselka | Experimental Changes to runner configuration - grow understanding on runner cron issues. |
2019-08-29T09:35:07Z | mwasilewski-gitlab | switch logging to the new ES7 clusters |
2019-08-27T14:44:48Z | asaba | Enable additional reCAPTCHA protection for Credential Stuffing in 12.2.3 |
2019-08-16T12:04:45Z | unassigned | Remove patroni-01 from the failover selection. |
2019-08-14T19:42:03Z | gerardo.herzig | Removal of unused configuration files in patroni nodes |
2019-07-02T20:54:46Z | devin | Migrate to Hashed Storage from legacy project storage |
2019-06-19T06:59:56Z | unassigned | Implement pipeline quotas on GitLab.com |
2019-03-19T17:32:50Z | Finotto | Convert PK/FK from int4 to int8: events.id, push_event_payloads.event_id, and ci_build_trace_sections.id. Stage 1 of 2. |
Open Incident Issues
Created | Assignee | Summary |
---|---|---|
Open Oncall Issues
Created | Assignee | Summary |
---|---|---|
2019-10-04T17:44:08Z | unassigned | Customers App - Staging console is not accessible |
2019-10-04T13:35:04Z | unassigned | New environment variable for customers.gitlab.com |
2019-09-17T18:10:06Z | unassigned | Import request (for nq-develop): nq-app-android |
2019-09-14T10:06:43Z | hphilipps | file-15-stor-gprd rebooted |
2019-09-12T12:34:42Z | hphilipps | file-33-stor-gprd rebooted |
2019-09-12T08:04:32Z | ahanselka | Rotate Gitaly authentication tokens after 12.3.2 deploy |
2019-09-11T04:19:29Z | hphilipps | file-35-stor-gprd rebooted |
2019-09-03T07:50:23Z | unassigned | Production Kibana returning 502s occasionally |
2019-08-30T14:34:47Z | unassigned | Customer Staging - Sentry is not reporting |
2019-06-18T09:15:35Z | unassigned | Create alert for registry latency or memory |
2019-06-13T03:47:53Z | aamarsanaa | Update Query source to Global in Grafana dashboards that are not pulling any metrics |
2017-10-13T00:36:06Z | cmiskell | security - add CAA records to DNS |
This issue was automatically generated using oncall-robot-assistant