On-Call Handover 2021-07-24 15:00 UTC

Brought to you by the Slack slash command: /sre-oncall handover

EOC egress: @alejandro
EOC ingress: @cmcfarland

Summary:

What (if any) time-critical work is being handed over?

What contextual info may be useful for the next few on-call shifts?

Ongoing alerts/incidents:

production#5218 (closed) - 2021-07-23: about.gitlab.com broken UI due to old CSS 404ing
production#5212 (closed) - 2021-07-22: Service desk emails are not being processed
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/5193
production#5187 (closed) - 2021-07-19: The shard_memory_bound SLI of the sidekiq service (main stage) has not received any traffic in the past 30 minutes
production#5169 (closed) - 2021-07-15: The goserver_op_service SLI of the gitaly service on node file-01-stor-gprd.c.gitlab-production.internal has an apdex violating SLO
production#5160 (closed) - 2021-07-14: Alertmanager is failing sending notifications
production#5159 (closed) - 2021-07-14: goserver_op_service SLI of the gitaly service on file-42 has an apdex violating SLO
production#5134 (closed) - 2021-07-09: CI jobs using alpine 3.14 based images are failing
production#5079 (closed) - 2021-07-05: Apdex dip on Gitaly node file-37 due to flood of UserMergeToRef calls from single project
production#5062 (closed) - 2021-07-01: High disk usage by thanos-store persistent-volume-claim

Resolved actionable alerts:

Unactionable alerts:

Resolved production incidents:

Mitigated production incidents:

https://gitlab.com/gitlab-com/gl-infra/production/-/issues/5217
production#5213 (closed) - 2021-07-22: Thanos unavailable
production#5206 (closed) - 2021-07-21: The goserver_op_service SLI of the gitaly service on node file-59-stor-gprd.c.gitlab-production.internal has an apdex violating SLO
production#5203 (closed) - 2021-07-21: thanos query front end error rate is high
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/5191
production#5186 (closed) - 2021-07-19: The goserver SLI of the gitaly service on node file-04-stor-gprd.c.gitlab-production.internal has an apdex violating SLO
production#5182 (closed) - 2021-07-19: Intermittent QA test failures in staging
production#5173 (closed) - 2021-07-15: Blackbox probes for https://pre.gitlab.com are failing
production#5172 (closed) - 2021-07-15: The goserver_op_service SLI of the gitaly service on node file-27-stor-gprd.c.gitlab-production.internal has an apdex violating SLO
production#5168 (closed) - 2021-07-15: web-pages-{01,02} have empty chef run list
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/5165 - 2021-07-15: The nginx_ingress SLI of the api service in region us-east1-d has an error rate violating SLO
production#5162 (closed) - 2021-07-14: The goserver SLI of the gitaly service on node file-praefect-02-stor-gprd.c.gitlab-production.internal has an apdex violating SLO
production#5161 (closed) - 2021-07-14: The goserver SLI of the gitaly service on node file-43-stor-gprd.c.gitlab-production.internal has an apdex violating SLO
production#5158 (closed) - 2021-07-13: The rails_redis_client SLI of the redis-sidekiq service (main stage) has an apdex violating SLO
production#5157 (closed) - 2021-07-13: Alert Manager webhook integration failing
production#5155 (closed) - 2021-07-13: Blackbox probe failures for docs.gitlab.com and next.gitlab.com
production#5154 (closed) - 2021-07-13: Multiple thanos query front-end errors
production#5152 (closed) - 2021-07-13: Increase in errors across multiple GitLab.com services
production#5149 (closed) - 2021-07-13: registry service has an error rate violating SLO
production#5117 (closed) - 2021-07-07: Increased API error rate on nginx-ingress in us-east1-d

Change issues:

In Progress

Closed
