2018-12-06 Delays in shared runners on GitLab.com

Summary

A brief summary of what happened. Try to make it as executive-friendly as possible.

Service(s) affected: CI shared runners on GitLab.com
Team attribution:
Minutes downtime or degradation:

Outage time: 2018-12-05 14:40 UTC to 2018-12-05 19:40 UTC - a 5-hour disruption during which the number of pending jobs was abnormally high, per https://dashboards.gitlab.net/d/000000159/ci?panelId=2&fullscreen&orgId=1&from=1544008236203&to=1544040636000&var-runner_type=All&var-runner_managers=All&var-cache_server=All&var-gl_monitor_fqdn=postgres-02-db-gprd.c.gitlab-production.internal&var-has_minutes=yes&var-hanging_droplets_cleaner=All&var-droplet_zero_machines_cleaner=All&var-runner_job_failure_reason=All&var-gitlab_env=gprd&var-jobs_running_for_project=0
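The dashboard link above encodes its time window in the `from` and `to` query parameters, which Grafana expresses as epoch milliseconds. As a minimal sketch for checking which window the link shows, the helper below decodes them to UTC. The function name `dashboard_window` is hypothetical and not part of any GitLab tooling.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def dashboard_window(url):
    """Return the (start, end) UTC datetimes from a dashboard URL's from/to parameters."""
    # parse_qs silently skips valueless parameters such as "fullscreen".
    params = parse_qs(urlparse(url).query)

    def to_utc(milliseconds):
        # Add the offset to the epoch to avoid float rounding on the millisecond part.
        return EPOCH + timedelta(milliseconds=int(milliseconds))

    return to_utc(params["from"][0]), to_utc(params["to"][0])
```

For the link above, this yields a window of 2018-12-05 11:10:36 UTC to 2018-12-05 20:10:36 UTC, which brackets the stated outage time.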

Timeline

2018-12-06

Notes from Slack:

16:25 UTC - It looks like we are at 100% of quota on SSD disks for CI runners in GCP
16:30 UTC - Starting to remove stale SSDs for unattached disks
17:00 UTC - Creation of incident issue
17:15 UTC - Looking into enabling some DigitalOcean (DO) runners to give us some capacity
17:25 UTC - DO runners enabled and we should start picking up jobs
17:47 UTC - Shared DO runners are helping bring the number of pending jobs down to better levels - continuing to monitor.
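The 16:30 UTC cleanup step can be illustrated with a small filter over disk records. This is only a sketch and not the procedure the team actually ran. It assumes each record is a dictionary with `name`, `type`, `users`, and `creationTimestamp` keys, modeled on the Compute Engine disk resource. The function name and the one-hour age threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_unattached_ssds(disks, now, min_age=timedelta(hours=1)):
    """Return names of SSD disks that have no attached instance and are older than min_age."""
    stale = []
    for disk in disks:
        # Assumption: the disk type is a URL or string ending in "pd-ssd".
        is_ssd = disk.get("type", "").endswith("pd-ssd")
        # Assumption: "users" lists attached instances; missing or empty means unattached.
        is_unattached = not disk.get("users")
        created = datetime.fromisoformat(disk["creationTimestamp"])
        if is_ssd and is_unattached and now - created >= min_age:
            stale.append(disk["name"])
    return stale


# Illustrative records only; these are not real disks from the incident.
example_disks = [
    {"name": "runner-a", "type": "zones/z/diskTypes/pd-ssd", "users": [],
     "creationTimestamp": "2018-12-05T10:00:00+00:00"},
    {"name": "runner-b", "type": "zones/z/diskTypes/pd-ssd", "users": ["instances/vm-1"],
     "creationTimestamp": "2018-12-05T10:00:00+00:00"},
    {"name": "runner-c", "type": "zones/z/diskTypes/pd-standard",
     "creationTimestamp": "2018-12-05T10:00:00+00:00"},
]
```

Only the names are returned, and deletion is left as a separate manual step, so the candidate list can be reviewed before anything is removed.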

Edited Aug 03, 2020 by 🤖 GitLab Bot 🤖