2025-09-24: Image pull failures on SaaS Linux runners exceeding SLO
Image pull failures on SaaS Linux runners exceeding SLO (Severity 2, High)
Problem: SaaS Linux runners have a 15.06% image pull failure rate, which exceeds the allowed SLO. Jobs that cannot pull their required images fail, and these failures are widespread.
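For illustration only, the check behind this statement can be sketched as a ratio of failed pulls to total pulls compared against a maximum allowed failure rate. The pull counts and the threshold below are hypothetical placeholders, since the ticket records neither the raw counts nor the actual SLO value.

```python
def pull_failure_rate(failed_pulls: int, total_pulls: int) -> float:
    """Return the fraction of image pulls that failed, between 0.0 and 1.0."""
    if total_pulls <= 0:
        raise ValueError("total_pulls must be positive")
    if not 0 <= failed_pulls <= total_pulls:
        raise ValueError("failed_pulls must be between 0 and total_pulls")
    return failed_pulls / total_pulls


def exceeds_slo(failure_rate: float, max_failure_rate: float) -> bool:
    """Return True when the observed failure rate is above the allowed maximum."""
    return failure_rate > max_failure_rate


# Hypothetical counts, chosen only so that the ratio matches the reported 15.06%.
FAILED_PULLS = 1506
TOTAL_PULLS = 10000

# Hypothetical threshold; the real SLO is not stated in this ticket.
ASSUMED_MAX_FAILURE_RATE = 0.05

rate = pull_failure_rate(FAILED_PULLS, TOTAL_PULLS)
print(f"failure rate: {rate:.2%}")
print(f"exceeds assumed SLO: {exceeds_slo(rate, ASSUMED_MAX_FAILURE_RATE)}")
```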
Impact: CI pipeline jobs are failing for both SaaS and self-managed customers. This is blocking deployments and is generating a high volume of customer emergency reports.
Causes: An ongoing Docker Hub outage is causing authentication errors on pulls of images that are not cached in our internal mirror. Common images such as Alpine, BusyBox, and Ruby are not available in the mirror, so attempts to pull them from it also fail.
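As a rough sketch of how mirror availability for these images could be checked, the snippet below builds the manifest URL defined by the registry HTTP API v2 (`/v2/<name>/manifests/<reference>`). The mirror hostname is a hypothetical placeholder, and the `latest` tag is an assumption. Whether a given mirror answers such a request from its cache or forwards it upstream depends on how that mirror is configured.

```python
# Hypothetical hostname; the real mirror address is not given in this ticket.
MIRROR_HOST = "mirror.example.internal"


def normalize_repository(image: str) -> str:
    """Docker Hub official images (alpine, busybox, ruby) live under 'library/'."""
    return image if "/" in image else f"library/{image}"


def manifest_url(mirror_host: str, image: str, tag: str = "latest") -> str:
    """Build the registry v2 manifest URL for an image tag on the given host."""
    return f"https://{mirror_host}/v2/{normalize_repository(image)}/manifests/{tag}"


# The images named in this ticket; the tag defaults to an assumed 'latest'.
for image in ("alpine", "busybox", "ruby"):
    print(manifest_url(MIRROR_HOST, image))
```

A HEAD or GET request against each URL would typically return 200 when the manifest is served and 404 when it is unknown to the registry.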
Response strategy: We confirmed that the root cause is an external Docker Hub outage and verified that falling back to our mirror is not possible for common images, because they are not available there. We have communicated the issue to customers and posted a status page update.
This ticket was created by incident.io to track INC-4183.