2025-07-25 - issues with Service::Unknown
The following issues have the Service::Unknown label. Please update each with a valid Service label if possible.
- Create a CR to remove the invalid runner managers in the staging database ServiceUnknown, corrective action, groupProduction Engineering, infradev, severity3, workincident, workflow-infraTriage
- Explore blocking paging endpoints ServiceUnknown, corrective action, groupProduction Engineering, infradev, severity3, workincident, workflow-infraTriage
- Consider upgrading MySQL box ServiceUnknown, corrective action, groupProduction Engineering, infradev, severity3, workincident, workflow-infraTriage
- Add Interactive Challenges to packagecloud to further deter bots ServiceUnknown, corrective action, groupProduction Engineering, infradev, severity2, workincident, workflow-infraTriage
- Improve connection pool metrics by adding redis instance label ServiceUnknown, corrective action, groupProduction Engineering, infradev, severity3, workincident, workflow-infraTriage
- Reduce client connection pool size for Redis Ratelimiting ServiceUnknown, corrective action, groupProduction Engineering, infradev, severity3, workincident, workflow-infraTriage
- Consider relaxing alerting rules for urgent-authorized-projects ServiceUnknown, corrective action, groupProduction Engineering, infradev, severity4, workincident, workflow-infraTriage
- note - we could make the script that uploads images parallel - efficiency improvement (from @jplum). Edits to docker_image_sync.sh SLOMissed, ServiceUnknown, corrective action, groupProduction Engineering, infradev, severity1, workincident, workflow-infraTriage
- packagecloud unicorn logs not correctly fed to kibana ServiceUnknown, corrective action, groupProduction Engineering, infradev, severity2, workincident, workflow-infraTriage
- Review and adjust Apdex thresholds ServiceUnknown, corrective action, groupProduction Engineering, infradev, severity3, workincident, workflow-infraTriage
- Determine cause of redis-cluster-queues-meta apdex dip ServiceUnknown, corrective action, groupProduction Engineering, infradev, severity3, workincident, workflow-infraTriage
- Validate or set reasonable timeout for statement execution in the packagecloud MySQL DB ServiceUnknown, corrective action, groupProduction Engineering, infradev, severity2, workincident, workflow-infraTriage
- Add packagecloud SQL database CPU status as a saturation point on the packagecloud Grafana dashboard ServiceUnknown, corrective action, groupProduction Engineering, infradev, severity2, workincident, workflow-infraTriage
- Tune shutdown grace period of packagecloud deployment ServiceUnknown, corrective action, groupProduction Engineering, infradev, severity2, workincident, workflow-infraTriage
- Include X-Forwarded-For in GCP LB logs - currently we don't know where the requests come from ServiceUnknown, corrective action, groupProduction Engineering, infradev, severity2, workincident, workflow-infraTriage
- Investigate webhook trigger issue ServiceUnknown, corrective action, groupProduction Engineering, infradev, severity4, workincident, workflow-infraTriage
- follow-up: check why capacity planning didn't catch running out of N2D_CPUS quota in gitlab-r-saas-l-m-amd64-org-6 ServiceUnknown, corrective action, groupProduction Engineering, infradev, severity3, workincident, workflow-infraTriage
- Investigate serf: Shutdown without a Leave by Consul ServiceUnknown, corrective action, groupFoundations, infradev, severity3, workincident, workflow-infraTriage
- Review current parallel test execution limits ServiceUnknown, corrective action, devopsProduction Engineering, groupunknown, infradev, severity4, workincident, workflow-infraTriage
- Establish a process for updating ephemeral VM images ServiceUnknown, corrective action, devopsProduction Engineering, groupunknown, infradev, severity3, teamUnknown, workincident, workflow-infraTriage