Add PodMonitor support for metrics collection during shutdown (!533)

What does this MR do?

This is an alternate implementation of !532 (closed). IMO this is cleaner and more future-proof, but the diff (and legacy cruft introduced) is larger.

So I'll leave it up to the maintainers to decide whether they want to take on this extra burden.

Why was this MR needed?

GitLab Runner's graceful shutdown can take hours as it waits for all jobs to complete. By default, Kubernetes immediately removes terminating pods from Service endpoints, causing the ServiceMonitor to stop scraping metrics during this entire shutdown period. This leaves us without observability into the graceful shutdown process.
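
For illustration, a PodMonitor that selects the runner pods directly, rather than going through a Service, might look roughly like the sketch below. This is not the manifest rendered by this MR: the resource name, the `app: gitlab-runner` label, the `metrics` port name, and the scrape interval are all assumptions and would need to match what the chart actually sets on the pods.

```yaml
# Sketch only: name, labels, port name, and interval are illustrative assumptions,
# not values taken from this chart.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: gitlab-runner            # hypothetical resource name
spec:
  selector:
    matchLabels:
      app: gitlab-runner         # assumed pod label
  podMetricsEndpoints:
    - port: metrics              # assumed name of the container port exposing metrics
      path: /metrics
      interval: 30s              # assumed scrape interval
```

Because the scrape targets here come from the pods themselves rather than from Service endpoints, a pod that is still running through its termination grace period can remain a target.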

What's the best way to test this MR?

1. Deploy the chart with metrics and PodMonitor enabled.
2. Trigger a pod termination (e.g., kubectl delete pod <runner-pod>).
3. Verify that Prometheus continues to scrape metrics from the terminating pod during the terminationGracePeriodSeconds window.
4. Check the Service endpoints to confirm the terminating pod remains in the list.
