Add PodMonitor support for metrics collection during shutdown
What does this MR do?
This is an alternate implementation of !532 (closed). IMO this is cleaner and more future-proof, but the diff (and legacy cruft introduced) is larger.
So I'll leave it up to the maintainers to decide whether they want to take on this extra burden.
Why was this MR needed?
GitLab Runner's graceful shutdown can take hours as it waits for all jobs to complete. By default, Kubernetes immediately removes terminating pods from Service endpoints, causing the ServiceMonitor to stop scraping metrics during this entire shutdown period. This leaves us without observability into the graceful shutdown process.
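For context, a PodMonitor (a Prometheus Operator resource) selects pods by label and scrapes them directly rather than through Service endpoints, so removing a terminating pod from the endpoints should not by itself drop the scrape target. The manifest below is only a rough sketch of what such a resource might look like. The resource name, the pod label, and the port name are assumptions for illustration, not values taken from this chart or this MR.

```yaml
# Illustrative sketch only: the name, label, and port below are assumptions,
# not the values rendered by this chart.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: gitlab-runner            # hypothetical resource name
spec:
  selector:
    matchLabels:
      app: gitlab-runner         # assumed label on the runner pods
  podMetricsEndpoints:
    - port: metrics              # assumed name of the container port serving metrics
      path: /metrics
      interval: 30s              # example scrape interval
```

The key difference from a ServiceMonitor is that the target list comes from the pods themselves, so it does not depend on whether the pod is still listed as a Service endpoint.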
What's the best way to test this MR?
- Deploy the chart with metrics and PodMonitor enabled
- Trigger a pod termination (e.g., `kubectl delete pod <runner-pod>`)
- Verify that Prometheus continues to scrape metrics from the terminating pod during the `terminationGracePeriodSeconds` window
- Check the Service endpoints to confirm the terminating pod remains in the list