Automatically collect CI Runner metrics
GitLab Runner is instrumented with Prometheus metrics, so we should make scraping them as easy as possible. Out of the box today, you have to set this up yourself, which is not trivial, in particular because the bundled Prometheus server's configuration is managed by Omnibus.
Since we own both of these products, we should make the monitoring of Runners totally automatic.
There are a few potential ways to try to solve this:
Another option, which may gracefully handle the potentially ephemeral nature of Runners, is to use the Prometheus Pushgateway. A Runner could then push its metrics up to GitLab, reusing the network connectivity it already requires (SSL for Git repositories, etc.).
This is probably the best solution, with the fewest downsides. We would need to ensure this endpoint could not be abused, however.
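As a rough sketch of the push approach, the snippet below builds a Pushgateway request using the gateway's `/metrics/job/<job>/instance/<instance>` URL scheme and the Prometheus text exposition format. The gateway address, the job and instance names, and the metric name are hypothetical placeholders, not anything the Runner ships today, and authentication (needed to keep the endpoint from being abused) is left out.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def pushgateway_url(base_url, job, instance):
    """Build the grouping-key URL for one Runner's metrics."""
    for value in (job, instance):
        # Assumption: names contain no "/", which would need special encoding.
        if "/" in value or not value:
            raise ValueError("job and instance must be non-empty and contain no '/'")
    base = base_url.rstrip("/")
    return f"{base}/metrics/job/{quote(job, safe='')}/instance/{quote(instance, safe='')}"


def render_metrics(samples):
    """Render {name: value} pairs as Prometheus text exposition lines."""
    lines = [f"{name} {value}" for name, value in sorted(samples.items())]
    return "\n".join(lines) + "\n"


def push(base_url, job, instance, samples, timeout=5.0):
    """PUT the samples to the gateway, replacing this Runner's previous group."""
    request = Request(
        pushgateway_url(base_url, job, instance),
        data=render_metrics(samples).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.status


# Hypothetical usage; the host and metric name are illustrative only:
# push("https://gitlab.example.com:9091", "gitlab_runner", "runner-1",
#      {"example_runner_jobs_running": 3})
```

Using `PUT` replaces all metrics in the Runner's group on each push, so a Runner that disappears leaves behind only its last pushed values.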
Dynamic scraping from GitLab source server
One potential option, when the bundled Prometheus server is enabled, is to track the source IP of a Runner when it connects to GitLab. We can then associate that IP with the Runner and add it as a scrape target for as long as the Runner is "active".
We may also need to provide an option within the GitLab Runner config to override the automatic IP detection and supply a manual IP instead. This may help in scenarios where the Runner is behind a NAT.
Finally, to reduce unnecessary web requests, it may make sense to send a test request to /metrics to determine whether that URL is reachable at the provided IP before the Runner is added to Prometheus.
This has undesirable network requirements, however, since the GitLab server needs to be able to reach the Runners.
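The flow described in this section could be sketched roughly as follows: pick the manual override IP if one is set, probe `/metrics`, and emit a target list in the JSON format that Prometheus file-based service discovery reads. The Runner record fields (`id`, `active`, `detected_ip`, `override_ip`), the `runner_id` label, and the metrics port are assumptions made for illustration, and only IPv4 addresses are handled.

```python
import json
from urllib.request import urlopen

# Assumption: Runners expose metrics on this port; the real value depends on
# the listen address configured on each Runner.
ASSUMED_METRICS_PORT = 9252


def effective_ip(detected_ip, override_ip=None):
    """Prefer the manually configured IP (e.g. behind NAT) over the detected one."""
    return override_ip or detected_ip


def is_scrapable(ip, port=ASSUMED_METRICS_PORT, timeout=2.0):
    """Return True if a test request to /metrics succeeds."""
    try:
        with urlopen(f"http://{ip}:{port}/metrics", timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return False


def build_targets(runners, probe=is_scrapable):
    """Build file_sd target groups for active, reachable Runners."""
    groups = []
    for runner in runners:
        if not runner["active"]:
            continue
        ip = effective_ip(runner["detected_ip"], runner.get("override_ip"))
        if not probe(ip):
            continue
        groups.append({
            "targets": [f"{ip}:{ASSUMED_METRICS_PORT}"],
            "labels": {"runner_id": str(runner["id"])},
        })
    return groups


def write_file_sd(path, groups):
    """Write the target groups where a file_sd_configs entry can pick them up."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(groups, handle, indent=2)
```

Because the probe is passed in as a parameter, the target-building logic can be exercised without any network access, and the check could later be swapped for something cheaper than a full `/metrics` request.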
Since we package Consul with GitLab EE, and are considering using it for Service Discovery for HA deployments, we could also consider registering the Runners there. This has undesirable network requirements, however, and you may lose the final bits of information as the Runners finish their jobs and disappear.
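To make the Consul option more concrete, here is a minimal sketch of a Runner registering itself through the Consul agent's `/v1/agent/service/register` endpoint, with an HTTP check against `/metrics` so that vanished Runners are eventually deregistered. The service name, ID scheme, check interval, and deregistration delay are all hypothetical choices, not an agreed design.

```python
import json
from urllib.request import Request, urlopen


def consul_registration(runner_id, address, port,
                        interval="15s", deregister_after="10m"):
    """Build a service registration payload for one Runner."""
    return {
        "ID": f"gitlab-runner-{runner_id}",   # hypothetical ID scheme
        "Name": "gitlab-runner",              # hypothetical service name
        "Address": address,
        "Port": port,
        "Check": {
            # Consul polls the metrics endpoint to decide whether the Runner is alive.
            "HTTP": f"http://{address}:{port}/metrics",
            "Interval": interval,
            # Remove Runners that stay unhealthy, since they may never deregister.
            "DeregisterCriticalServiceAfter": deregister_after,
        },
    }


def register(consul_url, payload, timeout=5.0):
    """Send the registration to a Consul agent and return the HTTP status."""
    request = Request(
        f"{consul_url.rstrip('/')}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.status
```

Prometheus could then discover these services through its Consul service discovery, though the GitLab server would still need a network path to each Runner to scrape it.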
Links / references
GitLab Runners are a fundamental part of GitLab, acting as the engine of the CI/CD system. These Runners can be responsible for a significant amount of compute and bandwidth resources, especially when a company has fully embraced CI/CD and is leveraging features like Review Apps.
With GitLab Runners being such an important part of GitLab itself, it is important to monitor them. With GitLab 9.x, the bundled Prometheus server will now automatically collect Runner metrics, providing a more complete picture of GitLab server performance.