Instrumenting the Big Pipeline Graph
What is this?
As part of upcoming changes to the pipeline page (example: https://gitlab.com/gitlab-org/gitlab/-/pipelines/214119613), we want to instrument the pipeline graph so that we have a dashboard showing whether a change has broken part of the graph before it starts affecting users.
This issue is a place to figure out what that might look like.
Current proposal
Method
We will use Prometheus to collect the data. Ideally, we will piggyback as many frontend calls as possible onto existing API calls, but we may need to add a few new endpoints. (This architecture will be determined in conjunction with the backend DRI.)
Using Prometheus leverages our existing real-time monitoring functionality and the expertise the organization already has.
Another option would be to use OpenTelemetry, but I think Prometheus is the more boring solution.
Metrics
Since we plan to use Sentry to capture frontend errors and visual testing to catch UI degradations, I would like this instrumentation to focus on a backend endpoint to which the frontend can send performance metrics for calculating and displaying the graph's links.
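A minimal sketch of the frontend half of this idea, assuming the link calculation is a synchronous function and the backend accepts a JSON array of timings. The helper names (`measureLinks`, `createHttpSender`), the metric name `pipeline_graph_links_duration_ms`, and the endpoint are all hypothetical; none of them are existing GitLab APIs. The sketch only times the calculation step, not the drawing of the links.

```typescript
// Sketch only: helper names, metric name, and endpoint are hypothetical,
// not an existing GitLab API.

// One timing sample sent from the browser to the backend.
interface PerformanceMetric {
  name: string;
  value: number; // duration in milliseconds
}

// Anything that can deliver a batch of metrics (HTTP, a test spy, ...).
type MetricSender = (metrics: PerformanceMetric[]) => void;

// Hypothetical metric name for the link calculation step.
const LINKS_DURATION_METRIC = "pipeline_graph_links_duration_ms";

// Runs the link calculation, reports how long it took, and returns its result.
function measureLinks<T>(calculateLinks: () => T, send: MetricSender): T {
  const start = performance.now();
  const result = calculateLinks();
  const duration = performance.now() - start;
  send([{ name: LINKS_DURATION_METRIC, value: duration }]);
  return result;
}

// Fire-and-forget sender that POSTs metrics as JSON to a hypothetical
// backend endpoint. Failures are swallowed so that reporting problems
// never break rendering of the graph.
function createHttpSender(endpoint: string): MetricSender {
  return (metrics) => {
    void fetch(endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ metrics }),
    }).catch(() => undefined);
  };
}
```

In the graph component, the existing link calculation would be wrapped as `measureLinks(() => calculate(...), createHttpSender(endpoint))`. Injecting the sender keeps the timing logic testable and leaves the transport open until the endpoint design is settled with the backend DRI; the backend would then be responsible for exposing the received values to Prometheus.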
Updated January 18