Add topics and Prometheus metrics to periodic queries
To complete the free SaaS user top initiative, we need to add the following data sources to our data warehouse in Snowflake:
| Source | Data Team Issue | Engineering Issue |
|---|---|---|
| HAProxy | https://gitlab.com/gitlab-data/analytics/-/issues/11584 | gitlab-com/www-gitlab-com#12980 (closed) |
| Container Registry Logs | https://gitlab.com/gitlab-data/analytics/-/issues/11752 | gitlab-com/www-gitlab-com#12990 (closed) |
It has been proposed that we use Thanos as an intermediary for both of these sources, since we already have a working data pipeline (CI jobs on ops.gitlab.net and a GCS bucket) that successfully extracts data into Snowflake.
This approach has been validated for HAProxy, since a live Grafana dashboard built on Prometheus queries already exists. @rnienaber recommended that we use the same method for the container registry log data, but it is not yet clear whether this data exists in Thanos at all.
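To make the proposed flow concrete, here is a minimal Python sketch of a single periodic query: it calls the Prometheus-compatible instant-query endpoint (`/api/v1/query`) that Thanos Query exposes and flattens the returned vector into newline-delimited JSON rows, the kind of file a CI job could write to the GCS bucket for Snowflake to load. The function names, the row layout, and the NDJSON output format are assumptions for illustration, not the actual implementation of the ops.gitlab.net pipeline.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for a Prometheus-compatible API (e.g. Thanos Query)."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def run_query(base_url: str, promql: str, timeout: float = 30.0) -> dict:
    """Execute the query over HTTP and return the decoded JSON body."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return json.load(resp)


def flatten_vector(response: dict, query_name: str) -> list:
    """Convert an instant-query 'vector' result into flat rows, one per series.

    The row layout (query_name/labels/timestamp/value) is a hypothetical choice.
    """
    if response.get("status") != "success":
        raise ValueError(f"query {query_name!r} failed: {response.get('error')}")
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"unexpected result type: {data['resultType']}")
    rows = []
    for series in data["result"]:
        ts, value = series["value"]  # [unix seconds, value encoded as a string]
        rows.append({
            "query_name": query_name,
            "labels": series["metric"],
            "timestamp": datetime.fromtimestamp(float(ts), tz=timezone.utc).isoformat(),
            "value": float(value),
        })
    return rows


def to_ndjson(rows: list) -> str:
    """Serialize rows as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


if __name__ == "__main__":
    # Made-up canned response so the sketch runs without network access.
    sample = {
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [{"metric": {"backend": "web"}, "value": [1609459200, "42"]}],
        },
    }
    print(to_ndjson(flatten_vector(sample, "haproxy_example")))
```

Keeping the HTTP call (`run_query`) separate from the parsing (`flatten_vector`) lets the parsing be tested against canned responses without access to Thanos.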
Next steps for HAProxy data
- Add a topic for the Prometheus metrics to the pipeline described above
  - 👉 gitlab-com/runbooks!4481 (merged)
  - 👉 gitlab-com/runbooks!4505 (merged)
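Adding a topic essentially means registering a named group of PromQL queries for the periodic job to run on each schedule. The Python sketch below models that idea with a validation step. The `Topic` and `PeriodicQuery` classes, the GCS object path layout, and the PromQL using `haproxy_backend_http_responses_total` are hypothetical and only illustrate the shape; the real topic definitions are the ones added in the runbooks MRs listed above.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class PeriodicQuery:
    name: str
    promql: str
    query_type: str = "instant"  # hypothetical field: "instant" or "range"


@dataclass
class Topic:
    name: str
    queries: list = field(default_factory=list)

    def validate(self) -> None:
        """Reject duplicate query names, empty PromQL, and unknown query types."""
        names = [q.name for q in self.queries]
        if len(names) != len(set(names)):
            raise ValueError(f"duplicate query names in topic {self.name!r}")
        for q in self.queries:
            if not q.promql.strip():
                raise ValueError(f"empty PromQL for query {q.name!r}")
            if q.query_type not in ("instant", "range"):
                raise ValueError(f"unknown query type {q.query_type!r} for {q.name!r}")

    def object_paths(self, run_date: date) -> list:
        """Hypothetical GCS object names: <topic>/<YYYY-MM-DD>/<query>.json"""
        return [f"{self.name}/{run_date.isoformat()}/{q.name}.json" for q in self.queries]


haproxy_topic = Topic(
    name="haproxy",
    queries=[
        PeriodicQuery(
            name="backend_responses_rate_5m",
            # Illustrative PromQL only; real queries would presumably mirror
            # the existing Grafana dashboard.
            promql="sum by (backend, code) (rate(haproxy_backend_http_responses_total[5m]))",
        ),
    ],
)
haproxy_topic.validate()
```

Validating a topic before it is merged catches simple mistakes (duplicate names, empty queries) before the scheduled CI job runs.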
Next steps for Container Registry logs
- Create Prometheus metrics for the container registry logs
- Add a topic for these Prometheus metrics to the pipeline described above
Final notes
We may still need additional validation and guidance from @lmai1 and others to create the Prometheus metrics correctly. We are currently validating this against static registry log data in BigQuery.
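While the metrics are being validated against the static registry log data in BigQuery, one simple reconciliation is to compare per-label request counts from both sides and flag keys that differ by more than a relative tolerance. The sketch below assumes both sides have already been reduced to dictionaries keyed by the same labels; the function name and the 1% default tolerance are hypothetical choices, not an agreed validation criterion.

```python
def compare_counts(reference: dict, candidate: dict, rel_tol: float = 0.01) -> dict:
    """Return {key: (reference_count, candidate_count)} for keys outside rel_tol.

    `reference` would hold counts from BigQuery, `candidate` the metric-derived
    counts. A key missing on one side is treated as 0 there.
    """
    mismatches = {}
    for key in sorted(set(reference) | set(candidate), key=str):
        ref = reference.get(key, 0)
        got = candidate.get(key, 0)
        if ref == 0:
            ok = got == 0
        else:
            ok = abs(got - ref) / ref <= rel_tol
        if not ok:
            mismatches[key] = (ref, got)
    return mismatches


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    bigquery_counts = {("GET", "2xx"): 1000, ("GET", "4xx"): 50}
    metric_counts = {("GET", "2xx"): 1005, ("GET", "4xx"): 40, ("PUT", "2xx"): 3}
    print(compare_counts(bigquery_counts, metric_counts))
```

A relative tolerance rather than exact equality allows for small timing differences between when logs are exported and when metrics are scraped.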