Add Topics and Prometheus metrics to periodic queries

To enable completion of the free SaaS user top initiative, we need to add the following data sources to our data warehouse in Snowflake:

| Source | Data Team Issue | Engineering Issue |
| --- | --- | --- |
| HA proxy | https://gitlab.com/gitlab-data/analytics/-/issues/11584 | gitlab-com/www-gitlab-com#12980 (closed) |
| Container Registry Logs | https://gitlab.com/gitlab-data/analytics/-/issues/11752 | gitlab-com/www-gitlab-com#12990 (closed) |

It has been proposed that we use Thanos as an intermediary for both of these sources, since we already have a working data pipeline: CI jobs in ops.gitlab.net write to a GCS bucket, and data from that bucket is successfully loaded into Snowflake.

This method has been validated for HA Proxy, as a live Grafana dashboard already exists that uses Prometheus queries. @rnienaber recommended that we use the same method for the container registry log data, but it is not yet clear whether these data exist in Thanos.
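To make "using Prometheus queries" concrete, here is a minimal sketch of an instant query against the Prometheus-compatible HTTP API (`/api/v1/query`) that Thanos exposes. The base URL, the PromQL expression, and the sample response are illustrative assumptions, not the actual dashboard queries or endpoints used here.

```python
import json
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlencode


def build_instant_query_url(base_url: str, promql: str,
                            time: Optional[float] = None) -> str:
    """Build the URL for an instant query on a Prometheus-compatible API."""
    params = {"query": promql}
    if time is not None:
        # Optional evaluation timestamp (Unix seconds).
        params["time"] = str(time)
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


def parse_vector(body: str) -> List[Tuple[Dict[str, str], float]]:
    """Extract (labels, value) pairs from an instant-vector JSON response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample is {"metric": {...labels...}, "value": [timestamp, "value-as-string"]}.
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


# Hypothetical host and expression, for illustration only.
url = build_instant_query_url("https://thanos.example.invalid", "sum(up)")

# Hand-written sample response in the documented instant-vector shape.
sample_body = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {"backend": "web"}, "value": [1609459200, "12.5"]}],
    },
})
samples = parse_vector(sample_body)
```

If the container registry data is available in Thanos, the same call with a registry-specific expression would confirm it: an empty `result` list would suggest the series does not exist yet.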

Next Steps needed for HA Proxy Data

  1. Add a topic for the Prometheus metrics to the pipeline linked above 👉 gitlab-com/runbooks!4481 (merged) 👉 gitlab-com/runbooks!4505 (merged)

Next Steps needed for Container Registry Logs

  1. Create Prometheus metrics for the container registry logs
  2. Add a topic for the Prometheus metrics to the pipeline linked above
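As a rough illustration of what "adding a topic" amounts to, the sketch below models a topic as a named group of PromQL queries and serialises it to JSON, the kind of artefact a CI job could write to a bucket. The class names, the JSON layout, and the metric name `registry_http_requests_total` are hypothetical; the actual topic definitions live in the runbooks repository and may be structured quite differently.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class PeriodicQuery:
    name: str    # identifier used for the query's output
    promql: str  # PromQL expression evaluated on each run


@dataclass
class Topic:
    name: str
    queries: List[PeriodicQuery] = field(default_factory=list)

    def add(self, query: PeriodicQuery) -> None:
        # Reject duplicate names so results cannot overwrite each other.
        if any(q.name == query.name for q in self.queries):
            raise ValueError(f"duplicate query name: {query.name}")
        self.queries.append(query)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Hypothetical topic for the container registry; the metric name is a placeholder.
registry_topic = Topic(name="container_registry")
registry_topic.add(PeriodicQuery(
    name="requests_per_day",
    promql="sum(increase(registry_http_requests_total[1d]))",
))
topic_json = registry_topic.to_json()
```

Grouping queries under a topic keeps each data source's output separate, which should make it easier to map one topic to one set of tables in Snowflake.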

Final notes

We might still need additional validation and guidance from @lmai1 and others to create the Prometheus metrics appropriately. We're currently validating this by looking at static data for the registry logs that are in BigQuery.
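One way such a validation might look: count the static log records per day and compare those counts with daily totals derived from the new metrics. This is only a sketch. It assumes the logs are exported as JSON lines with an ISO 8601 `timestamp` field, and both the field name and the 1% tolerance are assumptions, not details of the actual BigQuery data.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_by_day(lines: Iterable[str], ts_field: str = "timestamp") -> Counter:
    """Count log records per calendar day (YYYY-MM-DD prefix of the timestamp)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        record = json.loads(line)
        counts[record[ts_field][:10]] += 1
    return counts


def mismatches(log_counts: Dict[str, int], metric_counts: Dict[str, float],
               tolerance: float = 0.01) -> Dict[str, Tuple[int, float]]:
    """Return the days on which metric totals deviate from log counts beyond the tolerance."""
    out = {}
    for day in sorted(set(log_counts) | set(metric_counts)):
        expected = log_counts.get(day, 0)
        observed = metric_counts.get(day, 0)
        if abs(expected - observed) > tolerance * max(expected, 1):
            out[day] = (expected, observed)
    return out


# Made-up sample data, for illustration only.
sample_lines = [
    '{"timestamp": "2021-01-01T00:00:05Z"}',
    '{"timestamp": "2021-01-01T13:30:00Z"}',
    '{"timestamp": "2021-01-02T08:15:00Z"}',
]
log_counts = count_by_day(sample_lines)
diff = mismatches(log_counts, {"2021-01-01": 2, "2021-01-02": 5})
```

Days that appear in `diff` would be the ones to raise with the metric owners before relying on the Prometheus data downstream.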

Edited by Rehab