Geo: Bring back legacy project repo Prometheus metrics
Problem
During the migration to the Geo Self-Service Framework, a number of Prometheus metrics were effectively renamed. Because this can break dashboards and alerting, it is a breaking change. We should have followed the deprecation process and limited breaking changes to major releases.
| Breaking change (version) | Old metric prefix | New metric prefix | Notes |
|---|---|---|---|
| 15.11 | `geo_wikis_*` | `geo_project_wiki_repositories_*` | The window to rectify this has passed |
| 16.1 | `geo_design_repositories` | `geo_design_management_repositories` | Was not mentioned in Prometheus GitLab metrics doc |
| 16.1 | `geo_design_repositories_synced` | `geo_design_management_repositories_synced` | Was not mentioned in Prometheus GitLab metrics doc |
| 16.1 | `geo_design_repositories_failed` | `geo_design_management_repositories_failed` | Was not mentioned in Prometheus GitLab metrics doc |
| 16.3 | `geo_repositories_checksummed` | `geo_project_repositories_checksummed` | |
| 16.3 | `geo_repositories_checksum_failed` | `geo_project_repositories_checksum_failed` | |
| 16.3 | `geo_repositories_synced` | `geo_project_repositories_synced` | |
| 16.3 | `geo_repositories_failed` | `geo_project_repositories_failed` | |
| 16.3 | `geo_repositories_verified` | `geo_project_repositories_verified` | |
| 16.3 | `geo_repositories_verification_failed` | `geo_project_repositories_verification_failed` | |
| 16.3 | `geo_repositories_checksum_mismatch` | No replacement | |
| 16.3 | `geo_repositories_retrying_verification` | No replacement | |
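To see whether a given dashboard panel or alert rule is affected, the renames in the table can be encoded as data and matched against query text. The following is a minimal, hypothetical sketch (the names `RENAMED_METRICS`, `RENAMED_PREFIXES` and `find_renamed` are illustrative, not part of GitLab). It assumes that the `geo_wikis_*` rename keeps the suffix unchanged (for example `geo_wikis_synced` becomes `geo_project_wiki_repositories_synced`), and it only recognises lowercase names starting with `geo_`.

```python
import re

# Old metric name -> new metric name, from the table above (None = no replacement).
RENAMED_METRICS = {
    "geo_design_repositories": "geo_design_management_repositories",
    "geo_design_repositories_synced": "geo_design_management_repositories_synced",
    "geo_design_repositories_failed": "geo_design_management_repositories_failed",
    "geo_repositories_checksummed": "geo_project_repositories_checksummed",
    "geo_repositories_checksum_failed": "geo_project_repositories_checksum_failed",
    "geo_repositories_synced": "geo_project_repositories_synced",
    "geo_repositories_failed": "geo_project_repositories_failed",
    "geo_repositories_verified": "geo_project_repositories_verified",
    "geo_repositories_verification_failed": "geo_project_repositories_verification_failed",
    "geo_repositories_checksum_mismatch": None,
    "geo_repositories_retrying_verification": None,
}

# Prefix renames (assumption: the rest of the name is kept as-is).
RENAMED_PREFIXES = {"geo_wikis_": "geo_project_wiki_repositories_"}

# Candidate metric names in a query: lowercase identifiers starting with "geo_".
METRIC_NAME = re.compile(r"\bgeo_[a-z0-9_]+")


def find_renamed(query):
    """Return {old_name: new_name_or_None} for every renamed metric referenced in query."""
    found = {}
    for name in METRIC_NAME.findall(query):
        if name in RENAMED_METRICS:
            found[name] = RENAMED_METRICS[name]
            continue
        for old_prefix, new_prefix in RENAMED_PREFIXES.items():
            if name.startswith(old_prefix):
                found[name] = new_prefix + name[len(old_prefix):]
    return found
```

For example, `find_renamed('sum(geo_repositories_synced) / sum(geo_repositories_failed)')` reports both metrics together with their `geo_project_repositories_*` replacements. A query that uses a metric with no replacement (such as `geo_repositories_checksum_mismatch`) maps to `None`, so that query needs a different fix, not just a rename.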
Proposal
- The wiki Git repository metrics were changed before 16.0, so I don't think there is anything to do there.
- The design Git repository metrics were never advertised in the GitLab Prometheus metrics document, and design repository replication metrics are not a high priority.
- The project Git repository metrics are the metrics customers are most likely to monitor. We received at least one customer support ticket that was escalated to development.
So we should help customers who monitor the project Git repository metrics. Let's bring back the project Git repository Prometheus metrics immediately, in `master` and in 16.5 (currently the latest release, so we do not need approval to backport bug fixes).
Note that if we want to leave it as-is, we need to get a deprecation exception, which is difficult for reasons mentioned in that doc.
Implementation guide
- !135315 (merged) In addition to setting the usual Prometheus metrics, at this line we should:
  - Detect if a metric is one of the new project metrics above
  - Set the corresponding "old" gauge
- !135645 (merged) Backport it to 16.5
- !135675 (merged) Ensure the bug is mentioned in the Version-specific upgrade instructions doc
- !135528 (merged)
- Deprecation post is tracked by #416426 (closed)
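The first implementation step ("detect if a metric is one of the new project ones, then set the corresponding old gauge") amounts to a dual write. Below is a minimal sketch of that idea. The real change lives in GitLab's Ruby code and is not reproduced here. `GaugeRegistry` is a hypothetical in-memory stand-in for a Prometheus client, and `set_geo_metric`, `LEGACY_PROJECT_METRICS` and the label set used below are illustrative assumptions.

```python
# New project repository metric -> legacy (pre-16.3) metric name, from the table above.
# Only project metrics are mapped, per the proposal; design and wiki metrics are left as-is.
LEGACY_PROJECT_METRICS = {
    "geo_project_repositories_checksummed": "geo_repositories_checksummed",
    "geo_project_repositories_checksum_failed": "geo_repositories_checksum_failed",
    "geo_project_repositories_synced": "geo_repositories_synced",
    "geo_project_repositories_failed": "geo_repositories_failed",
    "geo_project_repositories_verified": "geo_repositories_verified",
    "geo_project_repositories_verification_failed": "geo_repositories_verification_failed",
}


class GaugeRegistry:
    """Hypothetical in-memory stand-in for a Prometheus gauge registry."""

    def __init__(self):
        self.values = {}

    def _key(self, name, labels):
        # Key each sample by metric name plus a sorted, hashable label set.
        return (name, tuple(sorted(labels.items())))

    def set(self, name, labels, value):
        self.values[self._key(name, labels)] = value

    def get(self, name, labels):
        return self.values.get(self._key(name, labels))


def set_geo_metric(registry, name, labels, value):
    """Set a metric as usual, and also its legacy gauge if it has one."""
    registry.set(name, labels, value)
    legacy_name = LEGACY_PROJECT_METRICS.get(name)
    if legacy_name is not None:
        # Keep publishing the old name so existing dashboards and alerts keep working.
        registry.set(legacy_name, labels, value)
```

With this sketch, `set_geo_metric(registry, "geo_project_repositories_synced", {"site": "secondary-1"}, 42)` stores 42 under both `geo_project_repositories_synced` and `geo_repositories_synced` with the same labels. The two 16.3 metrics with no replacement (`geo_repositories_checksum_mismatch`, `geo_repositories_retrying_verification`) cannot be restored this way, because there is no new metric to copy a value from.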
Edited by Michael Kozono