Investigate and remove unused Rails metrics to reduce cardinality
## Background

As part of cardinality reduction efforts tracked in gitlab-com/gl-infra/capacity-planning-trackers/gitlab-com#2664, we've identified several Rails metrics that may not be used in SLIs but are contributing to high cardinality.

## Problem

The GitLab Rails monolith exports several metrics that may be candidates for removal or optimization:

1. **`http_requests_total` over-initialization**: We currently initialize this metric with [specific status codes](https://gitlab.com/gitlab-org/gitlab/-/blob/66861ba1f50f463ee31f0dab0195656318d42a34/lib/gitlab/metrics/requests_rack_middleware.rb#L11-18) when we should be using status code classes instead.
2. **Potentially unused metrics**: The following metrics are not confirmed to be in use by SLIs and may be safe to remove:

| Metric name | Series count |
|---|---|
| gitlab_transaction_duration_seconds_bucket{} | 4550 |
| gitlab_sql_replica_duration_seconds_bucket{} | 4160 |
| gitlab_cache_operations_total{} | 3322 |
| gitlab_transaction_cache_read_hit_count_total{} | 1756 |
| gitlab_transaction_db_count_total{} | 1319 |
| gitlab_sql_duration_seconds_count{} | 1319 |
| gitlab_sql_duration_seconds_sum{} | 1319 |
| gitlab_sql_primary_duration_seconds_bucket{} | 1116 |
| gitlab_transaction_db_replica_count_total{} | 1040 |
| gitlab_transaction_cache_read_miss_count_total{} | 788 |
| gitlab_database_transaction_seconds_bucket{} | 768 |
| gitlab_transaction_db_cached_count_total{} | 704 |
| gitlab_external_http_duration_seconds_bucket{} | 679 |
| http_requests_total{} | 673 |
| gitlab_transaction_duration_seconds_sum{} | 650 |
| gitlab_workhorse_http_request_duration_seconds_bucket{} | 650 |
| gitlab_transaction_duration_seconds_count{} | 650 |
| gitlab_workhorse_http_request_size_bytes_bucket{} | 585 |
| gitlab_transaction_db_replica_cached_count_total{} | 584 |
| gitlab_sli_rails_request_apdex_total{} | 571 |
| gitlab_sli_rails_request_total{} | 571 |

[Source query](https://dashboards.gitlab.net/explore?schemaVersion=1&panes={%2287r%22:{%22datasource%22:%22mimir-gitlab-gprd%22,%22queries%22:[{%22refId%22:%22A%22,%22expr%22:%22sort_desc(count%20by%20(__name__)({pod%3D\\%22gitlab-webservice-web-96fdfc96-2vc57\\%22}))%22,%22range%22:false,%22instant%22:true,%22datasource%22:{%22type%22:%22prometheus%22,%22uid%22:%22mimir-gitlab-gprd%22},%22editorMode%22:%22code%22,%22legendFormat%22:%22__auto%22,%22format%22:%22table%22,%22exemplar%22:false}],%22range%22:{%22from%22:%221770646105702%22,%22to%22:%221770646329534%22},%22compact%22:false}}&orgId=1)

## Action Items

- [ ] Audit each metric listed above to determine if it's used in SLIs or critical dashboards
- [x] Change `http_requests_total` initialization to use status code classes instead of specific status codes
- [ ] Remove metrics that are confirmed to be unused
- [ ] Document any metrics that should be kept and why

## Related

- gitlab-com/gl-infra/capacity-planning-trackers/gitlab-com#2664
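
To illustrate the `http_requests_total` item, here is a minimal sketch of initializing label sets with status code classes rather than specific status codes. It is not the actual middleware code: `status_class`, `initial_label_sets`, `EXAMPLE_METHODS`, and `STATUS_CLASSES` are hypothetical names, and the method list is an assumption chosen only for the example.

```ruby
# Hypothetical sketch, not the real requests_rack_middleware.rb implementation.

# Assumed set of status classes used as the `status` label value.
STATUS_CLASSES = %w[1xx 2xx 3xx 4xx 5xx].freeze

# Assumed example methods; the real middleware may track a different set.
EXAMPLE_METHODS = %w[get post put].freeze

# Collapse a specific HTTP status code (e.g. 404 or "404") into its class
# label (e.g. "4xx"). Anything that is not a three-digit code in the
# 100..599 range is reported as "unknown".
def status_class(status)
  text = status.to_s
  return 'unknown' unless text.match?(/\A[1-5]\d\d\z/)

  "#{text[0]}xx"
end

# Build the label sets to pre-initialize: one per method and status class,
# instead of one per method and specific status code.
def initial_label_sets(methods: EXAMPLE_METHODS, status_classes: STATUS_CLASSES)
  methods.product(status_classes).map do |method, status|
    { method: method, status: status }
  end
end
```

With this shape, each method contributes at most five pre-initialized series for the `status` label, however many distinct status codes the application returns. At observation time the same `status_class` helper would be applied to the response status before incrementing the counter.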