Reduce database connection pool metric cardinality
Partially revert !223981, which added per-thread-name labels to the
gitlab_database_connection_pool_busy and
gitlab_database_connection_pool_dead metrics. This caused a
cardinality explosion that made these metrics impossible to query
and contributed to Mimir ingester OOMs.
Changes:
- Remove per-thread splitting of `busy` and `dead` from the default metrics, reverting to scalar values from `connection_pool.stat`
- Add `multiprocess_mode` to all `DatabaseSampler` gauges so metrics are aggregated across Puma worker processes (`min` for `size`, `max` for all others)
- Add optional per-thread metrics under separate gauge names `gitlab_database_extended_connection_pool_{busy,dead}`, gated behind the `per_thread_db_connection_pool_metrics` ops feature flag scoped to `Feature.current_pod`, allowing operators to enable detailed metrics for a percentage of pods via chatops
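To illustrate what aggregating across Puma worker processes means for the default gauges, here is a minimal plain-Ruby sketch. It is not the sampler or the Prometheus client code: `MODES`, `aggregate_across_processes`, and the sample values are hypothetical, and the set of gauges is assumed from the keys of ActiveRecord's `connection_pool.stat`. It only shows the effect of taking `min` for `size` and `max` for everything else.

```ruby
# Hypothetical mapping of gauge name to aggregation mode, following the
# rule described above: min for size, max for all others.
MODES = {
  size: :min,
  connections: :max,
  busy: :max,
  dead: :max,
  idle: :max,
  waiting: :max
}.freeze

# Collapse one sample per worker process into a single value per gauge.
def aggregate_across_processes(samples, modes = MODES)
  modes.each_with_object({}) do |(metric, mode), result|
    values = samples.map { |sample| sample.fetch(metric) }
    result[metric] = mode == :min ? values.min : values.max
  end
end

# Made-up samples from two worker processes in one pod, for one database host.
samples = [
  { size: 20, connections: 8,  busy: 3, dead: 0, idle: 5, waiting: 0 },
  { size: 20, connections: 12, busy: 9, dead: 1, idle: 2, waiting: 0 }
]

aggregated = aggregate_across_processes(samples)
# aggregated[:busy] is 9: the busiest worker determines the reported value.
```

With `max`, the reported value reflects the worst worker in the pod, which is the one that matters for saturation, while the process dimension disappears from the series.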
Important caveat for the idle metric: it counts connections that
have already been initialized but are not in use. This means that
busy + dead + idle <= connections. For saturation monitoring we
need to use dead + busy.
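As a rough sketch of that saturation calculation, assuming a hash shaped like the one returned by ActiveRecord's `connection_pool.stat`: the `pool_saturation` helper and the choice of `size` as the denominator are assumptions made for illustration, not something this change defines.

```ruby
# Hypothetical helper: fraction of the pool that is checked out, counting
# connections held by dead threads as unavailable too.
def pool_saturation(stat)
  size = stat.fetch(:size)
  return 0.0 if size.zero?

  (stat.fetch(:busy) + stat.fetch(:dead)).to_f / size
end

# Made-up sample: idle is deliberately not part of the calculation.
stat = { size: 20, connections: 12, busy: 9, dead: 1, idle: 2 }
saturation = pool_saturation(stat)
# saturation is 0.5 here: (9 + 1) / 20
```

Deriving usage from `size - idle` instead would overstate saturation whenever the pool has not yet opened all of its connections.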
The default metrics reduce cardinality from
pods * processes * threads * db_hosts to pods * db_hosts.
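To make the reduction concrete, the two formulas can be evaluated for a made-up fleet. All numbers below are hypothetical and chosen only to show the order of magnitude; the helper names are illustrative.

```ruby
# Series count per metric with per-thread labels on every worker process.
def series_before(pods:, processes:, threads:, db_hosts:)
  pods * processes * threads * db_hosts
end

# Series count per metric once values are aggregated per pod.
def series_after(pods:, db_hosts:)
  pods * db_hosts
end

# Hypothetical fleet: 100 pods, 4 Puma workers each, 8 threads per worker,
# 10 database hosts.
before = series_before(pods: 100, processes: 4, threads: 8, db_hosts: 10)
after  = series_after(pods: 100, db_hosts: 10)
# before is 32_000 and after is 1_000, a 32x reduction per metric.
```

The reduction factor is always `processes * threads`, so it grows with the Puma configuration rather than with fleet size.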
Edited by Bob Van Landuyt