Skip to content

Fix pg_stat_activity sampler

From https://gitlab.slack.com/archives/C3NBYFJ6N/p1761152022249599?thread_ts=1761145399.714439&cid=C3NBYFJ6N

We noticed the marginalia sampler is not parsing the endpoint part correctly.

Its regexp_matches expression expects the correlation_id value to have only letters and numbers (\w), but now it often also contains a dash.Here's the current regexp:

regexp_matches(query, '^\s*(?:\/\*(?:application:(\w+),?)?(?:correlation_id:\w+,?)?(?:jid:\w+,?)?(?:endpoint_id:([\w/\-\.:\#\s]+),?)?(?:db_config_database:(\w+),?)?.*?\*\/)?\s*(\w+)') AS matches,

And here's an example showing its input marginalia string and how the regexp fails to parse correlation_id and all subsequent fields:

gitlabhq_production=# select regexp_match(query, '^\s*(\/\*.*?\*\/)') as marginalia_string, regexp_matches(query, '^\s*(?:\/\*(?:application:(\w+),?)?(?:correlation_id:\w+,?)?(?:jid:\w+,?)?(?:endpoint_id:([\w/\-\.:\#\s]+),?)?(?:db_config_database:(\w+),?)?.*?\*\/)?\s*(\w+)') AS matches from pgsa_sample where query ~ '^\s*\/\*' order by pid limit 1 ; -[ RECORD 1 ]-----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- marginalia_string | {"/*application:sidekiq,correlation_id:992a721e938b0ca1-IAD,jid:1622998202c6917ed077601d,endpoint_id:Security::GeneratePolicyViolationCommentWorker,db_config_database:gitlabhq_production_sidekiq,db_config_name:main*/"} matches           | {sidekiq,NULL,NULL,SELECT}

If we add a - to the list of allowed characters, then parsing works again:

gitlabhq_production=# select regexp_match(query, '^\s*(\/\*.*?\*\/)') as marginalia_string, regexp_matches(query, '^\s*(?:\/\*(?:application:(\w+),?)?(?:correlation_id:[\w\-]+,?)?(?:jid:\w+,?)?(?:endpoint_id:([\w/\-\.:\#\s]+),?)?(?:db_config_database:(\w+),?)?.*?\*\/)?\s*(\w+)') AS matches from pgsa_sample where query ~ '^\s*\/\*' order by pid limit 1 ; -[ RECORD 1 ]-----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- marginalia_string | {"/*application:sidekiq,correlation_id:992a721e938b0ca1-IAD,jid:1622998202c6917ed077601d,endpoint_id:Security::GeneratePolicyViolationCommentWorker,db_config_database:gitlabhq_production_sidekiq,db_config_name:main*/"} matches           | {sidekiq,Security::GeneratePolicyViolationCommentWorker,gitlabhq_production_sidekiq,SELECT}

So I guess the breaking change was appending what looks like a three-character cloudflare datacenter id to the correlation_id.

That seems like something that may have been introduced to support Cells, like maybe at the request router layer?  Just speculating.

Specifically, we'll need to fix https://gitlab.com/gitlab-org/gitlab/blob/9e0db2faaf256bf77f70de4d94f0a22081c56c50/lib/gitlab/database/stat_activity_sampler.rb#L27-27 to include - in the correlation_id part of the string.

Without this fixed, the sidekiq circuit breaking feature couldn't attribute the worker holding a connection.

Edited by Marco Gregorius
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information