Skip to content

Queries with very large query strings

We regularly log PostgreSQL queries on the primary, which never make it trough to ELK due to a maximum size limitation (in fluentd?).

This can be seen here: https://log.gprd.gitlab.net/goto/b136d9cbbaad62787951057b390e40ac

2022-02-14_12-06

I grabbed today's logs from the PG primary and ran an analysis for log lines > 1024 bytes:

postgres=# select endpoint, pg_size_pretty(sum(length)), pg_size_pretty(max(length)) from long group by 1 order by sum(length) desc limit 10;
                                   endpoint                                   | sum            | max per query 
------------------------------------------------------------------------------+----------------+----------------
 endpoint_id:ReactiveCachingWorker                                            | 187 MB         | 10 MB
 endpoint_id:PUT /api/:version/projects/:id/merge_requests/:merge_request_iid | 14 MB          | 447 kB
 endpoint_id:Groups::OmniauthCallbacksController#group_saml                   | 6756 kB        | 26 kB
 endpoint_id:UpdateMergeRequestsWorker                                        | 5580 kB        | 275 kB
 endpoint_id:ProjectsController#transfer                                      | 1873 kB        | 2612 bytes
 endpoint_id:GroupsController#show                                            | 1650 kB        | 7089 bytes
 endpoint_id:POST /api/scim/:version/groups/:group/Users                      | 1605 kB        | 26 kB
 endpoint_id:PUT /api/:version/projects/:id/issues/:issue_iid                 | 1517 kB        | 32 kB
 endpoint_id:Deployments::DropOlderDeploymentsWorker                          | 1216 kB        | 12 kB
 endpoint_id:AuthorizedProjectsWorker                                         | 1068 kB        | 10 kB
(10 rows)

In this analysis, we can see query samples coming from ReactiveCachingWorker with a query string at 10MB in size (this is the textual representation of the query string!). Those look like this pattern:

/*application:sidekiq,correlation_id:01FVTH29D5QA595REH9MDFG0AD,jid:a6a5eba13221986ba7790e19,endpoint_id:ReactiveCachingWorker,db_config_name:main*/ SELECT ""vulnerability_occurrences"".""vulnerability_id"", ""vulnerability_occurrences"".""project_fingerprint"" FROM ""vulnerability_occurrences"" WHERE ""vulnerability_occurrences"".""project_fingerprint"" IN (/* long list of hashes */)

No control over the length of a query string is typically an anti-pattern and we may want to look into breaking these queries up in the application and e.g. apply batching to them, where possible.

A similar issue has been gitlab-org/gitlab#346581 (closed).

Purpose of this issue is to investigate and find/create the proper issues for development teams.