Use `InOperatorOptimization` for group-level vulnerabilities
What does this MR do and why?
Background Context
We have an issue where the report export for all vulnerabilities is timing out.
The existing export flow is:
- A
Vulnerability::Export
record is created. - A
VulnerabilityExports::ExportWorker
job is started to generate the exported file. - The exporter runs using the query generated by
Security::VulnerabilitiesFinder(group).execute.with_findings_scanner_identifiers_and_notes
and usesCsvBuilder
to make the CSV.
The query should run in batches of 1000 via find_each
, so we were
surprised it was creating PG timeouts
exporting the gitlab-org
group (~40,000 vulnerabilities) seems to
cause the timeout. The original reporter of the bug has around ~320k
vulnerabilities
Proposed Fix
There is a nested IN
query to find all of the Project
IDs under a
give Group
(and subgroups). After we have the IDs, the query should
be fairly straightforward as we have a project_id
column on the
vulnerabilities
table.
We want to try using the InOperatorOptimization
module to
improve that portion of the query.
In this commit we add a code-path using that module, gated by a FF (disabled by default).
We also need to add an index before attempting to use the module. In this commit we schedule an async index addition, and will need to follow-up with a synchronous index migration before toggling the FF (#426371 (closed))
SQL
Query obtained by running the following in a terminal and performing a report export via the web UI
tail -f log/*.log | grep -C20 --line-buffered 'FROM "vulnerabilities"' | tee vulns.txt
original
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/22700/commands/73283
Time: 1.195 min
- planning: 451.309 ms
- execution: 1.188 min
- I/O read: 1.111 min
- I/O write: 1.891 ms
Shared buffers:
- hits: 101495 (~792.90 MiB) from the buffer pool
- reads: 429471 (~3.30 GiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 35 (~280.00 KiB)
SELECT
"vulnerabilities".*
FROM
"vulnerabilities"
INNER JOIN "projects" ON "projects"."id" = "vulnerabilities"."project_id"
WHERE
"vulnerabilities"."project_id" IN (
SELECT
"projects"."id"
FROM
"projects"
WHERE
"projects"."namespace_id" IN (
SELECT
"namespaces"."id"
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group'
AND (traversal_ids @> ('{9970}')))
AND "projects"."pending_delete" = FALSE)
AND "vulnerabilities"."present_on_default_branch" = TRUE
AND "projects"."archived" = FALSE
ORDER BY
"vulnerabilities"."id" ASC
LIMIT 1000
new query
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/22700/commands/73285
Time: 8.686 s
- planning: 577.968 ms
- execution: 8.108 s
- I/O read: 7.905 s
- I/O write: 0.000 ms
Shared buffers:
- hits: 11832 (~92.40 MiB) from the buffer pool
- reads: 6733 (~52.60 MiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
SELECT
*
FROM ( WITH RECURSIVE "array_cte" AS MATERIALIZED (
SELECT
"projects"."id"
FROM
"projects"
WHERE
"projects"."namespace_id" IN (
SELECT
namespaces.traversal_ids[array_length(
namespaces.traversal_ids, 1
)] AS id
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group'
AND (
traversal_ids @> (
'{9970}'
)
)
)
AND "projects"."archived" = FALSE
),
"recursive_keyset_cte" AS (
(
SELECT
NULL::bigint AS id,
array_cte_id_array,
vulnerabilities_id_array,
0::bigint AS count
FROM (
SELECT
ARRAY_AGG("array_cte"."id") AS array_cte_id_array,
ARRAY_AGG("vulnerabilities"."id") AS vulnerabilities_id_array
FROM (
SELECT
"array_cte"."id"
FROM
array_cte) array_cte
LEFT JOIN LATERAL (
SELECT
"vulnerabilities"."id" AS id
FROM
"vulnerabilities"
WHERE
"vulnerabilities"."project_id" = "array_cte"."id"
AND ("vulnerabilities"."id" < 31)
ORDER BY
"vulnerabilities"."id" DESC
LIMIT 1) vulnerabilities ON TRUE
WHERE
"vulnerabilities"."id" IS NOT NULL) array_scope_lateral_query
LIMIT 1)
UNION ALL (
SELECT
recursive_keyset_cte.vulnerabilities_id_array[position],
array_cte_id_array,
recursive_keyset_cte.vulnerabilities_id_array[:position_query.position - 1] || next_cursor_values.id || recursive_keyset_cte.vulnerabilities_id_array[position_query.position + 1:],
recursive_keyset_cte.count + 1
FROM
recursive_keyset_cte,
LATERAL (
SELECT
id,
position
FROM
UNNEST(vulnerabilities_id_array)
WITH ORDINALITY AS u (id, position)
WHERE
id IS NOT NULL
ORDER BY
1 DESC
LIMIT 1) AS position_query,
LATERAL (
SELECT
"record"."id"
FROM (
VALUES (NULL)) AS nulls
LEFT JOIN (
SELECT
"vulnerabilities"."id" AS id
FROM
"vulnerabilities"
WHERE
"vulnerabilities"."project_id" = recursive_keyset_cte.array_cte_id_array[position]
AND ("vulnerabilities"."id" < recursive_keyset_cte.vulnerabilities_id_array[position])
ORDER BY
"vulnerabilities"."id" DESC
LIMIT 1) record ON TRUE
LIMIT 1) AS next_cursor_values))
SELECT
id
FROM
"recursive_keyset_cte" AS "vulnerabilities"
WHERE (count <> 0)) vulnerabilities
LIMIT 1000
Generated Report
Before | After |
---|---|
gitlab-org_vulnerabilities_2023-09-21T0806.csv | gitlab-org_vulnerabilities_2023-09-27T0733.csv |
How to set up and validate locally
- verify the migration following this guide
- download the group vulnerability report (example: http://gdk.test:3000/groups/gitlab-org/-/security/vulnerabilities)
- you can import this project into a group and run a pipeline to seed vulnerabilities
- enable the feature flag
Feature.enable(:group_vulnerabilities_optimized_query)
- download the report again, compare the reports
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #392885 (closed)