Skip to content

Use `InOperatorOptimization` for group-level vulnerabilities

What does this MR do and why?

Background Context

We have an issue where the report export for all vulnerabilities is timing out.

The existing export flow is:

  1. A Vulnerability::Export record is created.
  2. A VulnerabilityExports::ExportWorker job is started to generate the exported file.
  3. The exporter runs using the query generated by Security::VulnerabilitiesFinder(group).execute.with_findings_scanner_identifiers_and_notes and uses CsvBuilder to make the CSV.

The query should run in batches of 1000 via find_each, so we were surprised it was creating PG timeouts

exporting the gitlab-org group (~40,000 vulnerabilities) seems to cause the timeout. The original reporter of the bug has around ~320k vulnerabilities

Proposed Fix

There is a nested IN query to find all of the Project IDs under a give Group (and subgroups). After we have the IDs, the query should be fairly straightforward as we have a project_id column on the vulnerabilities table.

We want to try using the InOperatorOptimization module to improve that portion of the query.

In this commit we add a code-path using that module, gated by a FF (disabled by default).

We also need to add an index before attempting to use the module. In this commit we schedule an async index addition, and will need to follow-up with a synchronous index migration before toggling the FF (#426371 (closed))

SQL

Query obtained by running the following in a terminal and performing a report export via the web UI

 tail -f log/*.log | grep -C20 --line-buffered 'FROM "vulnerabilities"' | tee vulns.txt
original

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/22700/commands/73283

Time: 1.195 min  
  - planning: 451.309 ms  
  - execution: 1.188 min  
    - I/O read: 1.111 min  
    - I/O write: 1.891 ms  
  
Shared buffers:  
  - hits: 101495 (~792.90 MiB) from the buffer pool  
  - reads: 429471 (~3.30 GiB) from the OS file cache, including disk I/O  
  - dirtied: 0  
  - writes: 35 (~280.00 KiB)  
SELECT
    "vulnerabilities".*
FROM
    "vulnerabilities"
    INNER JOIN "projects" ON "projects"."id" = "vulnerabilities"."project_id"
WHERE
    "vulnerabilities"."project_id" IN (
        SELECT
            "projects"."id"
        FROM
            "projects"
        WHERE
            "projects"."namespace_id" IN (
                SELECT
                    "namespaces"."id"
                FROM
                    "namespaces"
                WHERE
                    "namespaces"."type" = 'Group'
                    AND (traversal_ids @> ('{9970}')))
                AND "projects"."pending_delete" = FALSE)
        AND "vulnerabilities"."present_on_default_branch" = TRUE
        AND "projects"."archived" = FALSE
    ORDER BY
        "vulnerabilities"."id" ASC
    LIMIT 1000
new query

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/22700/commands/73285

Time: 8.686 s  
  - planning: 577.968 ms  
  - execution: 8.108 s  
    - I/O read: 7.905 s  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 11832 (~92.40 MiB) from the buffer pool  
  - reads: 6733 (~52.60 MiB) from the OS file cache, including disk I/O  
  - dirtied: 0  
  - writes: 0  
SELECT
    *
FROM ( WITH RECURSIVE "array_cte" AS MATERIALIZED (
        SELECT
            "projects"."id"
        FROM
            "projects"
        WHERE
            "projects"."namespace_id" IN (
                SELECT
                    namespaces.traversal_ids[array_length(
                        namespaces.traversal_ids, 1
)] AS id
                FROM
                    "namespaces"
                WHERE
                    "namespaces"."type" = 'Group'
                    AND (
                        traversal_ids @> (
                            '{9970}'
)
)
)
                AND "projects"."archived" = FALSE
),
            "recursive_keyset_cte" AS (
(
                    SELECT
                        NULL::bigint AS id,
                        array_cte_id_array,
                        vulnerabilities_id_array,
                        0::bigint AS count
                    FROM (
                        SELECT
                            ARRAY_AGG("array_cte"."id") AS array_cte_id_array,
                        ARRAY_AGG("vulnerabilities"."id") AS vulnerabilities_id_array
                    FROM (
                        SELECT
                            "array_cte"."id"
                        FROM
                            array_cte) array_cte
                    LEFT JOIN LATERAL (
                        SELECT
                            "vulnerabilities"."id" AS id
                        FROM
                            "vulnerabilities"
                        WHERE
                            "vulnerabilities"."project_id" = "array_cte"."id"
                            AND ("vulnerabilities"."id" < 31)
                        ORDER BY
                            "vulnerabilities"."id" DESC
                        LIMIT 1) vulnerabilities ON TRUE
                WHERE
                    "vulnerabilities"."id" IS NOT NULL) array_scope_lateral_query
            LIMIT 1)
    UNION ALL (
        SELECT
            recursive_keyset_cte.vulnerabilities_id_array[position],
            array_cte_id_array,
            recursive_keyset_cte.vulnerabilities_id_array[:position_query.position - 1] || next_cursor_values.id || recursive_keyset_cte.vulnerabilities_id_array[position_query.position + 1:],
            recursive_keyset_cte.count + 1
        FROM
            recursive_keyset_cte,
            LATERAL (
                SELECT
                    id,
                    position
                FROM
                    UNNEST(vulnerabilities_id_array)
                    WITH ORDINALITY AS u (id, position)
                WHERE
                    id IS NOT NULL
                ORDER BY
                    1 DESC
                LIMIT 1) AS position_query,
        LATERAL (
            SELECT
                "record"."id"
            FROM (
                VALUES (NULL)) AS nulls
        LEFT JOIN (
            SELECT
                "vulnerabilities"."id" AS id
            FROM
                "vulnerabilities"
            WHERE
                "vulnerabilities"."project_id" = recursive_keyset_cte.array_cte_id_array[position]
                    AND ("vulnerabilities"."id" < recursive_keyset_cte.vulnerabilities_id_array[position])
                ORDER BY
                    "vulnerabilities"."id" DESC
                LIMIT 1) record ON TRUE
        LIMIT 1) AS next_cursor_values))
SELECT
    id
FROM
    "recursive_keyset_cte" AS "vulnerabilities"
WHERE (count <> 0)) vulnerabilities
LIMIT 1000

Generated Report

Before After
gitlab-org_vulnerabilities_2023-09-21T0806.csv gitlab-org_vulnerabilities_2023-09-27T0733.csv

How to set up and validate locally

  1. verify the migration following this guide
  2. download the group vulnerability report (example: http://gdk.test:3000/groups/gitlab-org/-/security/vulnerabilities)
  • you can import this project into a group and run a pipeline to seed vulnerabilities
  1. enable the feature flag
Feature.enable(:group_vulnerabilities_optimized_query)
  1. download the report again, compare the reports

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #392885 (closed)

Edited by Michael Becker

Merge request reports