Skip to content

Apply IN Optimization to SQL query

mo khan requested to merge mokhax/457231/in-optimization into master

What does this MR do and why?

This change optimizes the query used to load the results for the Explore > Dependency list page and is gated behind the explore_dependencies feature flag. This optimization uses the InOperatorOptimization module. A changelog entry has not been added because this code is still gated behind a feature flag.

#457231 (closed)

Before:

SELECT "sbom_occurrences".*
FROM "sbom_occurrences"
INNER JOIN "projects" ON "sbom_occurrences"."project_id" = "projects"."id"
INNER JOIN "projects" "projects_sbom_occurrences" ON "projects_sbom_occurrences"."id" = "sbom_occurrences"."project_id"
INNER JOIN "project_authorizations" ON "project_authorizations"."project_id" = "projects_sbom_occurrences"."id"
WHERE "projects"."organization_id" = 1
AND "projects"."archived" = FALSE
AND "project_authorizations"."user_id" = 971158
ORDER BY "sbom_occurrences"."id" ASC
LIMIT 20;
Time: 11.444 min
  - planning: 10.081 ms
  - execution: 11.444 min
    - I/O read: 26.661 min
    - I/O write: 534.192 ms

Shared buffers:
  - hits: 134243819 (~1.00 TiB) from the buffer pool
  - reads: 23962537 (~182.80 GiB) from the OS file cache, including disk I/O
  - dirtied: 112182 (~876.40 MiB)
  - writes: 21834 (~170.60 MiB)

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/27662/commands/86285

After:

SELECT
  *
FROM ( WITH RECURSIVE "array_cte" AS MATERIALIZED (
    SELECT
      "project_authorizations"."project_id"
    FROM
      "project_authorizations"
    WHERE
      "project_authorizations"."user_id" = 971158
),
    "recursive_keyset_cte" AS (
(
        SELECT
          NULL::sbom_occurrences AS records,
          array_cte_project_id_array,
          sbom_occurrences_id_array,
          0::bigint AS count
        FROM (
          SELECT
            ARRAY_AGG("array_cte"."project_id") AS array_cte_project_id_array,
            ARRAY_AGG("sbom_occurrences"."id") AS sbom_occurrences_id_array
        FROM (
          SELECT
            "array_cte"."project_id"
          FROM
            array_cte) array_cte
        LEFT JOIN LATERAL (
          SELECT
            "sbom_occurrences"."id" AS id
          FROM
            "sbom_occurrences"
            INNER JOIN "projects" ON "sbom_occurrences"."project_id" = "projects"."id"
          WHERE
            "projects"."organization_id" = 1
            AND "projects"."archived" = FALSE
            AND "sbom_occurrences"."project_id" = "array_cte"."project_id"
          ORDER BY
            "sbom_occurrences"."id" ASC
          LIMIT 1) sbom_occurrences ON TRUE
      WHERE
        "sbom_occurrences"."id" IS NOT NULL) array_scope_lateral_query
    LIMIT 1)
UNION ALL (
  SELECT
    (
      SELECT
        sbom_occurrences
      FROM
        "sbom_occurrences"
      WHERE
        "sbom_occurrences"."id" = recursive_keyset_cte.sbom_occurrences_id_array[position]
      LIMIT 1),
    array_cte_project_id_array,
    recursive_keyset_cte.sbom_occurrences_id_array[:position_query.position - 1] || next_cursor_values.id || recursive_keyset_cte.sbom_occurrences_id_array[position_query.position + 1:],
    recursive_keyset_cte.count + 1
  FROM
    recursive_keyset_cte,
    LATERAL (
      SELECT
        id,
        position
      FROM
        UNNEST(sbom_occurrences_id_array)
        WITH ORDINALITY AS u (id, position)
      WHERE
        id IS NOT NULL
      ORDER BY
        1 ASC
      LIMIT 1) AS position_query,
  LATERAL (
    SELECT
      "record"."id"
    FROM (
      VALUES (NULL)) AS nulls
    LEFT JOIN (
      SELECT
        "sbom_occurrences"."id" AS id
      FROM
        "sbom_occurrences"
        INNER JOIN "projects" ON "sbom_occurrences"."project_id" = "projects"."id"
      WHERE
        "projects"."organization_id" = 1
        AND "projects"."archived" = FALSE
        AND "sbom_occurrences"."project_id" = recursive_keyset_cte.array_cte_project_id_array[position]
        AND ("sbom_occurrences"."id" > recursive_keyset_cte.sbom_occurrences_id_array[position])
      ORDER BY
        "sbom_occurrences"."id" ASC
      LIMIT 1) record ON TRUE
  LIMIT 1) AS next_cursor_values))
SELECT
  (records).*
FROM
  "recursive_keyset_cte" AS "sbom_occurrences"
WHERE (count <> 0)) sbom_occurrences
LIMIT 21;
Time: 469.684 ms
  - planning: 330.983 ms
  - execution: 138.701 ms
    - I/O read: 134.775 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 375 (~2.90 MiB) from the buffer pool
  - reads: 105 (~840.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/27662/commands/86312

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Before After
before after

How to set up and validate locally

  1. In rails console enable the experiment fully
    Feature.enable(:explore_dependencies)
  2. Visit http://127.0.0.1:3000/explore/dependencies
Edited by mo khan

Merge request reports