Skip to content

Rewrite group sbom_occurrences query

Brian Williams requested to merge bwill/rewrite-sbom-occurrences-query into master

What does this MR do and why?

Do aggregations for only 25 occurrences at a time for improved performance. This prevents us from needing to scan all sbom_occurrences in the group.

Relates to: #437651 (closed)

💾 Database

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/26864/commands/83792

WITH "namespaces" AS MATERIALIZED (
    SELECT
        "namespaces"."traversal_ids"
    FROM
        "namespaces"
    WHERE
        "namespaces"."type" = 'Group'
        AND (traversal_ids >= ('{9970}'))
        AND (traversal_ids < ('{9971}'))
)
SELECT
    MIN(outer_occurrences.id) AS id,
    "outer_occurrences"."component_id",
    "outer_occurrences"."component_version_id",
    SUM(counts.occurrence_count) AS occurrence_count,
    SUM(counts.project_count) AS project_count
FROM ( SELECT DISTINCT ON ("inner_occurrences"."component_id", "inner_occurrences"."component_version_id")
        "inner_occurrences"."id",
        "inner_occurrences"."created_at",
        "inner_occurrences"."updated_at",
        "inner_occurrences"."component_version_id",
        "inner_occurrences"."project_id",
        "inner_occurrences"."pipeline_id",
        "inner_occurrences"."source_id",
        "inner_occurrences"."commit_sha",
        "inner_occurrences"."component_id",
        "inner_occurrences"."uuid",
        "inner_occurrences"."package_manager",
        "inner_occurrences"."component_name",
        "inner_occurrences"."input_file_path",
        "inner_occurrences"."licenses",
        "inner_occurrences"."highest_severity",
        "inner_occurrences"."vulnerability_count",
        "inner_occurrences"."source_package_id",
        "inner_occurrences"."archived",
        "inner_occurrences"."traversal_ids",
        "inner_occurrences"."ancestors"
    FROM
        namespaces,
        LATERAL ( SELECT DISTINCT ON ("sbom_occurrences"."component_id", "sbom_occurrences"."component_version_id")
                "sbom_occurrences"."id",
                "sbom_occurrences"."created_at",
                "sbom_occurrences"."updated_at",
                "sbom_occurrences"."component_version_id",
                "sbom_occurrences"."project_id",
                "sbom_occurrences"."pipeline_id",
                "sbom_occurrences"."source_id",
                "sbom_occurrences"."commit_sha",
                "sbom_occurrences"."component_id",
                "sbom_occurrences"."uuid",
                "sbom_occurrences"."package_manager",
                "sbom_occurrences"."component_name",
                "sbom_occurrences"."input_file_path",
                "sbom_occurrences"."licenses",
                "sbom_occurrences"."highest_severity",
                "sbom_occurrences"."vulnerability_count",
                "sbom_occurrences"."source_package_id",
                "sbom_occurrences"."archived",
                "sbom_occurrences"."traversal_ids",
                "sbom_occurrences"."ancestors"
            FROM
                "sbom_occurrences"
            WHERE (sbom_occurrences.traversal_ids = namespaces.traversal_ids::bigint[])
        ORDER BY
            "sbom_occurrences"."component_id" ASC,
            "sbom_occurrences"."component_version_id" ASC
        LIMIT 25) inner_occurrences
ORDER BY
    inner_occurrences.component_id ASC,
    inner_occurrences.component_version_id ASC
LIMIT 25) outer_occurrences,
    LATERAL (
        SELECT
            COUNT(project_id) AS occurrence_count,
            COUNT(DISTINCT project_id) project_count
        FROM
            "sbom_occurrences"
        WHERE (traversal_ids >= ('{9970}'))
        AND (traversal_ids < ('{9971}'))
        AND (sbom_occurrences.component_version_id = outer_occurrences.component_version_id)) counts
GROUP BY
    "outer_occurrences"."component_id",
    "outer_occurrences"."component_version_id"
ORDER BY
    MIN(outer_occurrences.component_id) ASC,
    MIN(outer_occurrences.component_version_id) ASC;
 Sort  (cost=127.19..127.26 rows=25 width=104) (actual time=569.300..569.307 rows=25 loops=1)
   Sort Key: (min(sbom_occurrences.component_id)), (min(sbom_occurrences.component_version_id))
   Sort Method: quicksort  Memory: 28kB
   Buffers: shared hit=4146 read=1232
   I/O Timings: read=542.983 write=0.000
   CTE namespaces
     ->  Index Only Scan using index_namespaces_on_traversal_ids_for_groups_btree on public.namespaces namespaces_1  (cost=0.56..3.58 rows=1 width=28) (actual time=13.221..396.816 rows=841 loops=1)
           Index Cond: ((namespaces_1.traversal_ids >= '{9970}'::integer[]) AND (namespaces_1.traversal_ids < '{9971}'::integer[]))
           Heap Fetches: 122
           Buffers: shared hit=517 read=148
           I/O Timings: read=393.504 write=0.000
   ->  Aggregate  (cost=34.88..123.04 rows=25 width=104) (actual time=565.239..569.211 rows=25 loops=1)
         Group Key: sbom_occurrences.component_id, sbom_occurrences.component_version_id
         Buffers: shared hit=4143 read=1232
         I/O Timings: read=542.983 write=0.000
         ->  Nested Loop  (cost=34.88..122.22 rows=25 width=40) (actual time=564.938..569.159 rows=25 loops=1)
               Buffers: shared hit=4143 read=1232
               I/O Timings: read=542.983 write=0.000
               ->  Limit  (cost=31.29..31.47 rows=25 width=319) (actual time=564.560..564.591 rows=25 loops=1)
                     Buffers: shared hit=4024 read=1171
                     I/O Timings: read=539.142 write=0.000
                     ->  Unique  (cost=31.29..31.47 rows=25 width=319) (actual time=564.557..564.583 rows=25 loops=1)
                           Buffers: shared hit=4024 read=1171
                           I/O Timings: read=539.142 write=0.000
                           ->  Sort  (cost=31.29..31.35 rows=25 width=319) (actual time=564.555..564.566 rows=44 loops=1)
                                 Sort Key: sbom_occurrences.component_id, sbom_occurrences.component_version_id
                                 Sort Method: quicksort  Memory: 93kB
                                 Buffers: shared hit=4024 read=1171
                                 I/O Timings: read=539.142 write=0.000
                                 ->  Nested Loop  (cost=0.59..30.71 rows=25 width=319) (actual time=25.832..563.489 rows=878 loops=1)
                                       Buffers: shared hit=4024 read=1171
                                       I/O Timings: read=539.142 write=0.000
                                       ->  CTE Scan on namespaces  (cost=0.00..0.02 rows=1 width=32) (actual time=13.226..397.849 rows=841 loops=1)
                                             Buffers: shared hit=517 read=148
                                             I/O Timings: read=393.504 write=0.000
                                       ->  Limit  (cost=0.59..30.19 rows=25 width=319) (actual time=0.110..0.195 rows=1 loops=841)
                                             Buffers: shared hit=3507 read=1023
                                             I/O Timings: read=145.638 write=0.000
                                             ->  Unique  (cost=0.59..6164.94 rows=5207 width=319) (actual time=0.109..0.195 rows=1 loops=841)
                                                   Buffers: shared hit=3507 read=1023
                                                   I/O Timings: read=145.638 write=0.000
                                                   ->  Index Scan using index_sbom_occurrences_for_aggregations on public.sbom_occurrences  (cost=0.59..6138.88 rows=5212 width=319) (actual time=0.105..0.189 rows=1 loops=841)
                                                         Index Cond: (sbom_occurrences.traversal_ids = (namespaces.traversal_ids)::bigint[])
                                                         Buffers: shared hit=3507 read=1023
                                                         I/O Timings: read=145.638 write=0.000
               ->  Aggregate  (cost=3.59..3.60 rows=1 width=16) (actual time=0.181..0.181 rows=1 loops=25)
                     Buffers: shared hit=119 read=61
                     I/O Timings: read=3.841 write=0.000
                     ->  Index Scan using idx_sbom_occurrences_on_component_version_id_and_traversal_ids on public.sbom_occurrences sbom_occurrences_1  (cost=0.56..3.58 rows=1 width=8) (actual time=0.169..0.175 rows=3 loops=25)
                           Index Cond: ((sbom_occurrences_1.component_version_id = sbom_occurrences.component_version_id) AND (sbom_occurrences_1.traversal_ids >= '{9970}'::bigint[]) AND (sbom_occurrences_1.traversal_ids < '{9971}'::bigint[]))
                           Buffers: shared hit=119 read=61
                           I/O Timings: read=3.841 write=0.000
  
Time: 1.255 s  
  - planning: 683.960 ms  
  - execution: 570.939 ms  
    - I/O read: 542.983 ms  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 4146 (~32.40 MiB) from the buffer pool  
  - reads: 1232 (~9.60 MiB) from the OS file cache, including disk I/O  
  - dirtied: 0  
  - writes: 0

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

Before After

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

Edited by Brian Williams

Merge request reports