Skip to content

Add dependency list aggregations

What does this MR do and why?

Add Dependencies::AggregationsFinder which implements a more efficient version of the Group#sbom_occurrences query. The new query scans the sbom_components in the namespace until it finds 20 items that are unique by component_id and component_version_id. It then collects aggregations for those 20 items.

The old version of the query first has to collect all the namespace_ids in the hierarchy, read all sbom_occurrences records belonging to those namespaces, aggregate them, and then limit results to 20 items. The new query performs several orders of magnitude faster.

This query does not have the same functional behavior as the old one. Additional sorting and filtering methods need to be added. These will be done in follow-up MRs in order to limit complexity.

See #427783 (comment 1694776295) for more details on how this query works.

Relates to: #437651 (closed)

💾 Database

Old query
WITH our_occurrences AS MATERIALIZED (
    SELECT
        DISTINCT ON ( sbom_occurrences.component_version_id )
        sbom_occurrences.*,
        count( sbom_occurrences.id ) OVER ( PARTITION BY sbom_occurrences.component_version_id ) AS occurrence_count,
        dense_rank() OVER (
            PARTITION BY sbom_occurrences.component_version_id
            ORDER BY project_id
        ) AS project_count
    FROM
        sbom_occurrences
    WHERE
        sbom_occurrences.project_id IN (
            SELECT
                projects.id
            FROM
                projects
            WHERE
                projects.namespace_id IN (
                    SELECT
                        namespaces.traversal_ids[array_length(
                            namespaces.traversal_ids,
                            1
                        )] AS id
                    FROM
                        namespaces
                    WHERE
                        namespaces.type = 'Group' AND
                        traversal_ids @> '{9970}'
                ) AND
                projects.marked_for_deletion_at IS NULL AND
                projects.pending_delete = false
        ) AND
        sbom_occurrences.component_version_id IS NOT NULL
    ORDER BY
        sbom_occurrences.component_version_id DESC,
        project_count DESC,
        sbom_occurrences.id DESC
)
SELECT
    id,
    component_id,
    component_version_id,
    occurrence_count,
    vulnerability_count,
    project_count
FROM
    our_occurrences AS sbom_occurrences
ORDER BY
    sbom_occurrences.highest_severity DESC NULLS LAST
LIMIT 20
OFFSET 0;
Limit  (cost=262921.34..262921.39 rows=20 width=46) (actual time=72119.044..72119.071 rows=20 loops=1)
   Buffers: shared hit=21122 read=104992 dirtied=18425
   I/O Timings: read=69457.503 write=0.000
   CTE our_occurrences
     ->  Unique  (cost=262333.97..262390.87 rows=11381 width=364) (actual time=72064.841..72095.592 rows=31417 loops=1)
           Buffers: shared hit=21119 read=104992 dirtied=18425
           I/O Timings: read=69457.503 write=0.000
           ->  Sort  (cost=262333.97..262362.42 rows=11381 width=364) (actual time=72064.834..72078.045 rows=116902 loops=1)
                 Sort Key: sbom_occurrences_1.component_version_id DESC, (dense_rank() OVER (?)) DESC, sbom_occurrences_1.id DESC
                 Sort Method: quicksort  Memory: 69071kB
                 Buffers: shared hit=21119 read=104992 dirtied=18425
                 I/O Timings: read=69457.503 write=0.000
                 ->  WindowAgg  (cost=261168.88..261567.21 rows=11381 width=364) (actual time=71598.042..71822.867 rows=116902 loops=1)
                       Buffers: shared hit=21116 read=104992 dirtied=18425
                       I/O Timings: read=69457.503 write=0.000
                       ->  WindowAgg  (cost=261168.88..261396.50 rows=11381 width=356) (actual time=71598.009..71720.746 rows=116902 loops=1)
                             Buffers: shared hit=21116 read=104992 dirtied=18425
                             I/O Timings: read=69457.503 write=0.000
                             ->  Sort  (cost=261168.88..261197.33 rows=11381 width=348) (actual time=71597.966..71622.938 rows=116902 loops=1)
                                   Sort Key: sbom_occurrences_1.component_version_id, sbom_occurrences_1.project_id
                                   Sort Method: quicksort  Memory: 66023kB
                                   Buffers: shared hit=21116 read=104992 dirtied=18425
                                   I/O Timings: read=69457.503 write=0.000
                                   ->  Nested Loop  (cost=19684.41..260402.12 rows=11381 width=348) (actual time=6224.376..71262.661 rows=116902 loops=1)
                                         Buffers: shared hit=21116 read=104992 dirtied=18425
                                         I/O Timings: read=69457.503 write=0.000
                                         ->  HashAggregate  (cost=19683.84..19802.35 rows=11851 width=4) (actual time=6196.426..6203.410 rows=4096 loops=1)
                                               Group Key: projects.id
                                               Buffers: shared hit=1958 read=5741 dirtied=144
                                               I/O Timings: read=5363.588 write=0.000
                                               ->  Nested Loop  (cost=870.04..19654.21 rows=11851 width=4) (actual time=846.269..6183.456 rows=4096 loops=1)
                                                     Buffers: shared hit=1958 read=5741 dirtied=144
                                                     I/O Timings: read=5363.588 write=0.000
                                                     ->  HashAggregate  (cost=869.60..874.24 rows=464 width=28) (actual time=840.022..841.736 rows=883 loops=1)
                                                           Group Key: namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)]
                                                           Buffers: shared hit=108 read=800
                                                           I/O Timings: read=86.235 write=0.000
                                                           ->  Bitmap Heap Scan on public.namespaces  (cost=142.97..868.44 rows=464 width=28) (actual time=74.444..837.801 rows=883 loops=1)
                                                                 Buffers: shared hit=108 read=800
                                                                 I/O Timings: read=86.235 write=0.000
                                                                 ->  Bitmap Index Scan using index_namespaces_on_traversal_ids_for_groups  (cost=0.00..142.85 rows=464 width=0) (actual time=71.332..71.334 rows=885 loops=1)
                                                                       Index Cond: (namespaces.traversal_ids @> '{9970}'::integer[])
                                                                       Buffers: shared hit=1 read=93
                                                                       I/O Timings: read=68.868 write=0.000
                                                     ->  Index Scan using index_projects_on_namespace_id_and_repository_size_limit on public.projects  (cost=0.44..40.21 rows=26 width=8) (actual time=1.679..6.042 rows=5 loops=883)
                                                           Index Cond: (projects.namespace_id = (namespaces.traversal_ids)[array_length(namespaces.traversal_ids, 1)])
                                                           Filter: ((projects.marked_for_deletion_at IS NULL) AND (NOT projects.pending_delete))
                                                           Rows Removed by Filter: 0
                                                           Buffers: shared hit=1850 read=4941 dirtied=144
                                                           I/O Timings: read=5277.352 write=0.000
                                         ->  Index Scan using idx_sbom_occurrences_on_project_id_and_source_id on public.sbom_occurrences sbom_occurrences_1  (cost=0.56..13.15 rows=715 width=348) (actual time=0.695..15.862 rows=29 loops=4096)
                                               Index Cond: (sbom_occurrences_1.project_id = projects.id)
                                               Filter: (sbom_occurrences_1.component_version_id IS NOT NULL)
                                               Rows Removed by Filter: 0
                                               Buffers: shared hit=19158 read=99251 dirtied=18281
                                               I/O Timings: read=64093.915 write=0.000
   ->  Sort  (cost=530.46..558.92 rows=11381 width=46) (actual time=72119.036..72119.041 rows=20 loops=1)
         Sort Key: sbom_occurrences.highest_severity DESC NULLS LAST
         Sort Method: top-N heapsort  Memory: 27kB
         Buffers: shared hit=21122 read=104992 dirtied=18425
         I/O Timings: read=69457.503 write=0.000
         ->  CTE Scan on our_occurrences sbom_occurrences  (cost=0.00..227.62 rows=11381 width=46) (actual time=72064.853..72113.567 rows=31417 loops=1)
               Buffers: shared hit=21119 read=104992 dirtied=18425
               I/O Timings: read=69457.503 write=0.000
Time: 1.203 min
  - planning: 10.969 ms
  - execution: 1.202 min
    - I/O read: 1.158 min
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 21122 (~165.00 MiB) from the buffer pool
  - reads: 104992 (~820.30 MiB) from the OS file cache, including disk I/O
  - dirtied: 18425 (~143.90 MiB)
  - writes: 0
New Query

https://console.postgres.ai/shared/9a56a012-4b92-4e75-8f24-acd922954ec0

WITH namespaces AS MATERIALIZED (
    SELECT
        namespaces.traversal_ids
    FROM
        namespaces
    WHERE
        namespaces.type = 'Group' AND
        traversal_ids >= '{9970}' AND
        traversal_ids < '{9971}'
)
SELECT
    min( outer_occurrences.id )::bigint AS id,
    outer_occurrences.component_id,
    outer_occurrences.component_version_id,
    min( outer_occurrences.package_manager ) AS package_manager,
    min( outer_occurrences.input_file_path ) AS input_file_path,
    jsonb_agg( outer_occurrences.licenses -> 0 ) AS licenses,
    sum( counts.occurrence_count )::integer AS occurrence_count,
    sum( counts.vulnerability_count )::integer AS vulnerability_count,
    sum( counts.project_count )::integer AS project_count
FROM
    (
        SELECT
            DISTINCT ON ( inner_occurrences.component_id, inner_occurrences.component_version_id )
            inner_occurrences.id,
            inner_occurrences.created_at,
            inner_occurrences.updated_at,
            inner_occurrences.component_version_id,
            inner_occurrences.project_id,
            inner_occurrences.pipeline_id,
            inner_occurrences.source_id,
            inner_occurrences.commit_sha,
            inner_occurrences.component_id,
            inner_occurrences.uuid,
            inner_occurrences.package_manager,
            inner_occurrences.component_name,
            inner_occurrences.input_file_path,
            inner_occurrences.licenses,
            inner_occurrences.highest_severity,
            inner_occurrences.vulnerability_count,
            inner_occurrences.source_package_id,
            inner_occurrences.archived,
            inner_occurrences.traversal_ids,
            inner_occurrences.ancestors
        FROM
            namespaces,
            LATERAL (
                SELECT
                    DISTINCT ON ( sbom_occurrences.component_id, sbom_occurrences.component_version_id )
                    sbom_occurrences.id,
                    sbom_occurrences.created_at,
                    sbom_occurrences.updated_at,
                    sbom_occurrences.component_version_id,
                    sbom_occurrences.project_id,
                    sbom_occurrences.pipeline_id,
                    sbom_occurrences.source_id,
                    sbom_occurrences.commit_sha,
                    sbom_occurrences.component_id,
                    sbom_occurrences.uuid,
                    sbom_occurrences.package_manager,
                    sbom_occurrences.component_name,
                    sbom_occurrences.input_file_path,
                    sbom_occurrences.licenses,
                    sbom_occurrences.highest_severity,
                    sbom_occurrences.vulnerability_count,
                    sbom_occurrences.source_package_id,
                    sbom_occurrences.archived,
                    sbom_occurrences.traversal_ids,
                    sbom_occurrences.ancestors
                FROM
                    sbom_occurrences
                WHERE
                    sbom_occurrences.traversal_ids = namespaces.traversal_ids::bigint[] AND
                    sbom_occurrences.archived = false
                ORDER BY
                    sbom_occurrences.component_id ASC,
                    sbom_occurrences.component_version_id ASC
                LIMIT 20
            ) AS inner_occurrences
        ORDER BY
            inner_occurrences.component_id ASC,
            inner_occurrences.component_version_id ASC
        LIMIT 20
    ) AS outer_occurrences,
    LATERAL (
        SELECT
            count( project_id ) AS occurrence_count,
            count( project_id ) AS project_count,
            sum( vulnerability_count ) AS vulnerability_count
        FROM
            sbom_occurrences
        WHERE
            traversal_ids >= '{9970}' AND
            traversal_ids < '{9971}' AND
            sbom_occurrences.archived = false AND
            sbom_occurrences.component_version_id = outer_occurrences.component_version_id
    ) AS counts
GROUP BY
    outer_occurrences.component_id,
    outer_occurrences.component_version_id
ORDER BY
    min( outer_occurrences.component_id ) ASC,
    min( outer_occurrences.component_version_id ) ASC
LIMIT 20
OFFSET 0;
Limit  (cost=110.09..110.14 rows=20 width=148) (actual time=2704.113..2704.122 rows=20 loops=1)
   Buffers: shared hit=4322 read=2666 dirtied=138
   I/O Timings: read=2649.254 write=0.000
   CTE namespaces
     ->  Index Only Scan using index_namespaces_on_traversal_ids_for_groups_btree on public.namespaces namespaces_1  (cost=0.56..3.58 rows=1 width=28) (actual time=5.439..265.826 rows=884 loops=1)
           Index Cond: ((namespaces_1.traversal_ids >= '{9970}'::integer[]) AND (namespaces_1.traversal_ids < '{9971}'::integer[]))
           Heap Fetches: 197
           Buffers: shared hit=567 read=219 dirtied=30
           I/O Timings: read=258.907 write=0.000
   ->  Sort  (cost=106.52..106.57 rows=20 width=148) (actual time=2704.111..2704.117 rows=20 loops=1)
         Sort Key: (min(sbom_occurrences.component_id)), (min(sbom_occurrences.component_version_id))
         Sort Method: quicksort  Memory: 29kB
         Buffers: shared hit=4322 read=2666 dirtied=138
         I/O Timings: read=2649.254 write=0.000
         ->  Aggregate  (cost=35.77..106.08 rows=20 width=148) (actual time=2647.534..2704.019 rows=20 loops=1)
               Group Key: sbom_occurrences.component_id, sbom_occurrences.component_version_id
               Buffers: shared hit=4319 read=2666 dirtied=138
               I/O Timings: read=2649.254 write=0.000
               ->  Nested Loop  (cost=35.77..104.93 rows=20 width=178) (actual time=2646.866..2703.580 rows=20 loops=1)
                     Buffers: shared hit=4319 read=2666 dirtied=138
                     I/O Timings: read=2649.254 write=0.000
                     ->  Limit  (cost=32.18..32.33 rows=20 width=353) (actual time=2642.949..2643.099 rows=20 loops=1)
                           Buffers: shared hit=4177 read=2619 dirtied=138
                           I/O Timings: read=2590.152 write=0.000
                           ->  Unique  (cost=32.18..32.33 rows=20 width=353) (actual time=2642.948..2643.081 rows=20 loops=1)
                                 Buffers: shared hit=4177 read=2619 dirtied=138
                                 I/O Timings: read=2590.152 write=0.000
                                 ->  Sort  (cost=32.18..32.23 rows=20 width=353) (actual time=2642.947..2643.027 rows=66 loops=1)
                                       Sort Key: sbom_occurrences.component_id, sbom_occurrences.component_version_id
                                       Sort Method: quicksort  Memory: 489kB
                                       Buffers: shared hit=4177 read=2619 dirtied=138
                                       I/O Timings: read=2590.152 write=0.000
                                       ->  Nested Loop  (cost=0.59..31.75 rows=20 width=353) (actual time=11.595..2637.737 rows=1828 loops=1)
                                             Buffers: shared hit=4177 read=2619 dirtied=138
                                             I/O Timings: read=2590.152 write=0.000
                                             ->  CTE Scan on namespaces  (cost=0.00..0.02 rows=1 width=32) (actual time=5.445..266.961 rows=884 loops=1)
                                                   Buffers: shared hit=567 read=219 dirtied=30
                                                   I/O Timings: read=258.907 write=0.000
                                             ->  Limit  (cost=0.59..31.33 rows=20 width=353) (actual time=0.273..2.678 rows=2 loops=884)
                                                   Buffers: shared hit=3610 read=2400 dirtied=108
                                                   I/O Timings: read=2331.245 write=0.000
                                                   ->  Unique  (cost=0.59..3343.70 rows=2175 width=353) (actual time=0.272..2.677 rows=2 loops=884)
                                                         Buffers: shared hit=3610 read=2400 dirtied=108
                                                         I/O Timings: read=2331.245 write=0.000
                                                         ->  Index Scan using index_unarchived_sbom_occurrences_for_aggregations on public.sbom_occurrences  (cost=0.59..3332.82 rows=2176 width=353) (actual time=0.270..2.670 rows=3 loops=884)
                                                               Index Cond: (sbom_occurrences.traversal_ids = (namespaces.traversal_ids)::bigint[])
                                                               Buffers: shared hit=3610 read=2400 dirtied=108
                                                               I/O Timings: read=2331.245 write=0.000
                     ->  Aggregate  (cost=3.59..3.60 rows=1 width=24) (actual time=3.019..3.019 rows=1 loops=20)
                           Buffers: shared hit=142 read=47
                           I/O Timings: read=59.102 write=0.000
                           ->  Index Scan using index_unarchived_occurrences_on_version_id_and_traversal_ids on public.sbom_occurrences sbom_occurrences_1  (cost=0.56..3.58 rows=1 width=12) (actual time=2.947..3.011 rows=5 loops=20)
                                 Index Cond: ((sbom_occurrences_1.component_version_id = sbom_occurrences.component_version_id) AND (sbom_occurrences_1.traversal_ids >= '{9970}'::bigint[]) AND (sbom_occurrences_1.traversal_ids < '{9971}'::bigint[]))
                                 Buffers: shared hit=142 read=47
                                 I/O Timings: read=59.102 write=0.000
Time: 2.710 s
  - planning: 5.123 ms
  - execution: 2.704 s
    - I/O read: 2.649 s
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 4322 (~33.80 MiB) from the buffer pool
  - reads: 2666 (~20.80 MiB) from the OS file cache, including disk I/O
  - dirtied: 138 (~1.10 MiB)
  - writes: 0

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

  1. Enable the feature flag: echo 'Feature.enable(:rewrite_sbom_occurrences_query)' | bundle exec rails c
  2. Login to GitLab
  3. Go to a group which has dependencies
  4. Go to Security -> Dependency List
Edited by Brian Williams

Merge request reports

Loading