Skip to content

Use the package_manager column in filtering and sorting

Mehmet Emin INAC requested to merge minac_415665_do_not_join_sbom_sources into master

What does this MR do and why?

This will improve the performance of the group-level dependencies page by removing a join to the sbom_sources table.

Database review

This MR introduces 2 new indices for the sbom_occurrences table;

  1. A compound index on the project_id and package_manager columns to improve the performance of filtering by user-provided package manager values
  2. A compound index on the project_id, component_id, and id columns to improve the performance of the group by subquery. This will improve the performance as the database will not need to read actual records from the disk and do an index-only scan to aggregate the results.
Migration outputs
Version 20230721134004
rake db:migrate:up
main: == [advisory_lock_connection] object_id: 219640, pg_backend_pid: 93876
main: == 20230721134004 IndexProjectIdAndPackageManagerForSbomOccurrences: migrating
main: -- transaction_open?()
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.1465s
main: -- index_exists?(:sbom_occurrences, [:project_id, :package_manager], {:name=>"index_sbom_occurrences_on_project_id_and_package_manager", :algorithm=>:concurrently})
main:    -> 0.0040s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0003s
main: -- add_index(:sbom_occurrences, [:project_id, :package_manager], {:name=>"index_sbom_occurrences_on_project_id_and_package_manager", :algorithm=>:concurrently})
main:    -> 0.0025s
main: -- execute("RESET statement_timeout")
main:    -> 0.0005s
main: == 20230721134004 IndexProjectIdAndPackageManagerForSbomOccurrences: migrated (0.1678s)

main: == [advisory_lock_connection] object_id: 219640, pg_backend_pid: 93876
rake db:migrate:down
main: == [advisory_lock_connection] object_id: 219660, pg_backend_pid: 95399
main: == 20230721134004 IndexProjectIdAndPackageManagerForSbomOccurrences: reverting
main: -- transaction_open?()
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.1401s
main: -- indexes(:sbom_occurrences)
main:    -> 0.0045s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0003s
main: -- remove_index(:sbom_occurrences, {:algorithm=>:concurrently, :name=>"index_sbom_occurrences_on_project_id_and_package_manager"})
main:    -> 0.0024s
main: -- execute("RESET statement_timeout")
main:    -> 0.0005s
main: == 20230721134004 IndexProjectIdAndPackageManagerForSbomOccurrences: reverted (0.1611s)

main: == [advisory_lock_connection] object_id: 219660, pg_backend_pid: 95399
Version 20230721134546
rake db:migrate:up
main: == [advisory_lock_connection] object_id: 234840, pg_backend_pid: 94339
main: == 20230721134546 IndexProjectIdComponentIdAndIdForSbomOccurrences: migrating =
main: -- transaction_open?()
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.1395s
main: -- index_exists?(:sbom_occurrences, [:project_id, :component_id, :id], {:name=>"index_sbom_occurrences_on_project_id_and_component_id_and_id", :algorithm=>:concurrently})
main:    -> 0.0050s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0004s
main: -- add_index(:sbom_occurrences, [:project_id, :component_id, :id], {:name=>"index_sbom_occurrences_on_project_id_and_component_id_and_id", :algorithm=>:concurrently})
main:    -> 0.0029s
main: -- execute("RESET statement_timeout")
main:    -> 0.0004s
main: == 20230721134546 IndexProjectIdComponentIdAndIdForSbomOccurrences: migrated (0.1663s)

main: == [advisory_lock_connection] object_id: 234840, pg_backend_pid: 94339
rake db:migrate:down
main: == [advisory_lock_connection] object_id: 219720, pg_backend_pid: 94902
main: == 20230721134546 IndexProjectIdComponentIdAndIdForSbomOccurrences: reverting =
main: -- transaction_open?()
main:    -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main:    -> 0.1451s
main: -- indexes(:sbom_occurrences)
main:    -> 0.0050s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0004s
main: -- remove_index(:sbom_occurrences, {:algorithm=>:concurrently, :name=>"index_sbom_occurrences_on_project_id_and_component_id_and_id"})
main:    -> 0.0014s
main: -- execute("RESET statement_timeout")
main:    -> 0.0004s
main: == 20230721134546 IndexProjectIdComponentIdAndIdForSbomOccurrences: reverted (0.1684s)

main: == [advisory_lock_connection] object_id: 219720, pg_backend_pid: 94902
Queries
The original query
SELECT
    sbom_occurrences.*,
    agg_occurrences.occurrence_count,
    agg_occurrences.project_count
FROM
    sbom_occurrences
    INNER JOIN (
        SELECT
            component_id,
            COUNT(DISTINCT id) AS occurrence_count,
            COUNT(DISTINCT project_id) AS project_count
        FROM
            sbom_occurrences
        WHERE
            project_id IN (
                SELECT
                    "projects"."id"
                FROM
                    "projects"
                WHERE
                    "projects"."namespace_id" IN (
                        SELECT
                            namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id
                        FROM
                            "namespaces"
                        WHERE
                            "namespaces"."type" = 'Group'
                            AND (traversal_ids @> ('{5022335}'))))
                GROUP BY
                    component_id) agg_occurrences ON sbom_occurrences.component_id = agg_occurrences.component_id
    INNER JOIN "sbom_sources" ON "sbom_sources"."id" = "sbom_occurrences"."source_id"
WHERE
    "sbom_occurrences"."project_id" IN (
        SELECT
            "projects"."id"
        FROM
            "projects"
        WHERE
            "projects"."namespace_id" IN (
                SELECT
                    namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id
                FROM
                    "namespaces"
                WHERE
                    "namespaces"."type" = 'Group'
                    AND (traversal_ids @> ('{5022335}'))))
        AND "sbom_occurrences"."source_id" IN (
            SELECT
                "sbom_sources"."id"
            FROM
                "sbom_sources"
            WHERE (source -> 'package_manager' ->> 'name' IN ('go', 'pipenv', 'yarn')))
ORDER BY
    sbom_sources.source -> 'package_manager' -> 'name' ASC
LIMIT 25 OFFSET 0;

And its execution plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/19690/commands/64495

New query
SELECT
    sbom_occurrences.*,
    agg_occurrences.occurrence_count,
    agg_occurrences.project_count
FROM
    sbom_occurrences
    INNER JOIN (
        SELECT
            component_id,
            COUNT(DISTINCT id) AS occurrence_count,
            COUNT(DISTINCT project_id) AS project_count
        FROM
            sbom_occurrences
        WHERE
            project_id IN (
                SELECT
                    "projects"."id"
                FROM
                    "projects"
                WHERE
                    "projects"."namespace_id" IN (
                        SELECT
                            namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id
                        FROM
                            "namespaces"
                        WHERE
                            "namespaces"."type" = 'Group'
                            AND (traversal_ids @> ('{5022335}'))))
                GROUP BY
                    component_id) agg_occurrences ON sbom_occurrences.component_id = agg_occurrences.component_id
WHERE
    "sbom_occurrences"."project_id" IN (
        SELECT
            "projects"."id"
        FROM
            "projects"
        WHERE
            "projects"."namespace_id" IN (
                SELECT
                    namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id
                FROM
                    "namespaces"
                WHERE
                    "namespaces"."type" = 'Group'
                    AND (traversal_ids @> ('{5022335}'))))
        AND "sbom_occurrences"."package_manager" IN ('yarn', 'go', 'pipenv')
ORDER BY
    "sbom_occurrences"."package_manager" DESC
LIMIT 25 OFFSET 0

Its execution plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/20639/commands/67729

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Mehmet Emin INAC

Merge request reports