Use the package_manager column in filtering and sorting
What does this MR do and why?
This will improve the performance of the group-level dependencies page
by removing a join to the sbom_sources
table.
Database review
This MR introduces 2 new indices for the sbom_occurrences
table;
- A compound index on the
project_id
andpackage_manager
columns to improve the performance of filtering by user-provided package manager values - A compound index on the
project_id
,component_id
, andid
columns to improve the performance of thegroup by
subquery. This will improve the performance as the database will not need to read actual records from the disk and do an index-only scan to aggregate the results.
Migration outputs
Version 20230721134004
rake db:migrate:up
main: == [advisory_lock_connection] object_id: 219640, pg_backend_pid: 93876
main: == 20230721134004 IndexProjectIdAndPackageManagerForSbomOccurrences: migrating
main: -- transaction_open?()
main: -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main: -> 0.1465s
main: -- index_exists?(:sbom_occurrences, [:project_id, :package_manager], {:name=>"index_sbom_occurrences_on_project_id_and_package_manager", :algorithm=>:concurrently})
main: -> 0.0040s
main: -- execute("SET statement_timeout TO 0")
main: -> 0.0003s
main: -- add_index(:sbom_occurrences, [:project_id, :package_manager], {:name=>"index_sbom_occurrences_on_project_id_and_package_manager", :algorithm=>:concurrently})
main: -> 0.0025s
main: -- execute("RESET statement_timeout")
main: -> 0.0005s
main: == 20230721134004 IndexProjectIdAndPackageManagerForSbomOccurrences: migrated (0.1678s)
main: == [advisory_lock_connection] object_id: 219640, pg_backend_pid: 93876
rake db:migrate:down
main: == [advisory_lock_connection] object_id: 219660, pg_backend_pid: 95399
main: == 20230721134004 IndexProjectIdAndPackageManagerForSbomOccurrences: reverting
main: -- transaction_open?()
main: -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main: -> 0.1401s
main: -- indexes(:sbom_occurrences)
main: -> 0.0045s
main: -- execute("SET statement_timeout TO 0")
main: -> 0.0003s
main: -- remove_index(:sbom_occurrences, {:algorithm=>:concurrently, :name=>"index_sbom_occurrences_on_project_id_and_package_manager"})
main: -> 0.0024s
main: -- execute("RESET statement_timeout")
main: -> 0.0005s
main: == 20230721134004 IndexProjectIdAndPackageManagerForSbomOccurrences: reverted (0.1611s)
main: == [advisory_lock_connection] object_id: 219660, pg_backend_pid: 95399
Version 20230721134546
rake db:migrate:up
main: == [advisory_lock_connection] object_id: 234840, pg_backend_pid: 94339
main: == 20230721134546 IndexProjectIdComponentIdAndIdForSbomOccurrences: migrating =
main: -- transaction_open?()
main: -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main: -> 0.1395s
main: -- index_exists?(:sbom_occurrences, [:project_id, :component_id, :id], {:name=>"index_sbom_occurrences_on_project_id_and_component_id_and_id", :algorithm=>:concurrently})
main: -> 0.0050s
main: -- execute("SET statement_timeout TO 0")
main: -> 0.0004s
main: -- add_index(:sbom_occurrences, [:project_id, :component_id, :id], {:name=>"index_sbom_occurrences_on_project_id_and_component_id_and_id", :algorithm=>:concurrently})
main: -> 0.0029s
main: -- execute("RESET statement_timeout")
main: -> 0.0004s
main: == 20230721134546 IndexProjectIdComponentIdAndIdForSbomOccurrences: migrated (0.1663s)
main: == [advisory_lock_connection] object_id: 234840, pg_backend_pid: 94339
rake db:migrate:down
main: == [advisory_lock_connection] object_id: 219720, pg_backend_pid: 94902
main: == 20230721134546 IndexProjectIdComponentIdAndIdForSbomOccurrences: reverting =
main: -- transaction_open?()
main: -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main: -> 0.1451s
main: -- indexes(:sbom_occurrences)
main: -> 0.0050s
main: -- execute("SET statement_timeout TO 0")
main: -> 0.0004s
main: -- remove_index(:sbom_occurrences, {:algorithm=>:concurrently, :name=>"index_sbom_occurrences_on_project_id_and_component_id_and_id"})
main: -> 0.0014s
main: -- execute("RESET statement_timeout")
main: -> 0.0004s
main: == 20230721134546 IndexProjectIdComponentIdAndIdForSbomOccurrences: reverted (0.1684s)
main: == [advisory_lock_connection] object_id: 219720, pg_backend_pid: 94902
Queries
The original query
SELECT
sbom_occurrences.*,
agg_occurrences.occurrence_count,
agg_occurrences.project_count
FROM
sbom_occurrences
INNER JOIN (
SELECT
component_id,
COUNT(DISTINCT id) AS occurrence_count,
COUNT(DISTINCT project_id) AS project_count
FROM
sbom_occurrences
WHERE
project_id IN (
SELECT
"projects"."id"
FROM
"projects"
WHERE
"projects"."namespace_id" IN (
SELECT
namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group'
AND (traversal_ids @> ('{5022335}'))))
GROUP BY
component_id) agg_occurrences ON sbom_occurrences.component_id = agg_occurrences.component_id
INNER JOIN "sbom_sources" ON "sbom_sources"."id" = "sbom_occurrences"."source_id"
WHERE
"sbom_occurrences"."project_id" IN (
SELECT
"projects"."id"
FROM
"projects"
WHERE
"projects"."namespace_id" IN (
SELECT
namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group'
AND (traversal_ids @> ('{5022335}'))))
AND "sbom_occurrences"."source_id" IN (
SELECT
"sbom_sources"."id"
FROM
"sbom_sources"
WHERE (source -> 'package_manager' ->> 'name' IN ('go', 'pipenv', 'yarn')))
ORDER BY
sbom_sources.source -> 'package_manager' -> 'name' ASC
LIMIT 25 OFFSET 0;
And its execution plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/19690/commands/64495
New query
SELECT
sbom_occurrences.*,
agg_occurrences.occurrence_count,
agg_occurrences.project_count
FROM
sbom_occurrences
INNER JOIN (
SELECT
component_id,
COUNT(DISTINCT id) AS occurrence_count,
COUNT(DISTINCT project_id) AS project_count
FROM
sbom_occurrences
WHERE
project_id IN (
SELECT
"projects"."id"
FROM
"projects"
WHERE
"projects"."namespace_id" IN (
SELECT
namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group'
AND (traversal_ids @> ('{5022335}'))))
GROUP BY
component_id) agg_occurrences ON sbom_occurrences.component_id = agg_occurrences.component_id
WHERE
"sbom_occurrences"."project_id" IN (
SELECT
"projects"."id"
FROM
"projects"
WHERE
"projects"."namespace_id" IN (
SELECT
namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group'
AND (traversal_ids @> ('{5022335}'))))
AND "sbom_occurrences"."package_manager" IN ('yarn', 'go', 'pipenv')
ORDER BY
"sbom_occurrences"."package_manager" DESC
LIMIT 25 OFFSET 0
Its execution plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/20639/commands/67729
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Edited by Mehmet Emin INAC