Improve performance of `Sbom::DependenciesFinder` when sorting by component name
Why are we doing this work
As reported here, the query performance requires improvement before this feature rollout.
Current query:
SELECT
sbom_occurrences.*, agg_occurrences.occurrence_count, agg_occurrences.project_count
FROM sbom_occurrences
INNER JOIN (
SELECT component_id,
COUNT(DISTINCT id) AS occurrence_count,
COUNT(DISTINCT project_id) AS project_count
FROM sbom_occurrences
WHERE project_id IN (
SELECT "projects"."id" FROM "projects"
WHERE "projects"."namespace_id" IN (
SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id
FROM "namespaces"
WHERE "namespaces"."type" = 'Group'
AND (traversal_ids @> ('{68921140}'))
)
)
GROUP BY component_id
) agg_occurrences ON sbom_occurrences.component_id = agg_occurrences.component_id
INNER JOIN "sbom_components" ON "sbom_components"."id" = "sbom_occurrences"."component_id"
WHERE "sbom_occurrences"."project_id" IN (
SELECT "projects"."id"
FROM "projects"
WHERE "projects"."namespace_id" IN (
SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id
FROM "namespaces"
WHERE "namespaces"."type" = 'Group'
AND (traversal_ids @> ('{68921140}'))
)
)
ORDER BY sbom_components.name asc
LIMIT 20
OFFSET 0;
Link to the query plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/20569/commands/67534
Link to the test group: https://gitlab.com/groups/gitlab-org/govern/threat-insights-demos/verification-projects/verify-408846-group/
Relevant links
Non-functional requirements
-
Documentation: -
Feature flag: -
Performance: -
Testing:
Implementation plan
-
database Add component_name
column astext
as part ofsbom_occurrences
-
database Create a new index for the component_name
column onsbom_occurrences
table -
backend Update Sbom::Ingestion::Tasks::IngestOccurrences
to savecomponent_name
-
database Create a new migration to backfill component_name
based onsbom_components.name
-
backend Update query to have the inner join removed while replacing the sorting from sbom_components.name
tosbom_occurrences.component_name
Query plan for the above implementation plan:
SELECT
sbom_occurrences.*, agg_occurrences.occurrence_count, agg_occurrences.project_count
FROM sbom_occurrences
INNER JOIN (
SELECT component_id,
COUNT(DISTINCT id) AS occurrence_count,
COUNT(DISTINCT project_id) AS project_count
FROM sbom_occurrences
WHERE project_id IN (
SELECT "projects"."id" FROM "projects"
WHERE "projects"."namespace_id" IN (
SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id
FROM "namespaces"
WHERE "namespaces"."type" = 'Group'
AND (traversal_ids @> ('{68921140}'))
)
)
GROUP BY component_id
) agg_occurrences ON sbom_occurrences.component_id = agg_occurrences.component_id
WHERE "sbom_occurrences"."project_id" IN (
SELECT "projects"."id"
FROM "projects"
WHERE "projects"."namespace_id" IN (
SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id
FROM "namespaces"
WHERE "namespaces"."type" = 'Group'
AND (traversal_ids @> ('{68921140}'))
)
)
ORDER BY sbom_occurrences.component_name asc
LIMIT 20
OFFSET 0;
Link to the query plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/20569/commands/67623
Verification steps
Edited by Mehmet Emin INAC