Skip to content

Return a unique SBoM occurrence for each unique component/version

mo khan requested to merge mokhax/422088/group-by-component-version into master

What does this MR do and why?

This change uses a window function to compute the occurrence and project counts and returns a unique SBoM occurrence for each unique SBoM Component/Version to reduce the amount of data that is needed to filter and sort through.

Related to:

Before:

WITH "our_occurrences" AS MATERIALIZED (
  SELECT "sbom_occurrences".*
  FROM "sbom_occurrences"
  WHERE "sbom_occurrences"."project_id" IN (
    SELECT "projects"."id"
    FROM "projects"
    WHERE "projects"."namespace_id" IN (
      SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id
      FROM "namespaces"
      WHERE "namespaces"."type" = 'Group'
      AND (traversal_ids @> ('{6543}'))
    )
    AND "projects"."marked_for_deletion_at" IS NULL
    AND "projects"."pending_delete" = FALSE
  )
)
SELECT sbom_occurrences.*, agg_occurrences.occurrence_count, agg_occurrences.project_count
FROM "sbom_occurrences"
INNER JOIN (
  SELECT component_id,
        COUNT(DISTINCT id) AS occurrence_count,
        COUNT(DISTINCT project_id) AS project_count
  FROM our_occurrences
  GROUP BY component_id
) agg_occurrences ON sbom_occurrences.component_id = agg_occurrences.component_id
WHERE "sbom_occurrences"."id" IN (
  SELECT "our_occurrences"."id"
  FROM "our_occurrences"
)
ORDER BY "sbom_occurrences"."id" ASC
LIMIT 20 OFFSET 0;
Time: 496.995 ms
  - planning: 11.399 ms
  - execution: 485.596 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 248033 (~1.90 GiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/22603/commands/72705

After:

WITH "our_occurrences" AS MATERIALIZED (
  SELECT DISTINCT ON (sbom_occurrences.component_version_id) sbom_occurrences.*,
    COUNT(sbom_occurrences.id) OVER (PARTITION BY sbom_occurrences.component_version_id) AS occurrence_count,
    COUNT(sbom_occurrences.project_id) OVER (PARTITION BY sbom_occurrences.component_version_id) AS project_count
  FROM "sbom_occurrences" 
  WHERE "sbom_occurrences"."project_id" IN (
    SELECT "projects"."id"
    FROM "projects"
    WHERE "projects"."namespace_id" IN (
      SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id
      FROM "namespaces"
      WHERE "namespaces"."type" = 'Group'
      AND (traversal_ids @> ('{6543}'))
    )
    AND "projects"."marked_for_deletion_at" IS NULL
    AND "projects"."pending_delete" = FALSE
  )
  AND "sbom_occurrences"."component_version_id" IS NOT NULL
  ORDER BY 
    "sbom_occurrences"."component_version_id" DESC,
    "sbom_occurrences"."id" DESC
)
SELECT "sbom_occurrences".*
FROM "our_occurrences" AS "sbom_occurrences" 
ORDER BY sbom_occurrences.id asc 
LIMIT 20 OFFSET 0;
Time: 282.846 ms
  - planning: 10.849 ms
  - execution: 271.997 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 54593 (~426.50 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/22603/commands/72708

Screenshots or screen recordings

Before After
image image

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by mo khan

Merge request reports