Add filtering by package manager to `Sbom::AggregationsFinder`
What does this MR do and why?
Context
After a gap analysis of the features available on the group and project level dependency pages1, we have begun work to bring the pages to parity.
For the group level, sometimes the Sbom::AggregationsFinder
is
used to fetch the occurrences.
This change
- Adds the
package_managers
key as a valid filter param to the finder - Adds a spec coverage for the new filter
Follow-up work
This change only adds the filtering to this finder. Passing the params from the controller and actually using this filter will be done in subsequent MRs
SQL
before
comparing the new filter to the existing component_names
filter
raw SQL
SELECT
"outer_occurrences"."highest_severity",
"outer_occurrences"."component_id",
"outer_occurrences"."component_version_id",
MIN(outer_occurrences.id)::bigint AS id,
MIN(outer_occurrences.package_manager) AS package_manager,
MIN(outer_occurrences.input_file_path) AS input_file_path,
MIN(outer_occurrences.licenses -> 0 ->> 'spdx_identifier') AS primary_license_spdx_identifier,
COALESCE(JSONB_AGG(outer_occurrences.licenses -> 0) FILTER (WHERE outer_occurrences.licenses -> 0 IS NOT NULL), '[]') AS licenses,
SUM(counts.occurrence_count)::integer AS occurrence_count,
SUM(counts.vulnerability_count)::integer AS vulnerability_count,
SUM(counts.project_count)::integer AS project_count
FROM ( SELECT DISTINCT ON ("inner_occurrences"."highest_severity", "inner_occurrences"."component_id", "inner_occurrences"."component_version_id")
"inner_occurrences"."id",
"inner_occurrences"."created_at",
"inner_occurrences"."updated_at",
"inner_occurrences"."component_version_id",
"inner_occurrences"."project_id",
"inner_occurrences"."pipeline_id",
"inner_occurrences"."source_id",
"inner_occurrences"."commit_sha",
"inner_occurrences"."component_id",
"inner_occurrences"."uuid",
"inner_occurrences"."package_manager",
"inner_occurrences"."component_name",
"inner_occurrences"."input_file_path",
"inner_occurrences"."licenses",
"inner_occurrences"."highest_severity",
"inner_occurrences"."vulnerability_count",
"inner_occurrences"."source_package_id",
"inner_occurrences"."archived",
"inner_occurrences"."traversal_ids",
"inner_occurrences"."ancestors",
"inner_occurrences"."reachability"
FROM ( WITH RECURSIVE "loose_index_scan_cte" AS (
(
SELECT
"sbom_occurrences"."traversal_ids"
FROM
"sbom_occurrences"
WHERE (traversal_ids >= ('{9970}'))
AND (traversal_ids < ('{9971}'))
AND "sbom_occurrences"."archived" = FALSE
ORDER BY
"sbom_occurrences"."traversal_ids" ASC
LIMIT 1)
UNION (
SELECT
(
SELECT
"sbom_occurrences"."traversal_ids"
FROM
"sbom_occurrences"
WHERE (traversal_ids >= ('{9970}'))
AND (traversal_ids < ('{9971}'))
AND "sbom_occurrences"."archived" = FALSE
AND "sbom_occurrences"."traversal_ids" > "loose_index_scan_cte"."traversal_ids"
ORDER BY
"sbom_occurrences"."traversal_ids" ASC
LIMIT 1) AS traversal_ids
FROM
"loose_index_scan_cte"))
SELECT
"traversal_ids"
FROM
"loose_index_scan_cte" AS "sbom_occurrences"
WHERE
"sbom_occurrences"."traversal_ids" IS NOT NULL) AS namespaces,
LATERAL ( SELECT DISTINCT ON ("sbom_occurrences"."highest_severity", "sbom_occurrences"."component_id", "sbom_occurrences"."component_version_id")
"sbom_occurrences"."id",
"sbom_occurrences"."created_at",
"sbom_occurrences"."updated_at",
"sbom_occurrences"."component_version_id",
"sbom_occurrences"."project_id",
"sbom_occurrences"."pipeline_id",
"sbom_occurrences"."source_id",
"sbom_occurrences"."commit_sha",
"sbom_occurrences"."component_id",
"sbom_occurrences"."uuid",
"sbom_occurrences"."package_manager",
"sbom_occurrences"."component_name",
"sbom_occurrences"."input_file_path",
"sbom_occurrences"."licenses",
"sbom_occurrences"."highest_severity",
"sbom_occurrences"."vulnerability_count",
"sbom_occurrences"."source_package_id",
"sbom_occurrences"."archived",
"sbom_occurrences"."traversal_ids",
"sbom_occurrences"."ancestors",
"sbom_occurrences"."reachability"
FROM
"sbom_occurrences"
WHERE (sbom_occurrences.traversal_ids = namespaces.traversal_ids)
AND "sbom_occurrences"."archived" = FALSE
AND "sbom_occurrences"."component_name" = 'rails'
ORDER BY
"sbom_occurrences"."highest_severity" DESC NULLS LAST,
"sbom_occurrences"."component_id" DESC,
"sbom_occurrences"."component_version_id" DESC NULLS FIRST
LIMIT 21) inner_occurrences
ORDER BY
"inner_occurrences"."highest_severity" DESC NULLS LAST,
"inner_occurrences"."component_id" DESC,
"inner_occurrences"."component_version_id" DESC NULLS FIRST
LIMIT 21) outer_occurrences,
LATERAL (
SELECT
COUNT(project_id) AS occurrence_count,
COUNT(DISTINCT project_id) project_count,
SUM(vulnerability_count) vulnerability_count
FROM
"sbom_occurrences"
WHERE (traversal_ids >= ('{9970}'))
AND (traversal_ids < ('{9971}'))
AND "sbom_occurrences"."archived" = FALSE
AND (sbom_occurrences.component_version_id = outer_occurrences.component_version_id)) counts
GROUP BY
"outer_occurrences"."highest_severity",
"outer_occurrences"."component_id",
"outer_occurrences"."component_version_id"
ORDER BY
MIN("outer_occurrences"."highest_severity") DESC NULLS LAST,
MIN("outer_occurrences"."component_id") DESC,
MIN("outer_occurrences"."component_version_id") DESC NULLS FIRST
LIMIT 21;
-
explain
query: https://postgres.ai/console/gitlab/gitlab-production-main/sessions/35754/commands/110542
after
raw SQL
SELECT
"outer_occurrences"."highest_severity",
"outer_occurrences"."component_id",
"outer_occurrences"."component_version_id",
MIN(outer_occurrences.id)::bigint AS id,
MIN(outer_occurrences.package_manager) AS package_manager,
MIN(outer_occurrences.input_file_path) AS input_file_path,
MIN(outer_occurrences.licenses -> 0 ->> 'spdx_identifier') AS primary_license_spdx_identifier,
COALESCE(JSONB_AGG(outer_occurrences.licenses -> 0) FILTER (WHERE outer_occurrences.licenses -> 0 IS NOT NULL), '[]') AS licenses,
SUM(counts.occurrence_count)::integer AS occurrence_count,
SUM(counts.vulnerability_count)::integer AS vulnerability_count,
SUM(counts.project_count)::integer AS project_count
FROM ( SELECT DISTINCT ON ("inner_occurrences"."highest_severity", "inner_occurrences"."component_id", "inner_occurrences"."component_version_id")
"inner_occurrences"."id",
"inner_occurrences"."created_at",
"inner_occurrences"."updated_at",
"inner_occurrences"."component_version_id",
"inner_occurrences"."project_id",
"inner_occurrences"."pipeline_id",
"inner_occurrences"."source_id",
"inner_occurrences"."commit_sha",
"inner_occurrences"."component_id",
"inner_occurrences"."uuid",
"inner_occurrences"."package_manager",
"inner_occurrences"."component_name",
"inner_occurrences"."input_file_path",
"inner_occurrences"."licenses",
"inner_occurrences"."highest_severity",
"inner_occurrences"."vulnerability_count",
"inner_occurrences"."source_package_id",
"inner_occurrences"."archived",
"inner_occurrences"."traversal_ids",
"inner_occurrences"."ancestors",
"inner_occurrences"."reachability"
FROM ( WITH RECURSIVE "loose_index_scan_cte" AS (
(
SELECT
"sbom_occurrences"."traversal_ids"
FROM
"sbom_occurrences"
WHERE (traversal_ids >= ('{9970}'))
AND (traversal_ids < ('{9971}'))
AND "sbom_occurrences"."archived" = FALSE
ORDER BY
"sbom_occurrences"."traversal_ids" ASC
LIMIT 1)
UNION (
SELECT
(
SELECT
"sbom_occurrences"."traversal_ids"
FROM
"sbom_occurrences"
WHERE (traversal_ids >= ('{9970}'))
AND (traversal_ids < ('{9971}'))
AND "sbom_occurrences"."archived" = FALSE
AND "sbom_occurrences"."traversal_ids" > "loose_index_scan_cte"."traversal_ids"
ORDER BY
"sbom_occurrences"."traversal_ids" ASC
LIMIT 1) AS traversal_ids
FROM
"loose_index_scan_cte"))
SELECT
"traversal_ids"
FROM
"loose_index_scan_cte" AS "sbom_occurrences"
WHERE
"sbom_occurrences"."traversal_ids" IS NOT NULL) AS namespaces,
LATERAL ( SELECT DISTINCT ON ("sbom_occurrences"."highest_severity", "sbom_occurrences"."component_id", "sbom_occurrences"."component_version_id")
"sbom_occurrences"."id",
"sbom_occurrences"."created_at",
"sbom_occurrences"."updated_at",
"sbom_occurrences"."component_version_id",
"sbom_occurrences"."project_id",
"sbom_occurrences"."pipeline_id",
"sbom_occurrences"."source_id",
"sbom_occurrences"."commit_sha",
"sbom_occurrences"."component_id",
"sbom_occurrences"."uuid",
"sbom_occurrences"."package_manager",
"sbom_occurrences"."component_name",
"sbom_occurrences"."input_file_path",
"sbom_occurrences"."licenses",
"sbom_occurrences"."highest_severity",
"sbom_occurrences"."vulnerability_count",
"sbom_occurrences"."source_package_id",
"sbom_occurrences"."archived",
"sbom_occurrences"."traversal_ids",
"sbom_occurrences"."ancestors",
"sbom_occurrences"."reachability"
FROM
"sbom_occurrences"
WHERE (sbom_occurrences.traversal_ids = namespaces.traversal_ids)
AND "sbom_occurrences"."archived" = FALSE
AND "sbom_occurrences"."package_manager" = 'maven'
ORDER BY
"sbom_occurrences"."highest_severity" DESC NULLS LAST,
"sbom_occurrences"."component_id" DESC,
"sbom_occurrences"."component_version_id" DESC NULLS FIRST
LIMIT 21) inner_occurrences
ORDER BY
"inner_occurrences"."highest_severity" DESC NULLS LAST,
"inner_occurrences"."component_id" DESC,
"inner_occurrences"."component_version_id" DESC NULLS FIRST
LIMIT 21) outer_occurrences,
LATERAL (
SELECT
COUNT(project_id) AS occurrence_count,
COUNT(DISTINCT project_id) project_count,
SUM(vulnerability_count) vulnerability_count
FROM
"sbom_occurrences"
WHERE (traversal_ids >= ('{9970}'))
AND (traversal_ids < ('{9971}'))
AND "sbom_occurrences"."archived" = FALSE
AND (sbom_occurrences.component_version_id = outer_occurrences.component_version_id)) counts
GROUP BY
"outer_occurrences"."highest_severity",
"outer_occurrences"."component_id",
"outer_occurrences"."component_version_id"
ORDER BY
MIN("outer_occurrences"."highest_severity") DESC NULLS LAST,
MIN("outer_occurrences"."component_id") DESC,
MIN("outer_occurrences"."component_version_id") DESC NULLS FIRST
LIMIT 21
-
explain
query: https://postgres.ai/console/gitlab/gitlab-production-main/sessions/35754/commands/110545
Diff of Query's SQL
--- /tmp/before 2025-01-31 17:42:17.390592980 +0700
+++ /tmp/after 2025-01-31 17:42:17.391592998 +0700
@@ -92,6 +92,7 @@
"sbom_occurrences"
WHERE (sbom_occurrences.traversal_ids = namespaces.traversal_ids)
AND "sbom_occurrences"."archived" = FALSE
+ AND "sbom_occurrences"."package_manager" = 'maven'
ORDER BY
"sbom_occurrences"."highest_severity" DESC NULLS LAST,
"sbom_occurrences"."component_id" DESC,
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Screenshots or screen recordings
Before | After |
---|---|
before You can see the results are not affected by the &package_managers[]=maven filter |
after You can see the results are filtered by &package_managers[]=maven
|
How to set up and validate locally
Prerequisites
- An EE-licensed gdk setup
- You will need a ci runner set up for your gdk
- you can follow the "Setup Guide for Vulnerability Explanation and Resolution: Setup Runner" guide
- You will need a group that has dependency reports with dependencies that span several package managers
- you can follow the "Setup Guide for Vulnerability Explanation and Resolution: Setup Vulnerability Report" guide
- If you import all of the sample projects provided in the "Setup Guide for Vulnerability Explanation and Resolution: Setup Vulnerability Report" guide under a single group, running a pipeline on the default branch of all of those projects, the group will have the required test data
Steps
- find a group that has sbom dependencies that span several package managers
- visit the dependencies API directly for this group
- In the screen recordings, I use
http://gdk.test:3000/groups/gitlab-org/-/dependencies.json?sort_by=severity&sort=desc&filter=all&page=1
- use the filter by manipulating the URL. In the screen recordings I filter for
maven
by appending&package_managers[]=maven
to the URL - You should see basic filtering behavior
related to: #493774
Edited by Michael Becker