List group vulnerability scanners using vulnerability statistics
What does this MR do and why?
Resolve a cross-database join in EE::Group#vulnerability_scanners
by JOINing vulnerability_statistics to list vulnerability_scanners
in a group and its subgroups.
Before this change, the method joined vulnerability_scanners with projects and namespaces,
but these tables belong to different gitlab schemas.
The implementation leverages the traversal_ids and archived columns added to vulnerability_scanners
in !177591 (merged).
It uses the by_group and unarchived scopes added to Vulnerabilities::Statistic
as part of !177892 (merged).
The change is behind a feature flag named group_vulnerability_scanners_using_statistics.
Feature flag rollout issue is #513300 (closed).
As a result of this change, #vulnerability_scanners might return scanners of projects
that have been marked for deletion (i.e. such as projects.delete_pending is true).
In similar queries, we decided that listing such projects was acceptable.
See #437635 (comment 1749715754)
Specs update
- Add
contextfor when the feature flag is disabled. - Create vulnerability statistics related to the scanners retrieved by the query. This is necessary for the join to work.
- Replace
contain_exactlymatcher withincludeand.not_to include. This is necessary to ignore the extra scanners that are being created when calling thevulnerability_statisticfactory while keeping the code simple. - Add an unrelated scanner that belongs to a project that's not in the group being selected.
References
Please include cross links to any resources that are relevant to this MR. This will give reviewers and future readers helpful context to give an efficient review of the changes introduced.
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
How to set up and validate locally
bundle exec bin/rspec -E '#vulnerability_scanners' ee/spec/models/ee/group_spec.rb
bundle exec bin/rspec -E 'when listing scanners for group' ee/spec/graphql/resolvers/vulnerabilities/scanners_resolver_spec.rb
SQL queries
The following queries have been collected using the ActiveRecord::Base.logger when running the specs.
The traversal IDs have been change to match gitlab-org.
The query plans of the original queries have been collected before the ones of the new queries.
As a result, the new queries might benefit from warm cache,
but that would be limited to vulnerability_scanners and vulnerability_occurrences.
vulnerability_statistics.traversal_ids and .archived were backfilled on console.postgres.ai
before running the new queries. See !177892 (comment 2297685662)
The new queries use idx_vulnerability_statistics_on_traversal_ids_and_letter_grade index
introduced in !177892 (diffs).
Query triggered by EE::Group#vulnerability_scanners:
SELECT
"vulnerability_scanners".*
FROM
"vulnerability_scanners"
WHERE
"vulnerability_scanners"."project_id" IN (
SELECT
"vulnerability_statistics"."project_id"
FROM
"vulnerability_statistics"
WHERE
"vulnerability_statistics"."traversal_ids" >= '{9970}'
AND "vulnerability_statistics"."traversal_ids" < '{9971}'
AND "vulnerability_statistics"."archived" = FALSE
)
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/35212/commands/109340
Query triggered by ScannersResolver:
SELECT
DISTINCT ON (
"vulnerability_scanners"."external_id",
"report_types"."report_type"
) "vulnerability_scanners".*,
"report_types"."report_type" AS "report_type"
FROM
"vulnerability_scanners"
JOIN LATERAL (
SELECT
"vulnerability_occurrences"."report_type"
FROM
"vulnerability_occurrences"
WHERE
"vulnerability_occurrences"."scanner_id" = "vulnerability_scanners"."id"
LIMIT
1
) report_types ON true
WHERE
"vulnerability_scanners"."project_id" IN (
SELECT
"vulnerability_statistics"."project_id"
FROM
"vulnerability_statistics"
WHERE
"vulnerability_statistics"."traversal_ids" >= '{9970}'
AND "vulnerability_statistics"."traversal_ids" < '{9971}'
AND "vulnerability_statistics"."archived" = FALSE
)
ORDER BY
"vulnerability_scanners"."external_id" ASC,
"report_types"."report_type" ASC
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/35212/commands/109341
SQL queries prior to this change
Query triggered by EE::Group#vulnerability_scanners:
SELECT
"vulnerability_scanners".*
FROM
"vulnerability_scanners"
WHERE
"vulnerability_scanners"."project_id" IN (
SELECT
"projects"."id"
FROM
"projects"
WHERE
"projects"."namespace_id" IN (
SELECT
"namespaces"."id"
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group'
AND (
traversal_ids @> ('{9970}')
)
)
AND "projects"."archived" = FALSE
AND "projects"."pending_delete" = FALSE
)
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/35212/commands/109338
Query triggered by ScannersResolver:
SELECT
DISTINCT ON (
"vulnerability_scanners"."external_id",
"report_types"."report_type"
) "vulnerability_scanners".*,
"report_types"."report_type" AS "report_type"
FROM
"vulnerability_scanners"
JOIN LATERAL (
SELECT
"vulnerability_occurrences"."report_type"
FROM
"vulnerability_occurrences"
WHERE
"vulnerability_occurrences"."scanner_id" = "vulnerability_scanners"."id"
LIMIT
1
) report_types ON true
WHERE
"vulnerability_scanners"."project_id" IN (
SELECT
"projects"."id"
FROM
"projects"
WHERE
"projects"."namespace_id" IN (
SELECT
"namespaces"."id"
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group'
AND (
traversal_ids @ > ('{9970}')
)
)
AND "projects"."archived" = FALSE
AND "projects"."pending_delete" = FALSE
)
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/35212/commands/109339
Related to #513032 (closed)