Add malware field to Dependency GraphQL API and REST API

Summary

Part of gitlab-org/gitlab#587647 — Phase 1 backend requirement for the Malicious Package UI Representation and Filters epic.

Adds a nullable malware: Boolean field across the following dependency surfaces:

| Surface | Endpoint / Type | Level | Used by UI |
|---|---|---|---|
| GraphQL | DependencyType via DependencyInterface | Project | |
| GraphQL | DependencyAggregationType via DependencyInterface | Group | |
| Grape REST API | GET /api/v4/projects/:id/dependencies | Project | |
| Controller JSON | GET /projects/:id/-/dependencies.json | Project | |
| Controller JSON | GET /groups/:id/-/dependencies.json | Group | |

Note: This MR depends on the MalwareDetection concern and Vulnerability.malware_status_for introduced in !228811 (merged).

Approach

Detection logic

The malware check has two distinct paths depending on the level:

Project-level — delegates to the MalwareDetection concern via preloaded vulnerabilities:

def malware_status
  # @has_glam_vulnerability not set at project level, falls through
  ::Vulnerability.malware_status_for(vulnerabilities, malware_vulnerable)
end

def malware_vulnerable
  @malware_vulnerable || project
end

Group-level — uses a batch SQL query (component_version_ids_with_glam_vulnerabilities) that checks for GLAM identifiers across ALL occurrences in the namespace, not just the MIN(id) representative. The result is set as has_glam_vulnerability on each occurrence:

def malware_status
  if instance_variable_defined?(:@has_glam_vulnerability)
    return true if @has_glam_vulnerability

    return ::Vulnerability.sscs_addon_active_for?(malware_vulnerable) ? false : nil
  end

  ::Vulnerability.malware_status_for(vulnerabilities, malware_vulnerable)
end

Field values

| Value | Meaning |
|---|---|
| true | Malware package detected (GLAM identifier present) |
| false | Not a malware package (SSCS add-on active, no GLAM identifier) |
| null | SSCS add-on not active; determination not possible |
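
The three values above can be illustrated with a minimal plain-Ruby sketch. It follows the order used in the group-level method shown earlier (a GLAM hit wins, otherwise the add-on status decides). The method name, its arguments, and the "GLAM-" prefix value are assumptions for illustration: the MR adds MALWARE_PACKAGE_IDENTIFIER_PREFIX, but its value is not shown in this description.

```ruby
# Assumed prefix value; the real constant's value is not given in this MR description.
MALWARE_PREFIX = "GLAM-"

# Hypothetical helper returning true, false, or nil.
#   identifier_names - identifier strings collected from preloaded vulnerabilities
#   addon_active     - whether the SSCS add-on is active for the project or group
def malware_status_sketch(identifier_names, addon_active)
  # A malware identifier is decisive on its own.
  return true if identifier_names.any? { |name| name.start_with?(MALWARE_PREFIX) }

  # Without a hit, "false" is only meaningful when detection is active.
  addon_active ? false : nil
end
```

Returning nil rather than false when the add-on is inactive lets API consumers tell "checked and clean" apart from "not checked".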

Preloading and malware resolution strategy

The strategy is split by level to handle the fact that AggregationsFinder#execute omits project_id and source_id from its SELECT and uses MIN(id) as the representative occurrence.

Project-level (DependenciesResolver) — standard Rails preloading:

malware: [{ project: [:group] }, { vulnerabilities: [:vulnerability_read] }]

Preloads vulnerabilities + reads for in-memory GLAM check. { project: [:group] } avoids lazy load for the feature flag check.

Group-level (DependencyAggregationResolver) — batch vulnerability preload across the group:

def preloads
  # malware: excluded - vulnerabilities batch-loaded across group via component_version_id (not MIN(id))
  super.except(:packager, :location, :malware)
end

When malware is selected, the resolver:

  1. Force-loads the relation and sets the group reference on each occurrence
  2. Runs preload_vulnerabilities_across_group which:
    • Query 1: Sbom::Occurrence.vulnerability_ids_by_component_version — gets all (component_version_id, vulnerability_ids) pairs across the namespace
    • Query 2+3: Batch loads vulnerabilities with vulnerability_read via Vulnerability.id_in(...).with_vulnerability_read
    • Sets occ.association(:vulnerabilities).target with the full vulnerability list
  3. malware_status uses the standard in-memory malware_status_for(vulnerabilities, malware_vulnerable) path — now with the complete vulnerability set from across the group
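
The resolver steps above can be sketched in plain Ruby. This is not the resolver code: the Struct types, the method name, and the Hash standing in for the vulnerabilities table are all illustrative assumptions, and the real implementation sets the ActiveRecord association target instead of a plain attribute.

```ruby
# Plain-Ruby stand-ins for the ActiveRecord models (illustrative only).
Occurrence = Struct.new(:id, :component_version_id, :vulnerabilities)
Vuln = Struct.new(:id, :identifiers)

# ids_by_component_version plays the role of Query 1's result:
#   { component_version_id => [vulnerability_id, ...] } across the namespace.
# vulnerability_store plays the role of the batch load in Query 2+3.
def preload_across_group_sketch(occurrences, ids_by_component_version, vulnerability_store)
  # One batch lookup for every distinct vulnerability ID on the page.
  all_ids = ids_by_component_version.values.flatten.uniq
  loaded = vulnerability_store.values_at(*all_ids).compact.to_h { |v| [v.id, v] }

  # Assign the complete list to each occurrence, not just the MIN(id) one's list.
  occurrences.each do |occ|
    ids = ids_by_component_version.fetch(occ.component_version_id, [])
    occ.vulnerabilities = ids.filter_map { |id| loaded[id] }
  end
end

store = { 1 => Vuln.new(1, ["CVE-2024-0001"]), 2 => Vuln.new(2, ["GLAM-1"]) }
page = [Occurrence.new(10, 100, []), Occurrence.new(11, 101, [])]
preload_across_group_sketch(page, { 100 => [1, 2] }, store)
```

After the call, the first occurrence carries both vulnerabilities and the second carries none, so an in-memory check over each occurrence's list needs no further queries.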

REST API / Controller:

Endpoint Malware resolution
API::Dependencies (Grape, project) .with_vulnerabilities_vulnerability_reads_and_project → in-memory check
Projects::DependenciesController .with_vulnerabilities_vulnerability_reads_and_project → in-memory check
Groups::DependenciesController prepare_malware_vulnerable! → sets group + runs preload_vulnerabilities_across_group!

Authorization fix

DependencyEntity#can_read_vulnerabilities? was checking request.try(:project) which is nil for group-level requests, causing the malware field to never be exposed at group level. Fixed to use subject (which falls back to request.group), consistent with how can_read_security_resource? and can_read_licenses? already work.
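
A minimal sketch of the fallback described above, assuming a simplified request object. The Struct and both method names are hypothetical, and the real entity uses request.try(:project) rather than direct accessors.

```ruby
# Stand-in for the serializer request; a group-level request carries no project.
Request = Struct.new(:project, :group, keyword_init: true)

# Before the fix: only the project is consulted, so group requests yield nil.
def subject_before_fix(request)
  request.project
end

# After the fix: fall back to the group when no project is present.
def subject_after_fix(request)
  request.project || request.group
end

group_request = Request.new(group: :some_group)
project_request = Request.new(project: :some_project)
```

With a nil subject the permission check cannot pass, which is why the field was never exposed for group-level requests.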

Database

Queries and performance analysis

New queries introduced

There are two query paths depending on the level:

  • Project-level: Rails preloading scopes (preload(vulnerabilities: [:vulnerability_read])) — batch WHERE id IN (...) queries. The in-memory malware_status_for iterates preloaded vulnerabilities.
  • Group-level: A batch SQL query (component_version_ids_with_glam_vulnerabilities) that checks for GLAM identifiers across ALL occurrences in the namespace — not limited to the MIN(id) representative occurrence.

1. Preload vulnerabilities via join table (project-level)

Triggered by preload(vulnerabilities: [:vulnerability_read]).

Join table query
EXPLAIN (ANALYZE, BUFFERS)
SELECT "sbom_occurrences_vulnerabilities".*
FROM "sbom_occurrences_vulnerabilities"
WHERE "sbom_occurrences_vulnerabilities"."sbom_occurrence_id" IN (
  SELECT id FROM sbom_occurrences WHERE project_id = <project_id> ORDER BY id LIMIT 20
);
Vulnerabilities batch load
EXPLAIN (ANALYZE, BUFFERS)
SELECT "vulnerabilities".*
FROM "vulnerabilities"
WHERE "vulnerabilities"."id" IN (
  SELECT vulnerability_id
  FROM sbom_occurrences_vulnerabilities
  WHERE sbom_occurrence_id IN (
    SELECT id FROM sbom_occurrences WHERE project_id = <project_id> ORDER BY id LIMIT 20
  )
);
Vulnerability reads batch load
EXPLAIN (ANALYZE, BUFFERS)
SELECT "vulnerability_reads".*
FROM "vulnerability_reads"
WHERE "vulnerability_reads"."vulnerability_id" IN (
  SELECT v.id
  FROM vulnerabilities v
  INNER JOIN sbom_occurrences_vulnerabilities sov ON v.id = sov.vulnerability_id
  WHERE sov.sbom_occurrence_id IN (
    SELECT id FROM sbom_occurrences WHERE project_id = <project_id> ORDER BY id LIMIT 20
  )
);

2. Batch load vulnerability IDs across group (group-level only)

Triggered by Sbom::Occurrence.vulnerability_ids_by_component_version(group, cv_ids). This query loads ALL distinct vulnerability IDs across all occurrences in the namespace for the page's component versions — not limited to the MIN(id) representative.

Query 2a: Get vulnerability IDs by component_version_id
EXPLAIN (ANALYZE, BUFFERS)
SELECT "sbom_occurrences"."component_version_id",
       array_agg(DISTINCT "sbom_occurrences_vulnerabilities"."vulnerability_id")
FROM "sbom_occurrences"
INNER JOIN "sbom_occurrences_vulnerabilities" ON "sbom_occurrences_vulnerabilities"."sbom_occurrence_id" = "sbom_occurrences"."id"
WHERE "sbom_occurrences"."traversal_ids" >= ARRAY[<group_id>]
  AND ARRAY[<group_id> + 1] > "sbom_occurrences"."traversal_ids"
  AND "sbom_occurrences"."component_version_id" IN (<cv_id_1>, <cv_id_2>, ..., <cv_id_20>)
GROUP BY "sbom_occurrences"."component_version_id";
Query 2b: Batch load vulnerabilities with vulnerability_reads
EXPLAIN (ANALYZE, BUFFERS)
SELECT "vulnerabilities".* FROM "vulnerabilities"
WHERE "vulnerabilities"."id" IN (<all_vuln_ids from 2a>);

EXPLAIN (ANALYZE, BUFFERS)
SELECT "vulnerability_reads".* FROM "vulnerability_reads"
WHERE "vulnerability_reads"."vulnerability_id" IN (<all_vuln_ids from 2a>);
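
The traversal_ids range condition in Query 2a relies on element-by-element array ordering. The sketch below mirrors it with Ruby's Array#<=> as a stand-in for PostgreSQL's array comparison; the helper name is hypothetical, and the example assumes a top-level group so that the lower bound is the one-element array [group_id].

```ruby
# True when traversal_ids sorts at or after [group_id] and before [group_id + 1],
# i.e. when the path starts with group_id.
def in_group_range?(traversal_ids, group_id)
  (traversal_ids <=> [group_id]) >= 0 && (traversal_ids <=> [group_id + 1]) < 0
end

in_group_range?([9970], 9970)          # the group itself
in_group_range?([9970, 12, 34], 9970)  # a descendant of the group
in_group_range?([9971], 9970)          # a different top-level group
```

Expressing "is a descendant" as a range rather than a prefix match is what allows a B-tree index on traversal_ids to serve the filter.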

Indexes used:

  • sbom_occurrences: index_sbom_occurrences_on_component_version_id, index_sbom_occurrences_on_traversal_ids_and_package_manager
  • sbom_occurrences_vulnerabilities: i_sbom_occurrences_vulnerabilities_on_occ_id_and_vuln_id (unique)
  • vulnerabilities: Primary key
  • vulnerability_reads: index_vulnerability_reads_on_vulnerability_id (unique)

3. Preload project with group (project-level only)

Triggered by preload({ project: [:group] }).

Queries
EXPLAIN (ANALYZE, BUFFERS)
SELECT "projects".* FROM "projects" WHERE "projects"."id" = <project_id>;

EXPLAIN (ANALYZE, BUFFERS)
SELECT "namespaces".* FROM "namespaces"
WHERE "namespaces"."type" = 'Group'
  AND "namespaces"."id" = (SELECT namespace_id FROM projects WHERE id = <project_id>);

4. SSCS add-on check — no DB query

sscs_addon_active_for?(vulnerable) calls vulnerable.sscs_malware_detection_feature_flag_enabled? — a WIP feature flag check. No database query. Result cached per request via Gitlab::SafeRequestStore.
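
A rough sketch of per-request caching with a class-aware key, as described above. A plain Hash stands in for the per-request store, and the Structs, the method name, and the key layout are assumptions for illustration rather than the concern's actual code.

```ruby
# Stand-ins for the two kinds of "vulnerable" the check accepts.
Project = Struct.new(:id)
Group = Struct.new(:id)

# request_store is a plain Hash standing in for the per-request store.
# The block performs the real check and runs at most once per key.
def addon_active_cached?(request_store, vulnerable)
  # Including the class name keeps Project 1 and Group 1 from sharing an entry.
  key = [:sscs_addon_active, vulnerable.class.name, vulnerable.id]
  return request_store[key] if request_store.key?(key) # also caches false

  request_store[key] = yield(vulnerable)
end

store = {}
calls = 0
2.times { addon_active_cached?(store, Project.new(1)) { calls += 1; false } }
addon_active_cached?(store, Group.new(1)) { calls += 1; true }
```

Using key? instead of ||= matters here because a cached false result would otherwise be recomputed on every call.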

Query summary

| # | Query | Scope | Index | Bounded by |
|---|---|---|---|---|
| 1a | sbom_occurrences_vulnerabilities by occurrence IDs | Project | Unique index | Page size (20) |
| 1b | vulnerabilities by ID | Project | Primary key | Vulns for 20 occurrences |
| 1c | vulnerability_reads by vulnerability ID | Project | Unique index | Vulns for 20 occurrences |
| 2a | Vulnerability IDs by component_version (2-table join) | Group | traversal_ids + unique indexes | 20 component_version_ids |
| 2b | Batch load vulnerabilities + vulnerability_reads | Group | Primary key + unique index | Distinct vulns for 20 cv_ids |
| 3 | projects + namespaces by ID | Project | Primary key | 1 each |
| 4 | Feature flag check | Both | N/A (no query) | |

Total new queries per request: Project: 5 (preloads + project/group). Group: 3 (vuln IDs + vulns + reads). No N+1.

Production performance data

Based on production clone analysis (227.9M vulnerability_reads, 5.37M sbom_occurrences with vulnerabilities):

| Metric | p50 | p90 | p95 | p99 | Max |
|---|---|---|---|---|---|
| Vulns per dependency (project, in-memory scan) | 2 | 9 | 21 | 190 | 12,614 |
| Identifiers scanned per dependency (project, in-memory) | 6 | 30 | 67 | 990 | 72,590 |
| Identifiers per vulnerability_read | 1 | 5 | 6 | 10 | 25 |
| Distinct vulns per component (group, join rows) | 3 | 21 | 42 | 195 | 399,399 |
| GLAM prevalence | 38 / 227.9M (about 0.00002%) | | | | |

Project-level (in-memory path): At p99, the scan makes 990 String#start_with? calls per dependency, which takes microseconds. The worst case of 72,590 calls takes under 1ms.

Group-level (batch preload + in-memory scan): Three queries per page:

  1. Join query to get (component_version_id, vulnerability_ids) — p99: ~1s, worst case (p100, top 20 heaviest components in namespace 9970): ~3.2s
  2. Batch load vulnerabilities — WHERE id IN (...) on primary key
  3. Batch load vulnerability_reads — WHERE vulnerability_id IN (...) on unique index

For the worst-case page (top 20 heaviest components), 53K distinct vulnerabilities are loaded. The in-memory GLAM scan is under 1ms. The join query dominates the cost.

For typical pages the total is sub-200ms. The p99 page (~195 vulns per component, ~3.8K total) completes in ~1-2s including all three queries.

GLAM identifiers are extremely rare (38 / 227.9M, about 0.00002%), so the feature has minimal real-world impact on existing query patterns until GLAM data is actively populated.
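
The figures quoted in this section can be cross-checked with simple arithmetic. The variable names are illustrative, and the page total is only an upper bound because it assumes no vulnerability is shared between components.

```ruby
# GLAM prevalence as a percentage of vulnerability_reads rows.
glam_rows = 38
total_rows = 227_900_000
prevalence_percent = glam_rows.to_f / total_rows * 100 # roughly 0.0000167

# Upper bound on vulnerabilities for a p99 page: 20 components, none shared.
p99_vulns_per_component = 195
page_size = 20
p99_page_upper_bound = p99_vulns_per_component * page_size # 3900
```

The upper bound of 3,900 is in line with the ~3.8K total quoted for the p99 page.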

Files changed

| File | Change |
|---|---|
| MalwareDetection concern | |
| ee/app/models/concerns/vulnerabilities/malware_detection.rb | Accept Project or Group (vulnerable param), class-aware cache key, add MALWARE_PACKAGE_IDENTIFIER_PREFIX constant |
| Occurrence model | |
| ee/app/models/sbom/occurrence.rb | Add malware_vulnerable reader with project fallback, vulnerability_ids_by_component_version class method, with_vulnerabilities_and_reads scope, update with_vulnerabilities_vulnerability_reads_and_project to preload { project: [:group] } |
| GraphQL | |
| ee/app/graphql/types/sbom/dependency_interface.rb | Add malware field (experiment, milestone 19.0) |
| ee/app/graphql/resolvers/sbom/dependency_interface_resolver.rb | Base malware preload without :project |
| ee/app/graphql/resolvers/sbom/dependencies_resolver.rb | Project-level: adds { project: [:group] } to malware preloads |
| ee/app/graphql/resolvers/sbom/dependency_aggregation_resolver.rb | Group-level: super.except(:packager, :location, :malware), sets malware_vulnerable = group, batch-loads vulnerabilities across group via preload_vulnerabilities_across_group |
| ee/app/models/ee/vulnerability.rb | Add with_vulnerability_read scope |
| REST API (Grape) | |
| ee/lib/api/dependencies.rb | Apply with_vulnerabilities_vulnerability_reads_and_project scope |
| ee/lib/api/entities/dependency.rb | Add malware field gated on can_read_vulnerabilities? |
| Controllers | |
| ee/app/controllers/projects/dependencies_controller.rb | Chain with_vulnerabilities_vulnerability_reads_and_project |
| ee/app/controllers/groups/dependencies_controller.rb | Chain with_vulnerabilities_and_reads, add prepare_malware_vulnerable! with preload_vulnerabilities_across_group! |
| Serializer | |
| ee/app/serializers/dependency_entity.rb | Add malware field, fix can_read_vulnerabilities? to use subject |

Follow-up

  • #598208: DependencyVulnerabilitiesResolver returns incomplete vulnerabilities for group-level aggregated dependencies (pre-existing MIN(id) limitation)
Edited by Bala Kumar
