Add malware field to Dependency GraphQL API and REST API
Summary
Part of gitlab-org/gitlab#587647 — Phase 1 backend requirement for the Malicious Package UI Representation and Filters epic.
Adds a nullable `malware: Boolean` field to all five dependency endpoints:
| Surface | Endpoint / Type | Level | Used by UI |
|---|---|---|---|
| GraphQL | `DependencyType` via `DependencyInterface` | Project | |
| GraphQL | `DependencyAggregationType` via `DependencyInterface` | Group | |
| Grape REST API | `GET /api/v4/projects/:id/dependencies` | Project | |
| Controller JSON | `GET /projects/:id/-/dependencies.json` | Project | |
| Controller JSON | `GET /groups/:id/-/dependencies.json` | Group | |
Note: This MR depends on the `MalwareDetection` concern and `Vulnerability.malware_status_for`, introduced in !228811 (merged).
Approach
Detection logic
The malware check has two distinct paths depending on the level:
Project-level — delegates to the MalwareDetection concern via preloaded vulnerabilities:
```ruby
def malware_status
  # @has_glam_vulnerability not set at project level, falls through
  ::Vulnerability.malware_status_for(vulnerabilities, malware_vulnerable)
end

def malware_vulnerable
  @malware_vulnerable || project
end
```

Group-level: uses a batch SQL query (`component_version_ids_with_glam_vulnerabilities`) that checks for GLAM identifiers across ALL occurrences in the namespace, not just the `MIN(id)` representative. The result is set as `has_glam_vulnerability` on each occurrence:
```ruby
def malware_status
  if instance_variable_defined?(:@has_glam_vulnerability)
    return true if @has_glam_vulnerability
    return ::Vulnerability.sscs_addon_active_for?(malware_vulnerable) ? false : nil
  end

  ::Vulnerability.malware_status_for(vulnerabilities, malware_vulnerable)
end
```

Field values
| Value | Meaning |
|---|---|
| `true` | Malware package detected (GLAM identifier present) |
| `false` | Not a malware package (SSCS add-on active, no GLAM identifier) |
| `null` | SSCS add-on not active; determination not possible |
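The three-valued contract can be illustrated with a minimal plain-Ruby sketch. It is not the `MalwareDetection` concern: the `Identifier` and `Vuln` structs, the `malware_status_sketch` name, and the `"GLAM-"` prefix are hypothetical stand-ins (the real prefix lives in `MALWARE_PACKAGE_IDENTIFIER_PREFIX`, whose value is not shown here). The check order mirrors the group-level snippet above, where a GLAM hit returns `true` before the add-on check; the ordering inside `malware_status_for` is assumed to match.

```ruby
# Hypothetical stand-ins for vulnerability records and their identifiers.
Identifier = Struct.new(:external_id)
Vuln = Struct.new(:identifiers)

# Hypothetical prefix; the real value comes from MALWARE_PACKAGE_IDENTIFIER_PREFIX.
GLAM_PREFIX = "GLAM-"

# Returns true (GLAM identifier found), false (add-on active, nothing found),
# or nil (add-on inactive, so no determination is possible).
def malware_status_sketch(vulns, addon_active:)
  glam_found = vulns.any? do |vuln|
    vuln.identifiers.any? { |id| id.external_id.to_s.start_with?(GLAM_PREFIX) }
  end
  return true if glam_found

  addon_active ? false : nil
end

glam = Vuln.new([Identifier.new("GLAM-2024-0001")])
cve  = Vuln.new([Identifier.new("CVE-2024-1234")])

malware_status_sketch([cve, glam], addon_active: true) # => true
malware_status_sketch([cve], addon_active: true)       # => false
malware_status_sketch([cve], addon_active: false)      # => nil
```

Returning `nil` rather than `false` when the add-on is inactive keeps "we could not check" distinguishable from "we checked and found nothing".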
Preloading and malware resolution strategy
The strategy is split by level to handle the fact that AggregationsFinder#execute omits project_id and source_id from its SELECT and uses MIN(id) as the representative occurrence.
Project-level (DependenciesResolver) — standard Rails preloading:
```ruby
malware: [{ project: [:group] }, { vulnerabilities: [:vulnerability_read] }]
```

Preloads vulnerabilities and their reads for the in-memory GLAM check. `{ project: [:group] }` avoids a lazy load for the feature flag check.
Group-level (DependencyAggregationResolver) — batch vulnerability preload across the group:
```ruby
def preloads
  # malware: excluded - vulnerabilities batch-loaded across group via component_version_id (not MIN(id))
  super.except(:packager, :location, :malware)
end
```

When `malware` is selected, the resolver:
- Force-loads the relation and sets the group reference on each occurrence
- Runs `preload_vulnerabilities_across_group`, which:
  - Query 1: `Sbom::Occurrence.vulnerability_ids_by_component_version` gets all `(component_version_id, vulnerability_ids)` pairs across the namespace
  - Queries 2 and 3: batch-load vulnerabilities with `vulnerability_read` via `Vulnerability.id_in(...).with_vulnerability_read`
  - Sets `occ.association(:vulnerabilities).target` to the full vulnerability list
- `malware_status` uses the standard in-memory `malware_status_for(vulnerabilities, malware_vulnerable)` path, now with the complete vulnerability set from across the group
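The reason for batching by `component_version_id` instead of trusting the `MIN(id)` representative can be shown with a plain-Ruby sketch over in-memory rows. The `Row` struct, the sample IDs, and the `vuln_ids_by_component_version` / `representative_only` helpers are hypothetical; they only model what Query 1 computes, not the `Sbom::Occurrence` API.

```ruby
# Hypothetical rows from sbom_occurrences joined to
# sbom_occurrences_vulnerabilities: one row per (occurrence, vulnerability).
Row = Struct.new(:occurrence_id, :component_version_id, :vulnerability_id)

rows = [
  Row.new(10, 7, 100), # MIN(id) representative for component version 7
  Row.new(11, 7, 101), # same component version in another project of the group
  Row.new(12, 7, 102),
  Row.new(20, 8, 200)
]

# Models Query 1: distinct vulnerability IDs per component_version_id,
# collected across every occurrence in the namespace.
def vuln_ids_by_component_version(rows, cv_ids)
  rows
    .select { |r| cv_ids.include?(r.component_version_id) }
    .group_by(&:component_version_id)
    .transform_values { |rs| rs.map(&:vulnerability_id).uniq.sort }
end

# What the MIN(id) representative alone would expose.
def representative_only(rows, cv_id)
  min_occ = rows.select { |r| r.component_version_id == cv_id }.map(&:occurrence_id).min
  rows.select { |r| r.occurrence_id == min_occ }.map(&:vulnerability_id)
end

vuln_ids_by_component_version(rows, [7, 8]) # => {7 => [100, 101, 102], 8 => [200]}
representative_only(rows, 7)                # => [100]
```

If vulnerability 102 carried a GLAM identifier, the representative-only view would report `false` for component version 7, while the group-wide set yields `true`.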
REST API / Controller:
| Endpoint | Malware resolution |
|---|---|
| `API::Dependencies` (Grape, project) | `.with_vulnerabilities_vulnerability_reads_and_project` → in-memory check |
| `Projects::DependenciesController` | `.with_vulnerabilities_vulnerability_reads_and_project` → in-memory check |
| `Groups::DependenciesController` | `prepare_malware_vulnerable!` → sets group + runs `preload_vulnerabilities_across_group!` |
Authorization fix
`DependencyEntity#can_read_vulnerabilities?` was checking `request.try(:project)`, which is `nil` for group-level requests, so the `malware` field was never exposed at group level. It now uses `subject` (which falls back to `request.group`), consistent with how `can_read_security_resource?` and `can_read_licenses?` already work.
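A minimal sketch of why the old check failed at group level and how a project-then-group fallback fixes it. The `ProjectRequest`/`GroupRequest` structs and the `try_call`, `old_subject`, and `authorization_subject` helpers are hypothetical; `try_call` approximates ActiveSupport's `try` in plain Ruby by returning `nil` when the method is missing.

```ruby
# Hypothetical request shapes: a project-level request and a group-level one.
ProjectRequest = Struct.new(:project)
GroupRequest = Struct.new(:group)

# Approximates ActiveSupport's `try`: nil when the receiver lacks the method.
def try_call(obj, name)
  obj.respond_to?(name) ? obj.public_send(name) : nil
end

# Old check: only consults the project, so it is nil for group requests.
def old_subject(request)
  try_call(request, :project)
end

# Fixed check: falls back to the group when no project is present.
def authorization_subject(request)
  try_call(request, :project) || try_call(request, :group)
end

group_req = GroupRequest.new(:my_group)
old_subject(group_req)           # => nil, so the field was never exposed
authorization_subject(group_req) # => :my_group
```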
Database
Queries and performance analysis
New queries introduced
There are two query paths depending on the level:
- Project-level: Rails preloading scopes (`preload(vulnerabilities: [:vulnerability_read])`) issue batch `WHERE id IN (...)` queries. The in-memory `malware_status_for` iterates the preloaded vulnerabilities.
- Group-level: A batch SQL query (`component_version_ids_with_glam_vulnerabilities`) that checks for GLAM identifiers across ALL occurrences in the namespace, not limited to the `MIN(id)` representative occurrence.
1. Preload vulnerabilities via join table (project-level)
Triggered by preload(vulnerabilities: [:vulnerability_read]).
Join table query
```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT "sbom_occurrences_vulnerabilities".*
FROM "sbom_occurrences_vulnerabilities"
WHERE "sbom_occurrences_vulnerabilities"."sbom_occurrence_id" IN (
  SELECT id FROM sbom_occurrences WHERE project_id = <project_id> ORDER BY id LIMIT 20
);
```

Vulnerabilities batch load
```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT "vulnerabilities".*
FROM "vulnerabilities"
WHERE "vulnerabilities"."id" IN (
  SELECT vulnerability_id
  FROM sbom_occurrences_vulnerabilities
  WHERE sbom_occurrence_id IN (
    SELECT id FROM sbom_occurrences WHERE project_id = <project_id> ORDER BY id LIMIT 20
  )
);
```

Vulnerability reads batch load
```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT "vulnerability_reads".*
FROM "vulnerability_reads"
WHERE "vulnerability_reads"."vulnerability_id" IN (
  SELECT v.id
  FROM vulnerabilities v
  INNER JOIN sbom_occurrences_vulnerabilities sov ON v.id = sov.vulnerability_id
  WHERE sov.sbom_occurrence_id IN (
    SELECT id FROM sbom_occurrences WHERE project_id = <project_id> ORDER BY id LIMIT 20
  )
);
```

2. Batch load vulnerability IDs across group (group-level only)
Triggered by Sbom::Occurrence.vulnerability_ids_by_component_version(group, cv_ids). This query loads ALL distinct vulnerability IDs across all occurrences in the namespace for the page's component versions — not limited to the MIN(id) representative.
Query 2a: Get vulnerability IDs by component_version_id
```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT "sbom_occurrences"."component_version_id",
       array_agg(DISTINCT "sbom_occurrences_vulnerabilities"."vulnerability_id")
FROM "sbom_occurrences"
INNER JOIN "sbom_occurrences_vulnerabilities" ON "sbom_occurrences_vulnerabilities"."sbom_occurrence_id" = "sbom_occurrences"."id"
WHERE "sbom_occurrences"."traversal_ids" >= ARRAY[<group_id>]
  AND ARRAY[<group_id> + 1] > "sbom_occurrences"."traversal_ids"
  AND "sbom_occurrences"."component_version_id" IN (<cv_id_1>, <cv_id_2>, ..., <cv_id_20>)
GROUP BY "sbom_occurrences"."component_version_id";
```

Query 2b: Batch load vulnerabilities with vulnerability_reads
```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT "vulnerabilities".* FROM "vulnerabilities"
WHERE "vulnerabilities"."id" IN (<all_vuln_ids from 2a>);

EXPLAIN (ANALYZE, BUFFERS)
SELECT "vulnerability_reads".* FROM "vulnerability_reads"
WHERE "vulnerability_reads"."vulnerability_id" IN (<all_vuln_ids from 2a>);
```

Indexes used:
- `sbom_occurrences`: `index_sbom_occurrences_on_component_version_id`, `index_sbom_occurrences_on_traversal_ids_and_package_manager`
- `sbom_occurrences_vulnerabilities`: `i_sbom_occurrences_vulnerabilities_on_occ_id_and_vuln_id` (unique)
- `vulnerabilities`: primary key
- `vulnerability_reads`: `index_vulnerability_reads_on_vulnerability_id` (unique)
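The `traversal_ids` predicate in Query 2a is a lexicographic range: every namespace whose ancestry array starts with `<group_id>` sorts at or after `ARRAY[<group_id>]` and strictly before `ARRAY[<group_id> + 1]`. Ruby's `Array#<=>` compares element-wise in the same way (a strict prefix sorts first), so the predicate can be checked in plain Ruby. The `in_namespace?` helper and the sample IDs are hypothetical.

```ruby
# True when traversal_ids lies in [ARRAY[group_id], ARRAY[group_id + 1]),
# i.e. the namespace is group_id itself or one of its descendants.
def in_namespace?(traversal_ids, group_id)
  (traversal_ids <=> [group_id]) >= 0 && ([group_id + 1] <=> traversal_ids) > 0
end

in_namespace?([9970], 9970)          # => true  (the group itself)
in_namespace?([9970, 12, 345], 9970) # => true  (a nested subgroup)
in_namespace?([9971], 9970)          # => false (a sibling top-level group)
in_namespace?([42, 9970], 9970)      # => false (9970 present, but not as the root)
```

Expressing "all descendants" as one contiguous range is presumably what lets the B-tree index leading with `traversal_ids` serve the subtree as a single range scan.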
3. Preload project with group (project-level only)
Triggered by preload({ project: [:group] }).
Queries
```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT "projects".* FROM "projects" WHERE "projects"."id" = <project_id>;

EXPLAIN (ANALYZE, BUFFERS)
SELECT "namespaces".* FROM "namespaces"
WHERE "namespaces"."type" = 'Group'
  AND "namespaces"."id" = (SELECT namespace_id FROM projects WHERE id = <project_id>);
```

4. SSCS add-on check (no DB query)
sscs_addon_active_for?(vulnerable) calls vulnerable.sscs_malware_detection_feature_flag_enabled? — a WIP feature flag check. No database query. Result cached per request via Gitlab::SafeRequestStore.
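Per-request caching with a class-aware key (as listed for the `MalwareDetection` concern under Files changed) keeps a `Project` and a `Group` that share a numeric ID from colliding. This is a minimal sketch: `RequestCache` is a hypothetical stand-in for `Gitlab::SafeRequestStore` (a hash scoped to one request), and the key format and `sscs_addon_active?` helper are assumptions, not the actual implementation.

```ruby
# Hypothetical stand-ins for Project and Group records.
Project = Struct.new(:id)
Group = Struct.new(:id)

# Hypothetical per-request store; a fresh instance would exist per request.
class RequestCache
  def initialize
    @store = {}
  end

  # Returns the cached value for key, running the block only on a miss.
  # key? (not ||=) so that a cached false is not recomputed.
  def fetch(key)
    return @store[key] if @store.key?(key)

    @store[key] = yield
  end
end

# Class-aware key: "Project/1" and "Group/1" are distinct entries.
def sscs_addon_active?(cache, vulnerable, &flag_check)
  cache.fetch("sscs_addon_active:#{vulnerable.class.name}/#{vulnerable.id}") do
    flag_check.call(vulnerable)
  end
end

cache = RequestCache.new
calls = 0
check = ->(v) { calls += 1; v.is_a?(Group) }
sscs_addon_active?(cache, Project.new(1), &check) # => false
sscs_addon_active?(cache, Project.new(1), &check) # => false (cached)
sscs_addon_active?(cache, Group.new(1), &check)   # => true
calls                                             # => 2
```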
Query summary
| # | Query | Scope | Index | Bounded by |
|---|---|---|---|---|
| 1a | `sbom_occurrences_vulnerabilities` by occurrence IDs | Project | Unique index | Page size (20) |
| 1b | `vulnerabilities` by ID | Project | Primary key | Vulns for 20 occurrences |
| 1c | `vulnerability_reads` by vulnerability ID | Project | Unique index | Vulns for 20 occurrences |
| 2a | Vulnerability IDs by component_version (2-table join) | Group | traversal_ids + unique indexes | 20 component_version_ids |
| 2b | Batch load vulnerabilities + vulnerability_reads | Group | Primary key + unique index | Distinct vulns for 20 cv_ids |
| 3 | `projects` + `namespaces` by ID | Project | Primary key | 1 each |
| 4 | Feature flag check | Both | N/A (no query) | — |
Total new queries per request: Project: 5 (preloads + project/group). Group: 3 (vuln IDs + vulns + reads). No N+1.
Production performance data
Based on production clone analysis (227.9M vulnerability_reads, 5.37M sbom_occurrences with vulnerabilities):
| Metric | p50 | p90 | p95 | p99 | Max |
|---|---|---|---|---|---|
| Vulns per dependency (project, in-memory scan) | 2 | 9 | 21 | 190 | 12,614 |
| Identifiers scanned per dependency (project, in-memory) | 6 | 30 | 67 | 990 | 72,590 |
| Identifiers per vulnerability_read | 1 | 5 | 6 | 10 | 25 |
| Distinct vulns per component (group, join rows) | 3 | 21 | 42 | 195 | 399,399 |
| GLAM prevalence | — | — | — | — | 38 / 227.9M (0.00%) |
Project-level (in-memory path): At p99, a dependency needs 990 `String#start_with?` calls, which take microseconds. The worst case (72,590 calls) still completes in under 1 ms.
Group-level (batch preload + in-memory scan): Three queries per page:
- Join query to get `(component_version_id, vulnerability_ids)`: p99 ~1s; worst case (p100, top 20 heaviest components in namespace 9970) ~3.2s
- Batch load vulnerabilities: `WHERE id IN (...)` on the primary key
- Batch load vulnerability_reads: `WHERE vulnerability_id IN (...)` on a unique index
For the worst-case page (top 20 heaviest components), 53K distinct vulnerabilities are loaded. The in-memory GLAM scan is under 1ms. The join query dominates the cost.
For typical pages the total is sub-200ms. The p99 page (~195 vulns per component, ~3.8K total) completes in ~1-2s including all three queries.
GLAM identifiers are extremely rare (38 / 227.9M = 0.00%), so the feature has minimal real-world impact on existing query patterns until GLAM data is actively populated.
Files changed
| File | Change |
|---|---|
| MalwareDetection concern | |
| `ee/app/models/concerns/vulnerabilities/malware_detection.rb` | Accept `Project` or `Group` (`vulnerable` param), class-aware cache key, add `MALWARE_PACKAGE_IDENTIFIER_PREFIX` constant |
| Occurrence model | |
| `ee/app/models/sbom/occurrence.rb` | Add `malware_vulnerable` reader with project fallback, `vulnerability_ids_by_component_version` class method, `with_vulnerabilities_and_reads` scope; update `with_vulnerabilities_vulnerability_reads_and_project` to preload `{ project: [:group] }` |
| GraphQL | |
| `ee/app/graphql/types/sbom/dependency_interface.rb` | Add `malware` field (experiment, milestone 19.0) |
| `ee/app/graphql/resolvers/sbom/dependency_interface_resolver.rb` | Base `malware` preload without `:project` |
| `ee/app/graphql/resolvers/sbom/dependencies_resolver.rb` | Project-level: adds `{ project: [:group] }` to `malware` preloads |
| `ee/app/graphql/resolvers/sbom/dependency_aggregation_resolver.rb` | Group-level: `super.except(:packager, :location, :malware)`, sets `malware_vulnerable = group`, batch-loads vulnerabilities across group via `preload_vulnerabilities_across_group` |
| `ee/app/models/ee/vulnerability.rb` | Add `with_vulnerability_read` scope |
| REST API (Grape) | |
| `ee/lib/api/dependencies.rb` | Apply `with_vulnerabilities_vulnerability_reads_and_project` scope |
| `ee/lib/api/entities/dependency.rb` | Add `malware` field gated on `can_read_vulnerabilities?` |
| Controllers | |
| `ee/app/controllers/projects/dependencies_controller.rb` | Chain `with_vulnerabilities_vulnerability_reads_and_project` |
| `ee/app/controllers/groups/dependencies_controller.rb` | Chain `with_vulnerabilities_and_reads`, add `prepare_malware_vulnerable!` with `preload_vulnerabilities_across_group!` |
| Serializer | |
| `ee/app/serializers/dependency_entity.rb` | Add `malware` field, fix `can_read_vulnerabilities?` to use `subject` |
Follow-up
- #598208: `DependencyVulnerabilitiesResolver` returns incomplete vulnerabilities for group-level aggregated dependencies (pre-existing `MIN(id)` limitation)
Related
- Companion MR (Vulnerabilities GraphQL API): !228811 (merged)
- Malware filter MR (ES-based): !228713 (merged)
- Issue: #587647
- Parent epic: gitlab-org#18456