SBOM occurrences retain stale pipeline_id causing incorrect vulnerability context in GlobalAdvisoryScanWorker
## Summary

SBOM occurrences are not updating their `pipeline_id` when re-ingested, so they retain the pipeline reference from the first time they were detected. This stale pipeline reference is then used by `PackageMetadata::GlobalAdvisoryScanWorker` when creating vulnerabilities, leading to incorrect tracked context associations and production errors.

## Problem Details

### Current Behavior

In `ee/app/services/sbom/ingestion/tasks/ingest_occurrences.rb`, when upserting SBOM occurrences:

1. The `attributes_changed?` method explicitly excludes `PIPELINE_ATTRIBUTES_KEYS` (`pipeline_id` and `commit_sha`) from comparison (lines 99-105)
2. This means that if an occurrence already exists, it will **not** be updated even if it appears in a new pipeline
3. The occurrence retains the `pipeline_id` from the first pipeline where it was detected

```ruby
PIPELINE_ATTRIBUTES_KEYS = %i[pipeline_id commit_sha].freeze

def attributes_changed?(new_attributes)
  uuid = new_attributes[:uuid]
  existing_occurrence = existing_occurrences_by_uuid[uuid]

  return true unless existing_occurrence

  compared_attributes = new_attributes.keys - PIPELINE_ATTRIBUTES_KEYS # Excludes pipeline_id!

  stable_new_attributes = new_attributes.deep_symbolize_keys.slice(*compared_attributes)
  stable_existing_attributes = existing_occurrence.attributes.deep_symbolize_keys.slice(*compared_attributes)

  stable_new_attributes != stable_existing_attributes
end
```

### Impact on GlobalAdvisoryScanWorker

When `PackageMetadata::GlobalAdvisoryScanWorker` processes advisories:

1. It retrieves SBOM occurrences via `Sbom::PossiblyAffectedOccurrencesFinder`
2. For each occurrence, it creates a `PossiblyAffectedComponent` using `sbom_occurrence.pipeline` (line 20 in `possibly_affected_component.rb`)
3. This pipeline is then used to determine the tracked context via `tracked_context(affected_component.pipeline)` (line 102 in `advisory_scanner.rb`)

**The problem:** The pipeline associated with the occurrence may be:

- From a different branch than the current default branch
- From a deleted/old pipeline
- `nil` if the original pipeline was deleted
- Not representative of where the component currently exists

This causes two production errors:

1. **NoMethodError** (#582958): When the original pipeline has been deleted, `pipeline` is `nil`, causing `pipeline.id` to fail
2. **ArgumentError** (#582960): When the original pipeline was on a non-default branch that doesn't have a tracked context

## Root Cause Analysis

The design decision to exclude `pipeline_id` from updates appears intentional (to avoid constant updates), but it creates a fundamental mismatch:

- **SBOM ingestion** treats occurrences as stable entities identified by `uuid` (based on component + version + source + project)
- **Advisory scanning** assumes the `pipeline` reference represents the current, accurate context for vulnerability tracking

## Proposed Solutions

### Option 1: Update pipeline_id on re-ingestion (Recommended)

Always update `pipeline_id` and `commit_sha` when re-ingesting occurrences, removing them from the `PIPELINE_ATTRIBUTES_KEYS` exclusion.

**Pros:**

- Ensures the pipeline reference is always current
- Fixes both production errors
- Aligns with the "Vulnerabilities Across Multiple Branches" initiative

**Cons:**

- May increase database writes
- Performance impact needs to be assessed

### Option 2: Use a different pipeline reference for advisory scanning

Modify `GlobalAdvisoryScanWorker` to determine the appropriate pipeline/context differently, rather than relying on `sbom_occurrence.pipeline`.
**Pros:**

- Doesn't change SBOM ingestion behavior
- Could be more accurate for vulnerability context

**Cons:**

- More complex implementation
- Need to determine what the "correct" pipeline should be

### Option 3: Track multiple pipeline references

Add a mechanism to track all pipelines where an occurrence has been detected, not just the first one.

**Pros:**

- Most complete solution
- Enables tracking vulnerabilities across all branches

**Cons:**

- Significant schema and logic changes
- Higher complexity

## Questions to Resolve

1. **Why are pipeline attributes excluded from updates?** Is there a performance or data consistency reason?
2. **What is the intended behavior?** Should occurrences track the latest pipeline or the first pipeline?
3. **How does this align with "Vulnerabilities Across Multiple Branches"?** Should we be tracking occurrences per branch/context?
4. **Performance impact:** What is the cost of updating `pipeline_id` on every ingestion?

## Related Issues

- #582958 - NoMethodError when pipeline is nil
- #582960 - ArgumentError for non-default branch contexts
- Parent Epic: #18375 (Vulnerabilities Across Multiple Branches Iteration 1 - System Changes)

## Additional Context

This issue is blocking the rollout of the `set_tracked_context_during_ingestion` feature flag and affecting the production stability of advisory scanning.
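For illustration, here is a minimal standalone sketch of the bug and of Option 1, using a simplified `attributes_changed?` over plain hashes. The real method also symbolizes keys and looks up existing occurrences by `uuid`; this is only a model of the exclusion logic, not the actual `Sbom::Ingestion` code.

```ruby
# Simplified model of the attributes_changed? exclusion logic.
# All names below mirror the real service but this is a sketch, not the real class.
PIPELINE_ATTRIBUTES_KEYS = %i[pipeline_id commit_sha].freeze

def attributes_changed?(new_attributes, existing_attributes, exclude: PIPELINE_ATTRIBUTES_KEYS)
  compared = new_attributes.keys - exclude
  new_attributes.slice(*compared) != existing_attributes.slice(*compared)
end

existing   = { uuid: 'abc', pipeline_id: 1, commit_sha: 'old', version: '1.0.0' }
reingested = { uuid: 'abc', pipeline_id: 2, commit_sha: 'new', version: '1.0.0' }

# Today: the new pipeline_id is ignored, so the occurrence is never upserted.
attributes_changed?(reingested, existing)              # => false

# Option 1: stop excluding pipeline attributes, so the occurrence row is
# updated and always points at the latest pipeline.
attributes_changed?(reingested, existing, exclude: []) # => true
```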
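Independently of which option is chosen, the NoMethodError (#582958) could be mitigated with a nil guard when resolving the tracked context. A hypothetical sketch follows; `tracked_context_for` and the hash-based pipeline are illustrative stand-ins, not the actual `AdvisoryScanner` API.

```ruby
# Sketch of a defensive guard: if the occurrence's original pipeline was
# deleted, skip the occurrence instead of calling methods on nil.
PossiblyAffectedComponent = Struct.new(:name, :pipeline, keyword_init: true)

def tracked_context_for(affected_component)
  pipeline = affected_component.pipeline
  return nil if pipeline.nil? # original pipeline deleted: nothing to resolve

  { pipeline_id: pipeline[:id], ref: pipeline[:ref] }
end

with_pipeline = PossiblyAffectedComponent.new(name: 'lodash', pipeline: { id: 42, ref: 'main' })
orphaned      = PossiblyAffectedComponent.new(name: 'lodash', pipeline: nil)

tracked_context_for(with_pipeline) # => { pipeline_id: 42, ref: "main" }
tracked_context_for(orphaned)      # => nil; the caller would skip this occurrence
```

This only silences the symptom for deleted pipelines; the ArgumentError for non-default-branch contexts (#582960) still needs one of the options above.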