SBOM occurrences retain stale pipeline_id causing incorrect vulnerability context in GlobalAdvisoryScanWorker
## Summary
SBOM occurrences are not updating their `pipeline_id` when re-ingested, causing them to retain the pipeline reference from the first time they were detected. This stale pipeline reference is then used by `PackageMetadata::GlobalAdvisoryScanWorker` when creating vulnerabilities, leading to incorrect tracked context associations and production errors.
## Problem Details
### Current Behavior
In `ee/app/services/sbom/ingestion/tasks/ingest_occurrences.rb`, when upserting SBOM occurrences:
1. The `attributes_changed?` method explicitly excludes `PIPELINE_ATTRIBUTES_KEYS` (`pipeline_id` and `commit_sha`) from comparison (lines 99-105)
2. This means if an occurrence already exists, it will **not** be updated even if it appears in a new pipeline
3. The occurrence retains the `pipeline_id` from the first pipeline where it was detected
```ruby
PIPELINE_ATTRIBUTES_KEYS = %i[pipeline_id commit_sha].freeze

def attributes_changed?(new_attributes)
  uuid = new_attributes[:uuid]
  existing_occurrence = existing_occurrences_by_uuid[uuid]
  return true unless existing_occurrence

  compared_attributes = new_attributes.keys - PIPELINE_ATTRIBUTES_KEYS # Excludes pipeline_id!
  stable_new_attributes = new_attributes.deep_symbolize_keys.slice(*compared_attributes)
  stable_existing_attributes = existing_occurrence.attributes.deep_symbolize_keys.slice(*compared_attributes)

  stable_new_attributes != stable_existing_attributes
end
```
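The effect can be demonstrated with a self-contained sketch that substitutes plain hashes for the ActiveRecord occurrence (the slicing mirrors the method above, minus `deep_symbolize_keys` and the uuid lookup):

```ruby
# Simplified model of the comparison in attributes_changed?. Plain hashes
# stand in for the ActiveRecord occurrence record.
PIPELINE_ATTRIBUTES_KEYS = %i[pipeline_id commit_sha].freeze

def attributes_changed?(new_attributes, existing_attributes)
  return true if existing_attributes.nil?

  compared_keys = new_attributes.keys - PIPELINE_ATTRIBUTES_KEYS
  new_attributes.slice(*compared_keys) != existing_attributes.slice(*compared_keys)
end

existing   = { uuid: 'abc', component_id: 1, pipeline_id: 100, commit_sha: 'old' }
reingested = { uuid: 'abc', component_id: 1, pipeline_id: 200, commit_sha: 'new' }

# Only the pipeline attributes differ, so no update is written and the
# occurrence keeps pipeline_id 100 indefinitely:
puts attributes_changed?(reingested, existing) # => false
```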
### Impact on GlobalAdvisoryScanWorker
When `PackageMetadata::GlobalAdvisoryScanWorker` processes advisories:
1. It retrieves SBOM occurrences via `Sbom::PossiblyAffectedOccurrencesFinder`
2. For each occurrence, it creates a `PossiblyAffectedComponent` using `sbom_occurrence.pipeline` (line 20 in `possibly_affected_component.rb`)
3. This pipeline is then used to determine the tracked context via `tracked_context(affected_component.pipeline)` (line 102 in `advisory_scanner.rb`)
**The problem:** The pipeline associated with the occurrence may be:
- From a different branch than the current default branch
- From an old pipeline that no longer reflects the project's current state
- `nil`, because the original pipeline record has since been deleted
- Not representative of where the component currently exists
This causes two production errors:
1. **NoMethodError** (#582958): When the original pipeline has been deleted, `pipeline` is `nil`, causing `pipeline.id` to fail
2. **ArgumentError** (#582960): When the original pipeline was on a non-default branch that doesn't have a tracked context
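Both failure modes can be reproduced in isolation with stand-in structs; the `tracked_context` stub below approximates the behavior described in the errors rather than copying `advisory_scanner.rb`:

```ruby
# Minimal stand-ins for the real ActiveRecord models.
Pipeline = Struct.new(:id, :ref)
Occurrence = Struct.new(:pipeline)

# Error 1 (#582958): the pipeline association is nil because the original
# pipeline record was deleted.
occurrence = Occurrence.new(nil)
begin
  occurrence.pipeline.id
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end

# Error 2 (#582960): a stub for the tracked-context lookup which, per the
# error description, only resolves contexts on the default branch.
def tracked_context(pipeline, default_branch: 'main')
  raise ArgumentError, "no tracked context for ref #{pipeline.ref}" unless pipeline.ref == default_branch

  { ref: pipeline.ref }
end

stale_pipeline = Pipeline.new(200, 'feature-branch')
begin
  tracked_context(stale_pipeline)
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```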
## Root Cause Analysis
The decision to exclude `pipeline_id` from updates appears intentional (presumably to avoid rewriting unchanged occurrence rows on every pipeline run), but it creates a fundamental mismatch between two consumers of the data:
- **SBOM ingestion** treats occurrences as stable entities identified by `uuid` (based on component + version + source + project)
- **Advisory scanning** assumes the `pipeline` reference represents current/accurate context for vulnerability tracking
## Proposed Solutions
### Option 1: Update pipeline_id on re-ingestion (Recommended)
Always update `pipeline_id` and `commit_sha` when re-ingesting occurrences by removing them from the `PIPELINE_ATTRIBUTES_KEYS` exclusion.
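Sketched against plain hashes again (not the real ActiveRecord path), the change amounts to dropping the key exclusion so pipeline attributes participate in the comparison:

```ruby
# Illustrative only: compare the full attribute set so a new pipeline_id
# or commit_sha marks the occurrence as changed and triggers the upsert.
def attributes_changed?(new_attributes, existing_attributes)
  return true if existing_attributes.nil?

  new_attributes != existing_attributes.slice(*new_attributes.keys)
end

existing   = { uuid: 'abc', component_id: 1, pipeline_id: 100, commit_sha: 'old' }
reingested = { uuid: 'abc', component_id: 1, pipeline_id: 200, commit_sha: 'new' }

puts attributes_changed?(reingested, existing) # => true, so the row is refreshed
```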
**Pros:**
- Ensures pipeline reference is always current
- Fixes both production errors
- Aligns with "Vulnerabilities Across Multiple Branches" initiative
**Cons:**
- May increase database writes
- Need to assess performance impact
### Option 2: Use a different pipeline reference for advisory scanning
Modify `GlobalAdvisoryScanWorker` to determine the appropriate pipeline/context differently, rather than relying on `sbom_occurrence.pipeline`.
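A rough sketch of the idea, using invented names — `latest_default_branch_pipeline` does not exist on the real model, and the structs are stand-ins for ActiveRecord objects:

```ruby
# Stand-in models; real code would use Project/Ci::Pipeline/Sbom::Occurrence.
Pipeline = Struct.new(:id, :ref)
Project = Struct.new(:latest_default_branch_pipeline)
Occurrence = Struct.new(:project, :pipeline)

# Hypothetical resolution order for advisory scanning: prefer a fresh
# default-branch pipeline, fall back to the occurrence's stored pipeline,
# and let callers skip the occurrence when both are nil.
def pipeline_for_scan(occurrence)
  occurrence.project.latest_default_branch_pipeline || occurrence.pipeline
end

project = Project.new(Pipeline.new(300, 'main'))
occurrence = Occurrence.new(project, nil) # original pipeline deleted

puts pipeline_for_scan(occurrence).id # => 300
```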
**Pros:**
- Doesn't change SBOM ingestion behavior
- Could be more accurate for vulnerability context
**Cons:**
- More complex implementation
- Need to determine what the "correct" pipeline should be
### Option 3: Track multiple pipeline references
Add a mechanism to track all pipelines where an occurrence has been detected, not just the first one.
**Pros:**
- Most complete solution
- Enables tracking vulnerabilities across all branches
**Cons:**
- Significant schema and logic changes
- Higher complexity
## Questions to Resolve
1. **Why are pipeline attributes excluded from updates?** Is there a performance or data consistency reason?
2. **What is the intended behavior?** Should occurrences track the latest pipeline or the first pipeline?
3. **How does this align with "Vulnerabilities Across Multiple Branches"?** Should we be tracking occurrences per branch/context?
4. **Performance impact:** What is the cost of updating pipeline_id on every ingestion?
## Related Issues
- #582958 - NoMethodError when pipeline is nil
- #582960 - ArgumentError for non-default branch contexts
- Parent Epic: #18375 (Vulnerabilities Across Multiple Branches Iteration 1 - System Changes)
## Additional Context
This issue is blocking the rollout of the `set_tracked_context_during_ingestion` feature flag and affecting production stability of advisory scanning.