Add `maps_with` method to BulkInsertableTask
Summary
As described in #434911 (closed), the `BulkInsertableTask`
module prunes non-unique data before ingestion. Doing so improves performance
and offloads some no-op updates from Postgres onto Rails. Because the returned rows may then
no longer line up one-to-one with the input, a pattern has emerged where we manually group the
returned data by unique attributes, and then use that data when updating attributes in
`after_ingest`. Unfortunately, this has become a more common pattern,
and forgetting to implement it has led to silently failing, hard-to-debug issues.
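The failure mode and the manual workaround can be illustrated with a minimal, self-contained sketch. Everything here is an assumption made for illustration: `Row`, `bulk_upsert`, and the `name`/`version` attributes are hypothetical stand-ins, not the actual module or schema.

```ruby
# Hypothetical in-memory stand-in for a bulk upsert; not the GitLab implementation.
Row = Struct.new(:id, :name, :version)

def bulk_upsert(attributes, unique_by:)
  # Prune entries that share the same unique key, keeping the first one,
  # before "inserting" and returning one row per remaining entry.
  unique = attributes.uniq { |attrs| attrs.values_at(*unique_by) }
  unique.each_with_index.map { |attrs, i| Row.new(i + 1, attrs[:name], attrs[:version]) }
end

inputs = [
  { name: 'rack', version: '3.0' },
  { name: 'rails', version: '7.1' },
  { name: 'rack', version: '3.0' } # duplicate of the first entry
]

returned = bulk_upsert(inputs, unique_by: %i[name version])

# Naive positional pairing: the third input silently gets no id.
positional_ids = inputs.each_with_index.map { |_, i| returned[i]&.id }

# Manual pattern: index the returned rows by their unique attributes,
# then look each input up by the same key.
ids_by_key = returned.to_h { |row| [[row.name, row.version], row.id] }
mapped_ids = inputs.map { |attrs| ids_by_key[attrs.values_at(:name, :version)] }
```

In this sketch the positional pairing yields `[1, 2, nil]`, while the keyed lookup yields `[1, 2, 1]`. The `nil` is the kind of silent failure described above.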
Improvements
Implementing this in individual tasks leads to redundant code and does not scale.
Having a `maps_with` method in the module lets tasks focus on
setting the attribute values, and relieves developers of having to remember
this edge case.
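One possible shape for such a helper is sketched below. This is not the proposed implementation: the module name `BulkInsertableSketch`, the `build_return_map` helper, the `after_ingest` signature, and the hash-based rows are all assumptions chosen to keep the example runnable without Rails.

```ruby
# Hypothetical sketch; the real module's interface may differ.
module BulkInsertableSketch
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    attr_reader :map_keys

    # Declare which attributes uniquely identify a record.
    def maps_with(*keys)
      @map_keys = keys
    end
  end

  # Build a lookup from unique-key values to the returned row.
  def build_return_map(returned_rows)
    keys = self.class.map_keys
    returned_rows.to_h { |row| [row.values_at(*keys), row] }
  end
end

class IngestComponentsSketch
  include BulkInsertableSketch

  maps_with :name, :version

  # Assign each input the id of the returned row with the same unique key.
  def after_ingest(inputs, returned_rows)
    return_map = build_return_map(returned_rows)
    keys = self.class.map_keys

    inputs.each do |input|
      input[:id] = return_map.fetch(input.values_at(*keys))[:id]
    end
  end
end

inputs = [
  { name: 'rack', version: '3.0' },
  { name: 'rails', version: '7.1' },
  { name: 'rack', version: '3.0' } # duplicate pruned before ingestion
]
returned = [
  { id: 10, name: 'rack', version: '3.0' },
  { id: 11, name: 'rails', version: '7.1' }
]

IngestComponentsSketch.new.after_ingest(inputs, returned)
assigned_ids = inputs.map { |input| input[:id] }
```

With the grouping declared once through `maps_with`, the task body only sets attribute values. Using `fetch` rather than `[]` is a deliberate choice in this sketch: a missing key raises instead of failing silently.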
Risks
- Increased CPU and memory usage from grouping attributes into a hash map.
- The tasks run as background jobs, so the small increase in CPU and memory usage should not visibly affect user-facing features.
Involved components
`Gitlab::Ingestion::BulkInsertableTask` and the classes that include this module.
One way to find them is with grep (ripgrep in this example):
➜ rg -l 'include Gitlab::Ingestion::BulkInsertableTask'
ee/app/services/package_metadata/ingestion/tasks/ingest_packages.rb
ee/app/services/package_metadata/ingestion/tasks/base.rb
ee/app/services/sbom/ingestion/tasks/ingest_sources.rb
ee/app/services/sbom/ingestion/tasks/ingest_occurrences.rb
ee/app/services/sbom/ingestion/tasks/ingest_components.rb
ee/app/services/sbom/ingestion/tasks/ingest_component_versions.rb
ee/app/services/sbom/ingestion/tasks/ingest_occurrences_vulnerabilities.rb
ee/app/services/security/ingestion/tasks/ingest_finding_pipelines.rb
ee/app/services/security/ingestion/tasks/ingest_finding_identifiers.rb
ee/app/services/security/ingestion/tasks/ingest_finding_evidence.rb
ee/app/services/security/ingestion/tasks/ingest_finding_signatures.rb
ee/app/services/security/ingestion/tasks/ingest_vulnerability_flags.rb
ee/app/services/security/ingestion/tasks/ingest_vulnerabilities/create.rb
ee/app/services/security/ingestion/tasks/ingest_finding_links.rb
ee/app/services/security/ingestion/tasks/ingest_identifiers.rb
ee/app/services/security/ingestion/tasks/ingest_remediations.rb
ee/app/services/security/ingestion/tasks/ingest_findings.rb
ee/spec/lib/gitlab/ingestion/bulk_insertable_task_spec.rb
Optional: Intended side effects
- Reduced error rates in transactions where duplicate maps exist.
- Reduced cognitive load on developers, since they no longer have to defend against a known issue.