Add `maps_with` method to BulkInsertableTask

Summary

As described in #434911 (closed), the BulkInsertableTask module prunes non-unique data before ingestion. Doing so improves performance and offloads some no-op updates from Postgres onto Rails. As a result, a pattern has emerged where we manually group the returned data by unique attributes, and then use that data when updating attributes in after_ingest. Unfortunately, this pattern has become more common, and forgetting to implement it has led to silently failing, hard-to-debug issues.

Improvements

Implementing this in individual tasks leads to redundant code and does not scale. Having a maps_with method in the module lets tasks focus on setting the attribute values, and relieves the developer of having to remember this edge case.
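To make the grouping pattern concrete, here is a minimal, standalone sketch in plain Ruby. It is not the actual BulkInsertableTask code or the maps_with implementation: the `MapsWithSketch` module, its `prune` and `index_returned` helpers, and the sample rows are all hypothetical names chosen for illustration, and the "returned" data is simulated rather than read from Postgres.

```ruby
# Hypothetical illustration only; none of these names exist in GitLab.
module MapsWithSketch
  # Drop rows that share the same unique attributes, mirroring the
  # pruning that happens before ingestion.
  def self.prune(rows, unique_by)
    rows.uniq { |row| row.values_at(*unique_by) }
  end

  # Index the returned rows by their unique attributes so that every
  # original input, including pruned duplicates, can find its record.
  def self.index_returned(returned, unique_by)
    returned.to_h { |row| [row.values_at(*unique_by), row] }
  end
end

unique_by = %i[name version]

inputs = [
  { name: 'rails', version: '7.0' },
  { name: 'rails', version: '7.0' }, # duplicate, pruned before insert
  { name: 'rack',  version: '3.0' }
]

pruned = MapsWithSketch.prune(inputs, unique_by)

# Stand-in for the data returned by the bulk insert: one id per row.
returned = pruned.each_with_index.map { |row, i| row.merge(id: i + 1) }

lookup = MapsWithSketch.index_returned(returned, unique_by)

# after_ingest-style step: look up each input by its unique attributes
# instead of relying on the position of the returned rows.
ids = inputs.map { |row| lookup.fetch(row.values_at(*unique_by))[:id] }
```

Because the lookup is keyed by the unique attributes rather than by position, both duplicate inputs resolve to the same returned record. Using `fetch` means a missing key raises instead of silently producing `nil`.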

Risks

  • Increased CPU and memory usage from grouping attributes into a hash map.
    • The tasks run in background jobs, so the small increase in CPU and memory usage should not visibly affect user-facing features.

Involved components

Gitlab::Ingestion::BulkInsertableTask and classes that include this module.

One way to find these classes is with grep (ripgrep in this example):

➜ rg -l 'include Gitlab::Ingestion::BulkInsertableTask'
ee/app/services/package_metadata/ingestion/tasks/ingest_packages.rb
ee/app/services/package_metadata/ingestion/tasks/base.rb
ee/app/services/sbom/ingestion/tasks/ingest_sources.rb
ee/app/services/sbom/ingestion/tasks/ingest_occurrences.rb
ee/app/services/sbom/ingestion/tasks/ingest_components.rb
ee/app/services/sbom/ingestion/tasks/ingest_component_versions.rb
ee/app/services/sbom/ingestion/tasks/ingest_occurrences_vulnerabilities.rb
ee/app/services/security/ingestion/tasks/ingest_finding_pipelines.rb
ee/app/services/security/ingestion/tasks/ingest_finding_identifiers.rb
ee/app/services/security/ingestion/tasks/ingest_finding_evidence.rb
ee/app/services/security/ingestion/tasks/ingest_finding_signatures.rb
ee/app/services/security/ingestion/tasks/ingest_vulnerability_flags.rb
ee/app/services/security/ingestion/tasks/ingest_vulnerabilities/create.rb
ee/app/services/security/ingestion/tasks/ingest_finding_links.rb
ee/app/services/security/ingestion/tasks/ingest_identifiers.rb
ee/app/services/security/ingestion/tasks/ingest_remediations.rb
ee/app/services/security/ingestion/tasks/ingest_findings.rb
ee/spec/lib/gitlab/ingestion/bulk_insertable_task_spec.rb

Optional: Intended side effects

  • Reduced error rates in transactions where duplicate maps exist.
  • Reduced cognitive load on developers, since they no longer have to defend against a known issue in every task.

Optional: Missing test coverage

Edited by Oscar Tovar