Reorder enrichment batch fetch by RootNamespaceId and UniqueInstanceId for better clustering

Problem

The enrichment coordinator fetches batches of raw events ordered by IngestionTimestamp ASC, EventId ASC (raw_events.rb:fetch_batch_for_enrichment). This scatters events for the same namespace/instance across multiple batches, causing:

  • Redundant SubscriptionsAndTrialsFinder.for_namespaces_mapped calls across batches for the same namespace
  • Redundant Consumer.find_or_create_by! DB transactions for the same consumer tuple
  • Redundant SelfManaged::SubscriptionsAndTrialsFinder.for_instances calls for the same SM instance

This is the same class of problem solved for consumption in https://gitlab.com/gitlab-org/customers-gitlab-com/-/issues/15587#note_3089662513, where reordering by ConsumerIdsKey reduced ConsumptionService calls by 36%.

Proposed Fix

Change the enrichment batch ordering from:

ORDER BY r.IngestionTimestamp ASC, r.EventId ASC

To:

ORDER BY r.RootNamespaceId ASC, r.UniqueInstanceId ASC, r.IngestionTimestamp ASC, r.EventId ASC

This clusters:

  • SaaS events by RootNamespaceId, so events for the same namespace land in the same batch or in adjacent batches
  • SM events (where RootNamespaceId = 0) together, then by UniqueInstanceId, so events for the same instance land in the same batch or in adjacent batches
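
To illustrate the clustering effect, here is a toy sketch (not the coordinator code). It assumes one lookup per distinct (RootNamespaceId, UniqueInstanceId) pair per batch; the method name and the hash keys are hypothetical.

```ruby
# Toy model: counts lookups across all batches for a given ordering,
# assuming one lookup per distinct (namespace, instance) pair per batch.
def lookups_per_ordering(events, batch_size, order_keys)
  sorted = events.sort_by { |e| order_keys.map { |k| e.fetch(k) } }
  sorted.each_slice(batch_size).sum do |batch|
    batch.map { |e| [e.fetch(:root_namespace_id), e.fetch(:unique_instance_id)] }.uniq.size
  end
end

# Six SaaS events for three namespaces, interleaved by ingestion time.
EVENTS = [1, 2, 3, 1, 2, 3].each_with_index.map do |namespace_id, i|
  {
    root_namespace_id: namespace_id,
    unique_instance_id: '',          # empty for SaaS in this toy data
    ingestion_timestamp: i + 1,
    event_id: "e#{i + 1}"
  }
end

CURRENT_ORDER  = %i[ingestion_timestamp event_id].freeze
PROPOSED_ORDER = %i[root_namespace_id unique_instance_id ingestion_timestamp event_id].freeze

current_lookups  = lookups_per_ordering(EVENTS, 2, CURRENT_ORDER)
proposed_lookups = lookups_per_ordering(EVENTS, 2, PROPOSED_ORDER)
```

With a batch size of 2, this toy data needs 6 lookups under the current ordering and 3 under the proposed one. Real savings depend on how events are actually distributed.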

Keyset pagination is updated to use all 4 cursor fields: (RootNamespaceId, UniqueInstanceId, IngestionTimestamp, EventId).
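
A minimal sketch of a 4-field keyset predicate follows, written in the generic expanded OR form. The method names, the cursor hash shape, and the `?` placeholder style are assumptions for illustration. The actual query in raw_events.rb may be built differently.

```ruby
# Cursor columns, in the same order as the proposed ORDER BY.
CURSOR_COLUMNS = %w[RootNamespaceId UniqueInstanceId IngestionTimestamp EventId].freeze

# Builds the "row is strictly after the cursor" predicate:
#   (c1 > ?) OR (c1 = ? AND c2 > ?) OR (c1 = ? AND c2 = ? AND c3 > ?) ...
# Returns [sql_fragment, bind_values]. Hypothetical helper.
def keyset_predicate(cursor, columns = CURSOR_COLUMNS)
  clauses = []
  binds = []
  columns.each_with_index do |column, index|
    prefix = columns.first(index)
    parts = prefix.map { |c| "r.#{c} = ?" } + ["r.#{column} > ?"]
    clauses << "(#{parts.join(' AND ')})"
    binds.concat(prefix.map { |c| cursor.fetch(c) })
    binds << cursor.fetch(column)
  end
  [clauses.join(' OR '), binds]
end

# In-memory equivalent of the same comparison, useful in a spec:
# Array#<=> compares element by element, like the ORDER BY does.
def after_cursor?(row, cursor, columns = CURSOR_COLUMNS)
  row_key = columns.map { |c| row.fetch(c) }
  cursor_key = columns.map { |c| cursor.fetch(c) }
  (row_key <=> cursor_key) == 1
end
```

The cursor must carry all four values from the last row of the previous batch. If any field is dropped, rows that tie on the remaining fields can be skipped or fetched twice.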

Files Changed

  • app/models/billing/usage/raw_events.rb — Query ordering + keyset pagination + SELECT fields
  • app/jobs/billing/usage/enrichment_coordinator_job.rb — Track new cursor fields
  • lib/tasks/data_maintenance/billing/reprocess_unenriched_events.rake — Same cursor updates
  • Corresponding specs updated

Expected Impact

  • Fewer redundant subscription/trial lookups across batches
  • Fewer redundant Consumer.find_or_create_by! DB transactions
  • Comparable ClickHouse query time (ordering by indexed/low-cardinality columns)
  • No change to enrichment correctness — events are still processed individually

Branch

optimize_enrichment_query
