Preload DB records in bulk ES indexing

Dylan Griffith requested to merge 207280-batch-load-db-bulk-indexer-records into master

What does this MR do?

The ProcessBookkeepingService is responsible for synchronizing all database updates with Elasticsearch. It does this by pulling updates off a custom Redis queue in batches of 1000.

Today it loops through each item in the batch and calls bulk_indexer.process, which eventually calls #database_record for each element, so every item triggers its own database query. We intentionally send items through #process one at a time because we sometimes need to send them to Elasticsearch in groups smaller than 1000, to ensure we never send a single request that is too large for Elasticsearch to handle.
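To illustrate why items go through #process one at a time, here is a minimal sketch of an indexer that flushes whenever the pending request would exceed a size budget. The class name SketchBulkIndexer, the max_bytes parameter, and the byte-counting rule are all assumptions made for illustration; they are not the actual bulk indexer code.

```ruby
# Hypothetical stand-in for the bulk indexer (not the real implementation).
# It accepts one payload at a time and flushes before the pending request
# body would grow past a byte budget.
class SketchBulkIndexer
  attr_reader :sent_requests

  def initialize(max_bytes:)
    @max_bytes = max_bytes
    @pending = []
    @pending_bytes = 0
    @sent_requests = []
  end

  # Queue a single payload, flushing first if adding it would exceed the budget.
  def process(payload)
    flush if @pending.any? && @pending_bytes + payload.bytesize > @max_bytes
    @pending << payload
    @pending_bytes += payload.bytesize
  end

  # A real indexer would send the pending payloads to Elasticsearch here;
  # the sketch only records what would have been sent.
  def flush
    return if @pending.empty?

    @sent_requests << @pending
    @pending = []
    @pending_bytes = 0
  end
end

indexer = SketchBulkIndexer.new(max_bytes: 10)
%w[aaaa bbbb cccc dddd].each { |doc| indexer.process(doc) }
indexer.flush
```

Because the flush decision depends on the accumulated size, the caller cannot know ahead of time where the request boundaries will fall, which is why handing over one big array is awkward.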

As such, it isn't really feasible to unwind all the code and just pass arrays all the way through the system. Instead, to avoid the N+1 query problem, we can use a trick similar to Rails preloading and implement our own preloader on a collection of documents.

This MR implements this by creating a new DocumentReference::Collection class with a #preload_database_records method. It loads the records up front and assigns each contained DocumentReference its corresponding database_record, so that when #database_record is later invoked the value is already memoized and no further query is made.

Since the ProcessBookkeepingService can handle multiple different types of Active Record models, we need to group the references by type and then perform a single DB query for each type, as loading records from different tables in one query is more convoluted.
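The preloading idea described above can be sketched roughly as follows. This is a simplified, self-contained illustration and not the code in this MR: FakeModel, FakeRecord, where_ids, SketchReference, and db_id are hypothetical names, and FakeModel stands in for an Active Record class so that queries can be counted without a database.

```ruby
# Hypothetical in-memory stand-ins for Active Record models and rows.
FakeRecord = Struct.new(:id, :title)

class FakeModel
  attr_reader :query_count

  def initialize(rows)
    @rows = rows
    @query_count = 0
  end

  # Plays the role of a single "WHERE id IN (...)" query.
  def where_ids(ids)
    @query_count += 1
    @rows.select { |row| ids.include?(row.id) }
  end
end

# Simplified reference to one document; memoizes its database record.
class SketchReference
  attr_reader :model, :db_id
  attr_writer :database_record

  def initialize(model, db_id)
    @model = model
    @db_id = db_id
  end

  # Falls back to a per-record query only if nothing was preloaded.
  # defined? is used so that a preloaded nil (deleted row) is also memoized.
  def database_record
    return @database_record if defined?(@database_record)

    @database_record = model.where_ids([db_id]).first
  end

  class Collection
    include Enumerable

    def initialize
      @refs = []
    end

    def push(ref)
      @refs << ref
      self
    end

    def each(&block)
      @refs.each(&block)
    end

    # Group references by model, run one query per model, then assign results.
    def preload_database_records
      @refs.group_by(&:model).each do |model, refs|
        records_by_id = model.where_ids(refs.map(&:db_id)).map { |r| [r.id, r] }.to_h
        refs.each { |ref| ref.database_record = records_by_id[ref.db_id] }
      end
    end
  end
end

issues = FakeModel.new([FakeRecord.new(1, "a"), FakeRecord.new(2, "b")])
notes  = FakeModel.new([FakeRecord.new(1, "n")])

collection = SketchReference::Collection.new
collection.push(SketchReference.new(issues, 1))
collection.push(SketchReference.new(issues, 2))
collection.push(SketchReference.new(notes, 1))
collection.push(SketchReference.new(issues, 99)) # row no longer exists

collection.preload_database_records
titles = collection.map { |ref| ref.database_record&.title }
```

In this sketch, four references across two model types cost two queries in total, and the reference whose row is missing is memoized as nil rather than triggering a later lookup.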

Does this MR meet the acceptance criteria?

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team

Related to #207280 (closed)