Batch load records from the database for Elasticsearch incremental bulk updates
The following discussion from !24298 (merged) should be addressed:
> I don't think we need to do this before merge - this is the behaviour of the status quo, the only difference is we now have all these SQL queries happening in a single sidekiq worker job, rather than one per job.
>
> Optimising this will be awesome, but it's best handled as a follow-up issue I think.
`Elastic::ProcessBookkeepingService`, introduced in !24298 (merged), opens the door for a new optimisation.
When we build a bulk update request for 1,000 notes, for example, we currently issue 1,000 separate `SELECT * FROM notes WHERE id = ?` statements, one per record. We should be able to reduce that to a single `SELECT * FROM notes WHERE id IN (...)` statement instead, using batch-loading techniques familiar to us from GraphQL.
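The batching step could be sketched roughly as follows. This is a minimal illustration, not the actual service API: it assumes each queued reference carries a model class name and a record id, and the names `refs` and `ids_by_class` are hypothetical.

```ruby
# Hypothetical queue contents: [model class name, record id] pairs,
# as ProcessBookkeepingService might see them for a mixed batch.
refs = [
  ['Note', 1], ['Note', 2], ['Issue', 5], ['Note', 3], ['Issue', 8]
]

# Group the queued ids by model class, so each class needs only
# one query instead of one query per record.
ids_by_class = refs
  .group_by(&:first)
  .transform_values { |pairs| pairs.map(&:last) }

# With ActiveRecord, each group would then be loaded in a single
# IN (...) query, e.g. (illustrative, not the service's real code):
#   records = klass.constantize.where(id: ids).index_by(&:id)

puts ids_by_class.inspect
# => {"Note"=>[1, 2, 3], "Issue"=>[5, 8]}
```

For 1,000 queued notes this collapses 1,000 single-row lookups into one `WHERE id IN (...)` query, which is the same shape of batching that GraphQL loaders use.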