Abstract sub-batching boilerplate for batched background migrations
The sub-batching logic used in batched background migration jobs requires a significant amount of boilerplate to implement. It's tedious and error-prone to have each job copy and paste this same logic with only slight tweaks. Looking at the existing migrations, we see that most follow the same pattern as the original CopyColumnUsingBackgroundMigrationJob, as simplified below:
def perform(start_id, end_id, batch_table, batch_column, sub_batch_size, pause_ms)
  relation_scoped_to_range(batch_table, batch_column, start_id, end_id).each_batch(column: batch_column, of: sub_batch_size) do |sub_batch|
    batch_metrics.instrument_operation(:update_all) do
      # The only job-specific part: the DML run against each sub-batch.
      sub_batch.update_all('foo = bar')
    end

    sleep(pause_ms * 0.001)
  end
end
The only job-specific logic is the DML performed on the sub_batch. Further, the instrumentation must be included for the batch optimizer to work, but it is easily overlooked.
We should abstract the surrounding batch handling into the framework, without restricting the ability to write complex jobs which may not follow this model.
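One possible shape for this abstraction is sketched below, purely as an illustration. Every name in it (SketchBatchMetrics, SketchBatchedJob, SketchCopyColumnJob, each_sub_batch) is hypothetical and not part of the existing framework, and an in-memory array of hashes stands in for the ActiveRecord relation so the example runs on its own. The base class owns the iteration, instrumentation, and pause, while the job supplies only the per-sub-batch work.

```ruby
# Hypothetical stand-in for the real batch metrics object: it only records
# how long each instrumented operation took.
class SketchBatchMetrics
  attr_reader :timings

  def initialize
    @timings = Hash.new { |hash, key| hash[key] = [] }
  end

  def instrument_operation(name)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    @timings[name] << Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
end

# Hypothetical base class that owns the sub-batching boilerplate.
class SketchBatchedJob
  attr_reader :batch_metrics

  # `rows` is an assumption: an array standing in for the scoped relation.
  def initialize(rows:, sub_batch_size:, pause_ms:)
    @rows = rows
    @sub_batch_size = sub_batch_size
    @pause_ms = pause_ms
    @batch_metrics = SketchBatchMetrics.new
  end

  private

  # Yields each sub-batch inside the instrumentation, then pauses, so a job
  # cannot forget either step.
  def each_sub_batch(operation_name:)
    @rows.each_slice(@sub_batch_size) do |sub_batch|
      batch_metrics.instrument_operation(operation_name) { yield sub_batch }
      sleep(@pause_ms * 0.001)
    end
  end
end

# A job now contains only its own logic (here, the equivalent of 'foo = bar').
class SketchCopyColumnJob < SketchBatchedJob
  def perform
    each_sub_batch(operation_name: :update_all) do |sub_batch|
      sub_batch.each { |row| row[:foo] = row[:bar] }
    end
  end
end

rows = (1..5).map { |i| { id: i, foo: nil, bar: i * 10 } }
job = SketchCopyColumnJob.new(rows: rows, sub_batch_size: 2, pause_ms: 0)
job.perform
```

Because each_sub_batch is an opt-in helper rather than a fixed perform implementation, a complex job that does not fit this model could still define perform however it needs to.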