What does this MR do?
The goal here is to maximize throughput of batched migrations in terms of the number of tuples updated per time unit.
This is based on what we call "time efficiency". For a single job, time efficiency is the ratio of the job's total duration to the interval between jobs. Ideally, this is close to but smaller than 1: the job uses almost the whole interval without overrunning it.
We use exponential smoothing (an exponential moving average, EMA) over the time efficiency of the last 10 jobs. With an EMA, a single spiking job is tolerated better than if we looked only at the most recent job.
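As an illustration only, the smoothing step could be sketched as follows. This is not the code in this MR: the function names, the `alpha` default of `2 / (n + 1)`, and the choice to seed the average with the oldest job in the window are all assumptions made for the example.

```python
def time_efficiency(duration_s: float, interval_s: float) -> float:
    """Ratio of a job's total duration to the scheduling interval."""
    return duration_s / interval_s


def smoothed_efficiency(efficiencies, window=10, alpha=None):
    """EMA over the most recent `window` efficiencies (oldest first).

    Returns None when there is no job history yet.
    `alpha` is a hypothetical smoothing factor; 2 / (n + 1) is a common
    convention, not necessarily what the migration framework uses.
    """
    recent = list(efficiencies)[-window:]
    if not recent:
        return None
    if alpha is None:
        alpha = 2 / (len(recent) + 1)

    ema = recent[0]  # seed with the oldest job in the window
    for value in recent[1:]:
        ema = alpha * value + (1 - alpha) * ema
    return ema
```

With nine jobs at 0.5 followed by one spike at 2.0, the smoothed value lands between 0.5 and 2.0, so one slow job nudges the estimate instead of dominating it.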
We also define upper and lower bounds for the batch size, so as not to "over-do" it in either direction.
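A minimal sketch of how a bounded adjustment could work, assuming a smoothed time efficiency is already available. The target of 0.95, the bounds, and the per-step growth cap `max_step` are hypothetical values chosen for the example, not the constants used in this MR.

```python
def next_batch_size(current, smoothed_eff,
                    target=0.95,          # assumed "close to but below 1"
                    min_size=1_000,       # assumed lower bound
                    max_size=2_000_000,   # assumed upper bound
                    max_step=1.2):        # assumed cap on growth per step
    """Scale the batch size toward the target efficiency, within bounds."""
    if smoothed_eff is None or smoothed_eff <= 0:
        return current  # no usable history: leave the batch size alone

    # Efficiency below target -> multiplier > 1 (grow, capped by max_step);
    # efficiency above target -> multiplier < 1 (shrink).
    multiplier = min(target / smoothed_eff, max_step)
    proposed = round(current * multiplier)

    # Clamp so the batch size never leaves [min_size, max_size].
    return max(min_size, min(max_size, proposed))
```

For example, with a batch size of 10,000 and a smoothed efficiency of 0.5, the raw multiplier would be 1.9, but the step cap limits the increase to 12,000. Capping growth per step is one way to avoid overshooting; the bounds guard against runaway values in either direction.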
This reflects what we've been doing manually for a while: looking at recent job durations and adjusting batch_size up or down.
It is considered safe: if the batch size is too high, the job takes longer but still breaks its work down into queries of equal size (we have sub_batch_size for that). When a job takes slightly longer than the interval, the next cronjob round won't pick up a new job, and we pause until the following round.