Batched background migration worker doesn't handle statement timeout while finding batch boundaries
If a batched background migration hits a statement timeout in the batching strategy while finding a batch boundary, it does not reduce the batch size. Instead, the exception fails the Sidekiq job.
This causes the Sidekiq job to fail in a loop, because the next worker picks up the same batch again.
Current workaround: find the job ID and pause the migration with ChatOps: /chatops run batched_background_migrations pause <id> --database <main/ci>
Instead, we should reduce the batch size of the migration when it fails with a statement timeout, so that future iterations have less work to do and are more likely to succeed.
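A minimal sketch of the proposed behavior, not the actual worker code. All names here (StatementTimeout, MIN_BATCH_SIZE, REDUCTION_FACTOR, find_boundary_or_shrink) are hypothetical, and the halving factor is an assumption. In Rails the timeout would typically surface as ActiveRecord::QueryCanceled; a stand-in class is used so the example runs on its own.

```ruby
# Hypothetical stand-in for the database statement timeout error.
class StatementTimeout < StandardError; end

MIN_BATCH_SIZE = 1      # assumed lower bound for a batch
REDUCTION_FACTOR = 0.5  # assumed: halve the batch size on each timeout

# Returns a smaller batch size, never going below MIN_BATCH_SIZE.
def reduced_batch_size(batch_size)
  [(batch_size * REDUCTION_FACTOR).floor, MIN_BATCH_SIZE].max
end

# Yields the batch size to a block that looks up the next batch boundary.
# Returns [boundary, batch_size] on success. On a statement timeout it
# returns [nil, smaller_batch_size] so the next iteration can retry with
# less work, instead of letting the exception fail the job.
def find_boundary_or_shrink(batch_size)
  [yield(batch_size), batch_size]
rescue StatementTimeout
  # Nothing left to shrink: re-raise so the failure stays visible.
  raise if batch_size <= MIN_BATCH_SIZE

  [nil, reduced_batch_size(batch_size)]
end
```

With this shape, a caller would store the returned batch size on the migration and run the next iteration with it, so a repeated timeout shrinks the batch again rather than retrying the same oversized batch.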