GitLab Migration - Automatically cancel migrations that are taking too long or have gone stale
We should stop migrations that are taking too long or are obviously stale or stuck in an infinite loop. See the discussion we had internally:
João Cunha (he/him) 11:11 PM
`@George Koltsov` `@Allen Cook`, a thought that occurred to me while I’m still trying to grasp everything: do you think it’s worth adding a limit to the BulkImportWorker#re_enqueue?
I’m thinking about an unexpected scenario, for instance if we get stuck in BulkImportWorker#max_batch_size_exceeded? for too long. I don’t actually know if this is even possible. But I’m wondering if it would make sense to put a limit, even a very high one, just so we’re sure we never get into a situation where we’re re-enqueueing this job endlessly. Then once this limit is reached we fail the operation. Or maybe there should be a way for users themselves to request that an operation stop if it’s taking too long. Similar to what we have for GitLab Runners, where we can ask them to stop, or they will even give up after a very long time. I think for Runners it’s something like 2 hours… not sure.
George Koltsov:bulb: 7 hours ago
Yes I agree it's a good idea. We need some sort of a mechanism that aborts the whole process across all of the workers
George Koltsov:bulb: 7 hours ago
Right now it could get stuck if, let's say, there is an entity in state 'started' that hit an unhandled exception and its state never got updated. Or if a process got hard-killed.
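The stuck state described above (an entity left in 'started' whose state never changes again) could be detected with a staleness check along these lines. This is only a sketch in plain Ruby: the `Entity` struct, its `updated_at` field, the `stale_entities` helper, and the 8-hour threshold are all assumptions for illustration, not the actual GitLab models or values.

```ruby
# Hypothetical record; field names are assumptions for illustration only.
Entity = Struct.new(:id, :status, :updated_at)

# Returns the entities that are still marked 'started' but have not been
# updated for more than stale_after seconds.
def stale_entities(entities, now: Time.now, stale_after: 8 * 60 * 60)
  entities.select do |entity|
    entity.status == "started" && (now - entity.updated_at) > stale_after
  end
end
```

A periodic job could run such a check and hand the result to whatever cancellation mechanism is chosen.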
Proposals
- Introduce a maximum number of times the `BulkImportWorker` can be re-enqueued.
- Instead of the number of re-enqueues, we could use a timeframe.
- Once a limit is reached, we should cancel and fail all the operations and their child operations.
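The two kinds of limit proposed above (a re-enqueue count and a timeframe) could be combined in one small guard. The following is a rough sketch in plain Ruby, not the `BulkImportWorker` implementation: the `ReEnqueueBudget` class, its method names, and the default limits are hypothetical.

```ruby
# Hypothetical helper, not part of GitLab: tracks how many times and for how
# long a job has been re-enqueued so the caller can decide when to give up.
class ReEnqueueBudget
  attr_reader :attempts

  # max_attempts and max_duration (in seconds) are illustrative defaults.
  def initialize(started_at:, max_attempts: 1_000, max_duration: 24 * 60 * 60)
    @started_at = started_at
    @max_attempts = max_attempts
    @max_duration = max_duration
    @attempts = 0
  end

  # Call once for every re-enqueue.
  def record_attempt
    @attempts += 1
  end

  # True once either the attempt limit or the time limit has been reached.
  def exhausted?(now: Time.now)
    @attempts >= @max_attempts || (now - @started_at) >= @max_duration
  end
end
```

A worker would check `exhausted?` before re-enqueueing itself and fail the import instead of scheduling another run. In a real worker the start time and attempt count would have to be persisted (for example on the import record), since each job run is a fresh process invocation.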
We should clarify the strategy for canceling the tree of operations:
- Is it just marking them as failed?
- Do we need to forcibly clean up Sidekiq queues?
- How will the user get feedback about what happened?
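If the answer to the first question is "yes, mark them as failed", the traversal could look roughly like the sketch below. It uses an in-memory tree rather than real records; the `Operation` struct, the `fail_tree!` helper, and the `failure_reason` field are assumptions for illustration. It does not touch Sidekiq queues, so that question stays open.

```ruby
# Hypothetical in-memory stand-in for an import and its child operations.
Operation = Struct.new(:name, :status, :children, :failure_reason) do
  def finished?
    %w[finished failed].include?(status)
  end
end

# Marks every unfinished operation in the tree as failed, records the reason,
# and returns the names of the operations that were changed.
def fail_tree!(operation, reason)
  changed = []
  unless operation.finished?
    operation.status = "failed"
    operation.failure_reason = reason
    changed << operation.name
  end
  (operation.children || []).each do |child|
    changed.concat(fail_tree!(child, reason))
  end
  changed
end
```

Storing a reason on each operation and returning the list of affected operations gives one possible source for user feedback, for example "import canceled after exceeding the time limit".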