Prevent repository storage move workers from running simultaneously

James Fargher requested to merge repo_moves_until_executed into master

What does this MR do and why?

Part of #385309 (closed)

I have not managed to reproduce this bug, but we can deduce from the error that multiple workers are running:

Cannot transition state via :finish_replication from :replicated (Reason(s): State cannot transition via "finish replication")

The error shows that we are firing the finish_replication event. This event is called from only one place: UpdateRepositoryStorageService.

The event fails because the current state is already replicated. finish_replication moves the model to replicated, and we cannot transition from replicated to replicated.

That means the state must already have been replicated when the model was reloaded at the start of the transaction block. Yet we know the model must have been in the started state, because at the start of the service we lock the model and make the started transition.
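To illustrate the reasoning above, here is a minimal plain-Ruby sketch of how a second worker could hit this error. FakeStorageMove, InvalidTransition and the TRANSITIONS table are hypothetical stand-ins, not the real model, which has more states and events than the two mentioned here.

```ruby
# Hypothetical, simplified stand-in for the repository storage move model.
# Only the event and states named in the error message are modelled.
class InvalidTransition < StandardError; end

class FakeStorageMove
  # event => { from_state => to_state }
  TRANSITIONS = {
    finish_replication: { started: :replicated }
  }.freeze

  attr_reader :state

  def initialize(state)
    @state = state
  end

  # Fire an event, raising if it is not allowed from the current state.
  def fire!(event)
    target = TRANSITIONS.fetch(event)[state]
    unless target
      raise InvalidTransition,
            "Cannot transition state via :#{event} from :#{state}"
    end
    @state = target
  end
end

# Both "workers" act on the same record, which is already in :started.
move = FakeStorageMove.new(:started)

# Worker A finishes replication first.
move.fire!(:finish_replication)

# Worker B reaches the same step afterwards and finds :replicated.
worker_b_error =
  begin
    move.fire!(:finish_replication)
    nil
  rescue InvalidTransition => e
    e.message
  end

puts worker_b_error
```

The second call fails in the same way as the reported error, which is consistent with two workers processing the same move.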

This leads me to believe that the database locking is insufficient and that the state of the repository storage move is being modified out of process (i.e. there are multiple workers). I believe this could be fixed using database locking alone, but that would require significant refactoring. So instead we tell Sidekiq to deduplicate jobs until the entire move has completed.
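As a rough sketch of what deduplicating until a job has finished executing means, the following in-memory model holds a key from enqueue until the job body completes. UntilExecutedDeduplicator and the "move-1" key are hypothetical and exist only for illustration; this is not how Sidekiq or GitLab implement deduplication, and a real implementation would keep this state in a shared store rather than in process memory.

```ruby
require 'set'

# Hypothetical in-memory deduplicator, for illustration only.
class UntilExecutedDeduplicator
  def initialize
    @in_flight = Set.new
  end

  # Returns true if the job was accepted, false if it was dropped
  # because a job with the same key is still queued or running.
  def enqueue(key)
    return false if @in_flight.include?(key)

    @in_flight.add(key)
    true
  end

  # Runs the job body and releases the key only once it has finished,
  # whether it returned normally or raised.
  def perform(key)
    yield
  ensure
    @in_flight.delete(key)
  end
end

dedup = UntilExecutedDeduplicator.new
results = []

results << dedup.enqueue("move-1")     # accepted
dedup.perform("move-1") do
  results << dedup.enqueue("move-1")   # dropped: the first job is still running
end
results << dedup.enqueue("move-1")     # accepted again: the first job has finished

p results # prints [true, false, true]
```

Because the key is released only after the job body finishes, a second worker for the same move cannot start while the first is still between the started and replicated states.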

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by James Fargher