Prevent repository storage move workers from running simultaneously
What does this MR do and why?
Part of #385309
I have not managed to reproduce this bug, but we can deduce from the following error that multiple workers are running:
Cannot transition state via :finish_replication from :replicated (Reason(s): State cannot transition via "finish replication")
From this you can see that we are running the finish_replication event. This event is only called from one place: the UpdateRepositoryStorageService.
This event is failing because the current state is already replicated, and we cannot transition from replicated to replicated.
That means that when the model was reloaded at the start of the transaction block, its state must already have been replicated. Yet we know this model should have been in the started state, because at the start of the service we lock the model and transition it to started.
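The deduction above can be illustrated with a minimal Ruby sketch. StorageMoveSketch and InvalidTransition are hypothetical names, not GitLab classes, and the sketch models only a started to replicated transition for finish_replication, which is what the error message implies. A single shared object stands in for the database row that two separate worker processes would each reload.

```ruby
# Hypothetical error raised when an event has no transition from the current state.
class InvalidTransition < StandardError; end

# Hypothetical, minimal stand-in for the repository storage move state machine.
# Only finish_replication is modelled, and only the assumed started -> replicated
# transition, which is consistent with the error message above.
class StorageMoveSketch
  # Assumed transition table: event => { from_state => to_state }
  TRANSITIONS = {
    finish_replication: { started: :replicated }
  }.freeze

  attr_reader :state

  def initialize(state)
    @state = state
  end

  # Fires an event, raising if no transition exists from the current state.
  def fire!(event)
    target = TRANSITIONS.fetch(event, {})[state]
    if target.nil?
      raise InvalidTransition,
            "Cannot transition state via :#{event} from :#{state}"
    end
    @state = target
  end
end

# Two workers acting on the same record, one after the other.
shared = StorageMoveSketch.new(:started)

shared.fire!(:finish_replication)   # the first worker succeeds
first_state = shared.state

second_error = begin
  shared.fire!(:finish_replication) # the second worker finds the record already replicated
  nil
rescue InvalidTransition => e
  e.message
end

puts first_state
puts second_error
```

Running it prints replicated, followed by the same kind of message seen in the error: the first worker completes the transition, and the second finds nothing left to transition from.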
This leads me to believe that the current database locking is insufficient and that the state of the repository storage move is being modified out of process (that is, by multiple workers). I believe this could be fixed using database locking alone, but that would require a significant refactor. So instead, this MR tells Sidekiq to deduplicate the job until the entire move has completed.
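The difference between deduplicating until a job starts and deduplicating until it finishes can be sketched with an in-memory simulation. DedupQueue, duplicates_enqueued and the "move-42" key are hypothetical and this is not GitLab's implementation; GitLab's worker DSL documents until_executing and until_executed deduplication strategies, and the sketch only models when the deduplication lock is released under each.

```ruby
# Hypothetical in-memory simulation of Sidekiq-style job deduplication.
# It only models when the dedup lock for a job key is released.
class DedupQueue
  attr_reader :executed

  def initialize(strategy)
    @strategy = strategy # :until_executing or :until_executed
    @locks = {}          # job key => true while the lock is held
    @queue = []
    @executed = []
  end

  # Returns true if the job was enqueued, false if dropped as a duplicate.
  def enqueue(key)
    return false if @locks[key]

    @locks[key] = true
    @queue << key
    true
  end

  # Simulates a worker running the next job. The block runs while the job
  # is executing, so it can try to enqueue a duplicate mid-flight.
  def perform_next
    key = @queue.shift
    return if key.nil?

    # until_executing: the lock is dropped as soon as the job starts.
    @locks.delete(key) if @strategy == :until_executing
    yield key if block_given?
    @executed << key
  ensure
    # until_executed: the lock is held until the job has finished.
    @locks.delete(key) if key && @strategy == :until_executed
  end
end

# Enqueues a move, then tries to enqueue the same move while it is running.
def duplicates_enqueued(strategy)
  queue = DedupQueue.new(strategy)
  queue.enqueue("move-42")
  accepted = nil
  queue.perform_next { accepted = queue.enqueue("move-42") }
  accepted
end

puts duplicates_enqueued(:until_executing)
puts duplicates_enqueued(:until_executed)
```

With until_executing the duplicate is accepted while the first job is still running, so two workers can act on the same move; with until_executed it is dropped until the first job finishes, which is the behaviour this MR relies on.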
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.