Fix migrate! method (Minimal fix with ExclusiveLock to prevent race conditions)
What does this MR do?
It turned out that @ayufan's summary (https://gitlab.com/gitlab-com/infrastructure/issues/3674#note_58865445) was wrong: this problem was not related to persist_object_store! error handling.
What actually happened was a race condition in which multiple migrate!(LOCAL) calls were executed concurrently.
I was able to reproduce this problem with this script. When I tested on GDK, 47% of the sampled data was lost:
> bundle exec rake gitlab:data_loss:simulate
I, [2018-02-20T22:00:45.289744 #12256] INFO -- : Simulating...
I, [2018-02-20T22:00:45.422242 #12256] INFO -- : last_job_id: 1714
I, [2018-02-20T22:00:53.835484 #12256] INFO -- : Sample: 100. Loss rate: 47.0
We're preparing a quick fix for this problem. This fix prevents the concurrent access to
After I applied the patch, the data loss no longer occurred:
> bundle exec rake gitlab:data_loss:simulate
I, [2018-02-21T20:16:11.520435 #7992] INFO -- : Simulating...
I, [2018-02-21T20:16:11.717088 #7992] INFO -- : last_job_id: 2416
I, [2018-02-21T20:17:00.001379 #7992] INFO -- : Sample: 100. Loss rate: 0.0
We're merging this fix ASAP.
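For reviewers, the general idea of guarding a migration with an exclusive lock can be sketched roughly as below. This is only an illustration under assumptions, not the code in this MR: the ExclusiveLock class here is a hypothetical in-process stand-in (a real deployment would need a lock shared across processes), and FakeUploader, MigrationInProgress, and the block hook are invented names used solely to simulate a second caller arriving while the first migration is still running.

```ruby
# Hypothetical sketch: none of these names are taken from the GitLab codebase.

# In-process stand-in for an exclusive lock keyed by string.
class ExclusiveLock
  @held = {}          # keys currently locked
  @guard = Mutex.new  # protects @held

  class << self
    # Returns true if the lock was obtained, false if someone else holds it.
    def try_obtain(key)
      @guard.synchronize do
        return false if @held[key]

        @held[key] = true
      end
    end

    def release(key)
      @guard.synchronize { @held.delete(key) }
    end
  end
end

class FakeUploader
  MigrationInProgress = Class.new(StandardError)

  attr_reader :store, :migrations

  def initialize(id)
    @id = id
    @store = :remote
    @migrations = 0
  end

  # Refuses to run if another migration for the same record is in flight.
  def migrate!(new_store)
    key = "migrate:#{@id}"
    raise MigrationInProgress, key unless ExclusiveLock.try_obtain(key)

    begin
      yield if block_given? # test hook: runs while the lock is held
      @store = new_store
      @migrations += 1
    ensure
      ExclusiveLock.release(key) # always release, even on failure
    end
  end
end

uploader = FakeUploader.new(1)
rejected = false

# The first call holds the lock while a second thread attempts the same migration.
uploader.migrate!(:local) do
  Thread.new do
    begin
      uploader.migrate!(:local)
    rescue FakeUploader::MigrationInProgress
      rejected = true
    end
  end.join
end

puts "second call rejected: #{rejected}, migrations: #{uploader.migrations}"
```

In this sketch the overlapping call fails fast instead of waiting, so only one migration ever touches the record at a time. Whether a rejected caller should retry, wait, or give up is a design decision the sketch leaves open.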
Are there points in the code the reviewer needs to double check?
Why was this MR needed?
Screenshots (if relevant)
Does this MR meet the acceptance criteria?
- [-] Changelog entry added, if necessary
- [-] Documentation created/updated
- [-] API support added
- Tests added for this feature/bug
- [-] Has been reviewed by UX
- [-] Has been reviewed by Frontend
- Has been reviewed by Backend
- [-] Has been reviewed by Database
- Conforms to the merge request performance guides
- Conforms to the style guides
- Squashed related commits together
- Internationalization required/considered
- If paid feature, have we considered GitLab.com plan and how it works for groups and is there a design for promoting it to users who aren't on the correct plan
- End-to-end tests pass (package-qa manual pipeline job)