
Fix migrate! method (Minimal fix with ExclusiveLock to prevent race conditions)

Shinya Maeda requested to merge fix/sm/atomic-migration into master

What does this MR do?

It turned out that @ayufan's summary https://gitlab.com/gitlab-com/infrastructure/issues/3674#note_58865445 was wrong. This problem was related neither to move_to_cache/move_to_store nor to persist_object_store! error handling.

What actually happened was a race condition in which migrate!(REMOTE) and migrate!(LOCAL) were executed concurrently.

I was able to reproduce this problem with this script. When I tested on GDK, 47% of the data was lost:

 > bundle exec rake gitlab:data_loss:simulate
I, [2018-02-20T22:00:45.289744 #12256]  INFO -- : Simulating...
I, [2018-02-20T22:00:45.422242 #12256]  INFO -- : last_job_id: 1714
I, [2018-02-20T22:00:53.835484 #12256]  INFO -- : Sample: 100. Loss rate: 47.0

We're preparing a quick fix for this problem. The fix prevents concurrent access to the migrate! method.
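A minimal sketch of the idea, not the actual patch: it assumes a toy in-memory file and uses Ruby's stdlib Mutex as a stand-in for the exclusive lock. `MigratableFile`, `Store` and the blob hash are hypothetical names for illustration only; a real multi-process setup would need a lock shared across processes rather than an in-process Mutex.

```ruby
# Hypothetical constants standing in for the LOCAL/REMOTE stores.
module Store
  LOCAL  = :local
  REMOTE = :remote
end

# Toy model of a file that lives in exactly one store at a time.
class MigratableFile
  attr_reader :store

  def initialize(data)
    @blobs = { Store::LOCAL => data, Store::REMOTE => nil }
    @store = Store::LOCAL
    @lock  = Mutex.new # in-process lock only (assumption for this sketch)
  end

  # The whole copy-then-delete sequence runs under the lock, so a
  # migration in the opposite direction cannot interleave with it.
  def migrate!(new_store)
    @lock.synchronize do
      unless @store == new_store
        @blobs[new_store] = @blobs[@store] # copy to the target store
        @blobs[@store] = nil               # then delete the source copy
        @store = new_store
      end
    end
    self
  end

  def read
    @lock.synchronize { @blobs[@store] }
  end
end

# Fire migrations in both directions concurrently.
file = MigratableFile.new("job trace")
threads = 20.times.map do |i|
  Thread.new { file.migrate!(i.even? ? Store::REMOTE : Store::LOCAL) }
end
threads.each(&:join)

puts file.read # the data survives whichever store it ends up in
```

Without the lock, a migrate!(LOCAL) could start between the copy and the delete of a migrate!(REMOTE), which is the kind of interleaving the fix is meant to rule out.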

After I applied this patch, the data loss no longer occurred:

 > bundle exec rake gitlab:data_loss:simulate
I, [2018-02-21T20:16:11.520435 #7992]  INFO -- : Simulating...
I, [2018-02-21T20:16:11.717088 #7992]  INFO -- : last_job_id: 2416
I, [2018-02-21T20:17:00.001379 #7992]  INFO -- : Sample: 100. Loss rate: 0.0

We're merging this fix ASAP.

Are there points in the code the reviewer needs to double check?

Why was this MR needed?

Screenshots (if relevant)

Does this MR meet the acceptance criteria?

  • [-] Changelog entry added, if necessary
  • [-] Documentation created/updated
  • [-] API support added
  • Tests added for this feature/bug
  • Review
    • [-] Has been reviewed by UX
    • [-] Has been reviewed by Frontend
    • Has been reviewed by Backend
    • [-] Has been reviewed by Database
  • Conforms to the merge request performance guides
  • Conforms to the style guides
  • Squashed related commits together
  • Internationalization required/considered
  • If paid feature, have we considered GitLab.com plan and how it works for groups and is there a design for promoting it to users who aren't on the correct plan
  • End-to-end tests pass (package-qa manual pipeline job)

What are the relevant issue numbers?

Closes https://gitlab.com/gitlab-org/gitlab-ee/issues/4928

Closes https://gitlab.com/gitlab-org/gitlab-ee/issues/4980

Edited by Kamil Trzciński
