Skip to content

Geo: Container repository update events race condition

While working on #233514 (closed), I discovered that https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/gitlab/geo/log_cursor/events/container_repository_updated_event.rb#L11 is a no-op. The bang is in the transition method name, so I'm not sure it even does what the bang does for https://www.rubydoc.info/github/state-machines/state_machines-activerecord/StateMachines/Integrations/ActiveRecord. In any case, the bang does not indicate a save even though it does for ProjectRegistry, since that is a method which calls update!.

I chose not to fix this bug in the same MR as #233514 (closed) since it is a lower priority bug, and because if fixing it breaks something, I don't want a revert to rollback the other fix.

The intent of transitioning the state to "pending" before enqueuing the sync job is to avoid an uncommon (as far as we know) race condition in which a sync job exits due to the exclusive lease being taken, then the registry becomes marked synced but the resource is actually out-of-date.

  • Save the registry, before enqueuing the sync job
  • Add proper test coverage

I suggest an ~S2 (because something can appear synced but not be) ~P3 (because it is relatively uncommon and often is fixed by the subsequent update, or fixed by verification).

Edited by Michael Kozono