Replication lag message race on push over HTTP
Problem to solve
Follow up for !15901 (comment 216406855):
There is a race condition when simultaneous HTTP pushes are made (for a particular actor identifier to a particular repo) to a Geo-enabled installation. It applies in each of these cases:
- Simultaneous pushes to different secondaries
- Simultaneous pushes to the same secondary
- Simultaneous pushes to a secondary and to the primary
One push can see a replication lag message intended for the other push, and the other sees no lag message.
We accepted this race because most users do not make simultaneous pushes to the same repo. A more likely case is programmatic pushes made with a deploy key: those pushes share one actor identifier, so they could receive erroneous messages.
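A minimal sketch of how the race can play out, assuming the Geo node ID is cached under a key built from the actor identifier and the repo, and that the entry is removed when it is read. The dict stands in for Redis, and the function names, key shape, and remove-on-read behaviour are illustrative assumptions rather than the actual implementation.

```python
# Hypothetical in-memory stand-in for the cache (the real store is Redis).
cache = {}


def git_receive_pack(actor, repo, geo_node_id):
    """Record which Geo node referred this push, keyed by actor and repo."""
    cache[(actor, repo)] = geo_node_id


def post_receive(actor, repo):
    """Fetch and clear the cached node ID used to build the lag message."""
    return cache.pop((actor, repo), None)


# Two pushes made with the same deploy key interleave.
git_receive_pack("key-1", "foo/bar", geo_node_id=2)  # push A, via secondary 2
git_receive_pack("key-1", "foo/bar", geo_node_id=3)  # push B overwrites A's entry

seen_by_a = post_receive("key-1", "foo/bar")  # gets the node ID meant for push B
seen_by_b = post_receive("key-1", "foo/bar")  # finds nothing, so no lag message
```

Because both pushes share the same key, the second write replaces the first, and whichever post-receive runs first consumes the only entry.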
Intended users
- Anyone who Git pushes
Further details
Proposal keeping the cache approach
We should be able to do something like this:
- Workhorse adds a correlation ID to POST /-/geo-node-referrer/2/foo/bar.git/git-receive-pack
- GitHttpController#git_receive_pack stores the Geo node ID into Redis by correlation ID
- Gitaly adds correlation ID to the /api/v4/internal/post_receive request
- Rails /api/v4/internal/post_receive fetches key by correlation ID
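The steps above can be sketched roughly as follows. This is an illustration under stated assumptions: the dict stands in for Redis, the correlation ID is a random hex string, and every function name is hypothetical. Only the idea of keying the cache by correlation ID comes from the proposal.

```python
import uuid

# Hypothetical in-memory stand-in for Redis.
cache = {}


def git_receive_pack(correlation_id, geo_node_id):
    """Rails: store the Geo node ID under the request's correlation ID."""
    cache[correlation_id] = geo_node_id


def workhorse_forward_push(geo_node_id):
    """Workhorse: attach a correlation ID to the git-receive-pack POST."""
    correlation_id = uuid.uuid4().hex  # assumed format, for illustration only
    git_receive_pack(correlation_id, geo_node_id)
    return correlation_id


def post_receive(correlation_id):
    """Rails: fetch the Geo node ID using the correlation ID Gitaly passed on."""
    return cache.pop(correlation_id, None)


# Two simultaneous pushes by the same actor to the same repo.
id_a = workhorse_forward_push(geo_node_id=2)
id_b = workhorse_forward_push(geo_node_id=3)

node_for_a = post_receive(id_a)  # push A sees its own node ID
node_for_b = post_receive(id_b)  # push B sees its own node ID
```

Since each push has its own key, neither can overwrite or consume the other's entry.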
Proposal getting rid of the cache approach
This is what I was trying to avoid by caching the Geo node ID, but it is worth mentioning here since the approach above is non-trivial anyway.
- Gitaly: Accept a new geo_node_id param in ReceivePack RPC
- Workhorse: Call ReceivePack RPC with geo_node_id if we have it available
- Rails: Add a geo_node_id param to the Workhorse OK response from GitHttpController#git_receive_pack
- Gitaly: When calling post-receive hook, make sure geo_node_id is available
- Gitaly: When post-receive hook calls internal API post_receive, add the geo_node_id param to the request
- Rails: In internal API post_receive, use geo_node_id to build the replication lag message
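As a rough sketch of this flow, the Geo node ID can be threaded through each hop instead of being cached. All function names, the response shape, and the message wording below are hypothetical. The real components are Rails, Workhorse, and Gitaly talking over HTTP and gRPC, which plain Python function calls only approximate.

```python
from typing import Optional


def rails_git_receive_pack(referrer_node_id: Optional[int]) -> dict:
    """Rails: include geo_node_id in the OK response sent to Workhorse."""
    response = {"status": "ok"}
    if referrer_node_id is not None:
        response["geo_node_id"] = referrer_node_id
    return response


def rails_post_receive(params: dict) -> Optional[str]:
    """Rails internal API: build the lag message from the geo_node_id param."""
    node_id = params.get("geo_node_id")
    if node_id is None:
        return None  # push did not come through a secondary
    # Hypothetical wording; the real message text is not given in this issue.
    return f"Replication lag message for Geo node {node_id}"


def gitaly_receive_pack(geo_node_id: Optional[int] = None) -> Optional[str]:
    """Gitaly: accept geo_node_id and pass it on when the hook calls post_receive."""
    params = {}
    if geo_node_id is not None:
        params["geo_node_id"] = geo_node_id
    return rails_post_receive(params)


def workhorse_handle_push(referrer_node_id: Optional[int]) -> Optional[str]:
    """Workhorse: call ReceivePack with geo_node_id if Rails provided one."""
    ok_response = rails_git_receive_pack(referrer_node_id)
    return gitaly_receive_pack(ok_response.get("geo_node_id"))


message_a = workhorse_handle_push(2)        # push via secondary 2
message_b = workhorse_handle_push(3)        # push via secondary 3
message_primary = workhorse_handle_push(None)  # push directly to the primary
```

No state is shared between requests here, so simultaneous pushes cannot see each other's node ID. The cost is a parameter change in every component along the path.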
Permissions and Security
Documentation
Testing
What does success look like, and how can we measure that?
What is the type of buyer?
Links / references
Edited by Michael Kozono