Replication lag message race on push over HTTP

Problem to solve

Follow up for !15901 (comment 216406855):

There is a race condition for simultaneous HTTP pushes (for a particular actor identifier to a particular repo) to a Geo-enabled installation.

  • Simultaneous push to different secondaries
  • Simultaneous push to same secondary
  • Simultaneous push to secondary and push to primary

One push can see a replication lag message intended for the other push, and the other sees no lag message.

We accepted this race because most users are unlikely to push simultaneously to the same repo. A more likely case is programmatic pushes that share a deploy key; those pushes could receive erroneous messages.
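
To make the failure mode concrete, here is a minimal sketch of the race. It is a model and not the actual Rails code: a plain dict stands in for the Redis cache, the `(actor, repo)` key shape is inferred from the description above, and the function names and the read-then-delete behavior are assumptions for illustration.

```python
# Hypothetical model of the race. A dict stands in for Redis, and the
# (actor, repo) key shape is an assumption inferred from the description.
cache = {}


def git_receive_pack(actor, repo, geo_node_id):
    """Rails side of a proxied push: remember which secondary referred it."""
    cache[(actor, repo)] = geo_node_id


def post_receive(actor, repo):
    """Internal API side: consume the cached node ID, if any, to build the message.

    Deleting the entry on read is an assumption of this sketch.
    """
    node_id = cache.pop((actor, repo), None)
    if node_id is None:
        return None
    return f"replication lag message for Geo node {node_id}"


# Two pushes by the same deploy key to the same repo, via secondaries 2 and 3.
git_receive_pack("key-7", "foo/bar", 2)   # push A
git_receive_pack("key-7", "foo/bar", 3)   # push B overwrites push A's entry

message_a = post_receive("key-7", "foo/bar")  # push A sees push B's node
message_b = post_receive("key-7", "foo/bar")  # push B finds nothing
```

In this interleaving, push A gets the message meant for push B, and push B gets none, which matches the symptom described above.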

Intended users

  • Anyone who Git pushes

Further details

Proposal keeping the cache approach

We should be able to do something like this:

  1. Workhorse adds a correlation ID to POST /-/geo-node-referrer/2/foo/bar.git/git-receive-pack
  2. GitHttpController#git_receive_pack stores the Geo node ID into Redis by correlation ID
  3. Gitaly adds correlation ID to the /api/v4/internal/post_receive request
  4. Rails /api/v4/internal/post_receive fetches the Geo node ID from Redis by correlation ID
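
The steps above could be sketched roughly as follows. This is only an illustration of keying the cache by correlation ID: the dict stands in for Redis, and the key prefix `geo_node_referrer:` and all function names are hypothetical, not existing Workhorse, Gitaly, or Rails APIs.

```python
import uuid

# Stand-in for Redis; a real implementation would also set an expiry.
cache = {}

KEY_PREFIX = "geo_node_referrer:"  # hypothetical key namespace


def new_correlation_id():
    """Step 1: Workhorse attaches a unique correlation ID to each request."""
    return uuid.uuid4().hex


def git_receive_pack(correlation_id, geo_node_id):
    """Step 2: Rails stores the Geo node ID under the correlation ID."""
    cache[KEY_PREFIX + correlation_id] = geo_node_id


def post_receive(correlation_id):
    """Steps 3-4: Gitaly forwards the correlation ID; Rails looks it up."""
    return cache.pop(KEY_PREFIX + correlation_id, None)


# Two simultaneous pushes by the same actor to the same repo.
id_a, id_b = new_correlation_id(), new_correlation_id()
git_receive_pack(id_a, 2)
git_receive_pack(id_b, 3)

node_a = post_receive(id_a)
node_b = post_receive(id_b)

# A push made directly to the primary never stores an entry.
node_direct = post_receive(new_correlation_id())
```

Because each push has its own key, neither push can overwrite or consume the other's entry, regardless of actor or repo.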

Proposal getting rid of the cache approach

This is what I was trying to avoid by caching the Geo node ID, but it's worth mentioning here since the above is non-trivial anyway.

  • Gitaly: Accept a new geo_node_id param in ReceivePack RPC
  • Workhorse: Call ReceivePack RPC with geo_node_id if we have it available
  • Rails: Add a geo_node_id param to the Workhorse OK response from GitHttpController#git_receive_pack
  • Gitaly: When calling post-receive hook, make sure geo_node_id is available
  • Gitaly: When post-receive hook calls internal API post_receive, add the geo_node_id param to the request
  • Rails: In internal API post_receive, use geo_node_id to build the replication lag message
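
As a rough sketch of the data flow in this proposal, the functions below pass `geo_node_id` from hop to hop with no shared state. The dict shapes and function names are hypothetical stand-ins for the real Workhorse response, the ReceivePack RPC, and the internal API request; only the `geo_node_id` parameter name comes from the list above.

```python
def rails_git_receive_pack_ok(geo_node_id=None):
    """Rails: include geo_node_id in the OK response sent to Workhorse."""
    response = {"status": "ok"}
    if geo_node_id is not None:
        response["geo_node_id"] = geo_node_id
    return response


def workhorse_receive_pack_request(ok_response):
    """Workhorse: call ReceivePack with geo_node_id when it is available."""
    request = {"rpc": "ReceivePack"}
    if "geo_node_id" in ok_response:
        request["geo_node_id"] = ok_response["geo_node_id"]
    return request


def gitaly_post_receive_params(rpc_request):
    """Gitaly: hand geo_node_id to the post-receive hook and on to the API."""
    params = {}
    if "geo_node_id" in rpc_request:
        params["geo_node_id"] = rpc_request["geo_node_id"]
    return params


def rails_post_receive(params):
    """Rails: build the replication lag message from the request itself."""
    node_id = params.get("geo_node_id")
    if node_id is None:
        return None
    return f"replication lag message for Geo node {node_id}"


def push(geo_node_id=None):
    """Chain the four hops for a single push."""
    ok = rails_git_receive_pack_ok(geo_node_id)
    rpc = workhorse_receive_pack_request(ok)
    params = gitaly_post_receive_params(rpc)
    return rails_post_receive(params)


via_secondary_2 = push(2)
via_secondary_3 = push(3)
direct_to_primary = push()
```

Each push carries its own node ID end to end, so concurrent pushes cannot affect one another. The cost is a change in every component along the path.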

Permissions and Security

Documentation

Testing

What does success look like, and how can we measure that?

What is the type of buyer?

Links / references

Edited Sep 13, 2019 by Michael Kozono