
WIP: Geo POC for proxying non-get requests to the primary

Michael Kozono requested to merge mk/poc-use-primary-for-non-get-requests into master

What does this MR do?

Proxies POST, PATCH, PUT, and DELETE requests from Geo secondaries to the primary, in Rack middleware.
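
As a rough illustration of the routing decision, here is a minimal Rack-style middleware sketch. It is not the code in this MR: the class name `ProxyWritesToPrimary` and the injected `secondary` and `forwarder` callables are hypothetical, and the actual HTTP call to the primary is left to the `forwarder`.

```ruby
# Hypothetical sketch: names are illustrative and not taken from this MR.
class ProxyWritesToPrimary
  # The request methods the MR description says are proxied.
  PROXIED_METHODS = %w[POST PATCH PUT DELETE].freeze

  # app:       the next Rack application in the stack
  # secondary: callable returning true when this node is a Geo secondary
  # forwarder: callable that takes the Rack env, sends the request to the
  #            primary, and returns a Rack response triplet
  def initialize(app, secondary:, forwarder:)
    @app = app
    @secondary = secondary
    @forwarder = forwarder
  end

  def call(env)
    if @secondary.call && PROXIED_METHODS.include?(env['REQUEST_METHOD'])
      # Write request on a secondary: let the primary serve it.
      @forwarder.call(env)
    else
      # GET/HEAD requests, or any request on the primary: serve locally.
      @app.call(env)
    end
  end
end
```

Injecting the forwarder keeps the decision logic testable without a network; a real implementation would also need to stream the request body and handle errors from the primary.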

Related issue: https://gitlab.com/gitlab-org/gitlab-ee/issues/12582

Known problems

  • Disables CSRF protection so the primary does not reject these requests
  • This functionality makes replication lag much more noticeable to the user
    • E.g. Create issue -> proxied request is served by primary -> your browser requests the newly created issue on the secondary -> wait for DB replication -> then the request returns
    • This is even worse when non-DB resources need to be synced. On project creation with a README file, the created project has no repo on the secondary for a minute, then sometimes an empty repo appears, and another minute later the file appears.
  • Flash messages don't work at all across Geo nodes
  • Project uploads don't work yet (/uploads/authorize raises an error I haven't looked at yet)
  • The current implementation of replacing HTTP_HOST header and modifying LOCATION header feels brittle
  • Content-length header is removed
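
The header handling mentioned in the last two items could look roughly like the sketch below. The helper names and example URLs are hypothetical, and header keys are assumed to be capitalized as shown. The prefix match on `Location` is the kind of string manipulation that makes this approach feel brittle.

```ruby
require 'uri'

# Hypothetical helpers: names and URLs are illustrative, not from this MR.

# Returns a copy of the Rack env whose HTTP_HOST points at the primary.
def env_for_primary(env, primary_url)
  primary = URI.parse(primary_url)
  # Include the port only when it is not the scheme's default.
  host = primary.port == primary.default_port ? primary.host : "#{primary.host}:#{primary.port}"
  env.merge('HTTP_HOST' => host)
end

# Returns the primary's response headers adjusted for the secondary's client:
# Location is rewritten to the secondary and Content-Length is dropped.
def headers_for_client(headers, primary_url, secondary_url)
  result = {}
  headers.each do |name, value|
    case name.downcase
    when 'content-length'
      next # dropped, as noted in the list above
    when 'location'
      result[name] =
        if value.start_with?(primary_url)
          secondary_url + value.delete_prefix(primary_url)
        else
          value
        end
    else
      result[name] = value
    end
  end
  result
end
```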

Demo

Note that I'm currently dealing with a slow GDK that I haven't investigated yet, so even without this change, every request takes a minimum of a second or two. Regardless, it does help to show that this change magnifies the bad experience when there is any lag at all.

[Demo recording: Geo_POC_proxying_non-get_requests_to_primary]

What are the relevant issue numbers?

https://gitlab.com/gitlab-org/gitlab-ee/issues/3764

Does this MR meet the acceptance criteria?

N/A, not intended for merge.

Takeaways

I think the lag problem will be the hardest problem to solve. There are customers replicating across the planet through a tiny, packet-dropping pipe. In those cases, it may be technically impossible to support this kind of experience (secondaries as completely transparent proxies).

New idea

On any non-GET request, proxy it to the primary and allow the response location to change to the primary. => https://gitlab.com/gitlab-org/gitlab-ee/issues/10779

This works around the lag problem, because the user's subsequent requests hit the primary, where the state change is guaranteed to exist. We could flash a message on the primary such as "Your write action has been redirected to the primary. Click here to go back to the secondary." Other flash messages will work as-is.
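
A minimal sketch of that idea, assuming the proxied response carries a `Location` header (capitalized as shown): rather than rewriting the redirect back to the secondary, resolve it against the primary's URL so the browser follows it there. The function name and URLs are hypothetical.

```ruby
require 'uri'

# Hypothetical helper: resolves the redirect target against the primary so
# the user's next request lands on the node that already has the new state.
def redirect_to_primary(headers, primary_url)
  location = headers['Location']
  return headers unless location

  # URI.join leaves absolute URLs untouched and resolves relative paths
  # against the primary's base URL.
  headers.merge('Location' => URI.join(primary_url, location).to_s)
end
```

For example, a relative `Location` of `/group/project/issues/1` with a primary at `https://primary.example.com` becomes `https://primary.example.com/group/project/issues/1`.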

Edited by Toon Claes
