Draft: Geo: Unified GitLab URL [RUN ALL RSPEC] [RUN AS-IF-FOSS]

Michael Kozono requested to merge mk/proxy-to-primary into master

What does this MR do?

Currently, Geo secondary sites are read-only. They block nearly all non-GET requests, which severely limits what users can do in the web UI and API. If Geo secondaries behaved exactly like the Geo primary, all Geo sites could be placed behind a single location-aware URL, and users would not need to know that multiple sites exist.

This MR proxies all HTTP requests from Geo secondary sites to the Geo primary site by default, which makes the web UI and API usable. Then, we allowlist certain classes of requests to let the secondary handle them locally. Users near the secondary (and far from the primary) will have an "accelerated" experience for locally handled requests.

  • Git requests are allowlisted. They are already handled properly by secondaries (including proxying pushes to the primary).
  • Rails assets are served by Workhorse directly.
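
As a rough illustration of the "proxy by default, allowlist what can be handled locally" idea, here is a minimal sketch. It is not the middleware from this MR: `LOCAL_ALLOWLIST`, `route_for`, and the path patterns are all hypothetical names and assumptions chosen for the example.

```ruby
# Hypothetical sketch: every request is proxied to the primary unless an
# allowlist rule says the secondary can handle it locally.

# Each rule is a lambda taking (http_method, path). The patterns are
# assumptions for illustration, not the checks used in this MR.
LOCAL_ALLOWLIST = [
  # Git-over-HTTP requests (deliberately naive path check).
  ->(_http_method, path) { path.include?('.git/') },
  # Compiled assets. In the MR these are served by Workhorse and would not
  # reach Rails at all; the rule is here only to show a second entry.
  ->(_http_method, path) { path.start_with?('/assets/') }
].freeze

# Returns :local when the secondary should handle the request itself,
# and :proxy_to_primary otherwise (the default).
def route_for(http_method, path)
  if LOCAL_ALLOWLIST.any? { |rule| rule.call(http_method, path) }
    :local
  else
    :proxy_to_primary
  end
end

puts route_for('GET',  '/group/project.git/info/refs') # local
puts route_for('POST', '/group/project/-/issues')      # proxy_to_primary
puts route_for('GET',  '/assets/application.css')      # local
```

Defaulting to `:proxy_to_primary` means a request class that nobody has thought about yet still behaves correctly, just without acceleration.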

Behind development feature flag: geo_secondary_mimicry

Part of #207168 (closed)

Why not proxy only non-GET requests?

A POC, !10309 (closed), proved unworkable. By contrast, this MR does not need to deal with CSRF problems at all, nor with race conditions involving the GET requests that often immediately follow non-GET requests.
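
One way to picture the race condition, assuming it comes from replication lag between the sites: a write is proxied to the primary, and the follow-up GET is answered by a secondary that has not yet replicated the change. The toy model below is not GitLab code, and `ToyGeoPair` and its methods are invented for the example.

```ruby
# Toy model of a primary/secondary pair. The secondary only catches up
# when replicate! is called, which stands in for replication lag.
class ToyGeoPair
  def initialize
    @primary = {}
    @secondary = {}
  end

  # Writes always land on the primary (secondaries are read-only).
  def write(key, value)
    @primary[key] = value
  end

  # Strategy A: only non-GET requests are proxied, so reads hit the secondary.
  def read_locally(key)
    @secondary[key]
  end

  # Strategy B: reads are proxied too, so they see the primary's state.
  def read_via_proxy(key)
    @primary[key]
  end

  # Simulates replication eventually catching up.
  def replicate!
    @secondary = @primary.dup
  end
end

pair = ToyGeoPair.new
pair.write(:issue_title, 'Fix login') # e.g. a POST that creates an issue
p pair.read_locally(:issue_title)     # nil, the follow-up GET sees stale data
p pair.read_via_proxy(:issue_title)   # "Fix login"
pair.replicate!
p pair.read_locally(:issue_title)     # "Fix login"
```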

Caveats

  • Users still need to have access to a Geo primary-specific URL so their browser can complete authentication via OAuth.
  • Geo secondary sites still need to be accessible to admins, since some Geo-specific views and actions are specific to each secondary.
  • With this architecture, performance of proxied requests will typically be slower than interacting directly with the primary, simply due to additional overhead per request. There are special cases that can mitigate this latency, though. For example, the two Geo sites may have a dedicated connection, or the secondary may be located directly between the user's office and the primary. Regardless, this additional latency per request may be worth it since the secondary will handle Git requests and various large file transfers locally.

TODO

  • Make Git request detection more specific than just a path include check
  • Add secondary-specific admin actions to allowlist (esp. /admin/geo, check maintenance mode allowlists)
  • Look into rack-proxy's streaming option
  • Look into implementing this proxy behavior in Workhorse
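
For the first TODO item, a stricter check could anchor on the endpoints used by Git's smart HTTP protocol instead of a substring match. The sketch below is only an assumption about what "more specific" might look like: `GIT_HTTP_PATH`, `naive_git_request?`, and `git_request?` are hypothetical, the naive check is a guess at the current behavior, and LFS endpoints are not covered.

```ruby
# Matches paths ending in one of Git's smart HTTP endpoints under a
# repository path ending in ".git". Expects the path without a query string.
GIT_HTTP_PATH = %r{\A/.+\.git/(?:info/refs|git-upload-pack|git-receive-pack)\z}

# Assumed shape of a plain "path include" check.
def naive_git_request?(path)
  path.include?('.git')
end

# Stricter check: the whole path must look like a Git HTTP endpoint.
def git_request?(path)
  GIT_HTTP_PATH.match?(path)
end

[
  '/group/project.git/info/refs',
  '/group/project.git/git-receive-pack',
  '/group/my.github.io/-/issues' # contains ".git" but is not a Git request
].each do |path|
  puts "#{path}: naive=#{naive_git_request?(path)} strict=#{git_request?(path)}"
end
```

The last path shows the kind of false positive a substring check produces: a project named `my.github.io` would have its issue pages treated as Git traffic.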

Potentially big problem, not sure yet

  • Controllers which include WorkhorseRequest or include WorkhorseAuthorization do not work. These might be difficult to fix.
    • Attaching files to descriptions/comments
    • Importing GitLab groups and projects
    • Importing requirements CSV

I'm guessing this impacts a lot of things in Workhorse routes.

Maybe we can modify Workhorse to ask Rails internally whether it is running on a Geo secondary (can we cache that in Workhorse?) and, if so, have Workhorse proxy those requests to the primary. If that is not crazy, then perhaps we should implement this proxy behavior entirely in Workhorse instead of in middleware. It turns out that the Workhorse repo is being absorbed into the main GitLab codebase, so this should be easier soon (or maybe it already is).
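
On the caching question, one option would be a short time-to-live cache around the "am I a secondary?" lookup. Workhorse is written in Go, so the Ruby below is only a language-agnostic sketch of the idea; `CachedSecondaryCheck`, the 30-second TTL, and the block standing in for the internal call to Rails are all assumptions.

```ruby
# Caches the result of an expensive "is this a Geo secondary?" lookup
# for ttl_seconds. The lookup itself is supplied as a block.
class CachedSecondaryCheck
  def initialize(ttl_seconds:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 &fetch)
    @ttl = ttl_seconds
    @clock = clock   # injectable so the example can control time
    @fetch = fetch   # stands in for an internal request to Rails
    @expires_at = nil
    @value = nil
  end

  # Returns the cached answer, refreshing it once the TTL has elapsed.
  def secondary?
    now = @clock.call
    if @expires_at.nil? || now >= @expires_at
      @value = @fetch.call
      @expires_at = now + @ttl
    end
    @value
  end
end

calls = 0
fake_now = 0.0
check = CachedSecondaryCheck.new(ttl_seconds: 30, clock: -> { fake_now }) do
  calls += 1
  true
end

check.secondary?
check.secondary?
puts calls # 1, the second call was served from the cache
fake_now = 31.0
check.secondary?
puts calls # 2, the TTL expired so the lookup ran again
```

A TTL keeps the per-request cost near zero while still noticing a promotion or demotion of the site within a bounded delay.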

If fixing these turns out to be a huge effort, then check whether Object Storage + direct_upload is a viable workaround. If so, call it a requirement of the secondary mimicry feature for now, and add support for uploads to block storage in a later iteration.

Note that Git HTTP and LFS are also Workhorse requests, but fortunately Geo secondaries already have special handling for those, so we can just exclude them here. We might look at how those requests work and see if we can apply something similar to the above problems.

Future

In later iterations, we can progressively accelerate requests for specific routes by skipping proxying if the resource:

  • is immutable, and exists
  • is not sensitive to staleness, and exists
  • is time-sensitive, but is confirmed up-to-date by asking the primary if the local updated_at is current
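
The three conditions above could be combined into a single decision, sketched below under stated assumptions. `Resource`, its fields, and `serve_locally?` are hypothetical; in particular, `primary_updated_at` is passed in here, whereas a real implementation would have to ask the primary for it.

```ruby
# Hypothetical description of a locally replicated resource.
Resource = Struct.new(:exists, :immutable, :staleness_tolerant, :updated_at,
                      keyword_init: true)

# Returns true when the secondary may skip proxying and serve the resource.
def serve_locally?(resource, primary_updated_at: nil)
  return false unless resource.exists        # nothing local to serve
  return true if resource.immutable          # can never be out of date
  return true if resource.staleness_tolerant # slightly old data is acceptable

  # Time-sensitive: serve locally only if the primary confirms that the
  # local copy is current.
  !primary_updated_at.nil? && resource.updated_at == primary_updated_at
end

blob  = Resource.new(exists: true, immutable: true,
                     staleness_tolerant: false, updated_at: Time.at(100))
issue = Resource.new(exists: true, immutable: false,
                     staleness_tolerant: false, updated_at: Time.at(100))

puts serve_locally?(blob)                                    # true
puts serve_locally?(issue)                                   # false
puts serve_locally?(issue, primary_updated_at: Time.at(100)) # true
puts serve_locally?(issue, primary_updated_at: Time.at(200)) # false
```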

Does this MR meet the acceptance criteria?

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Notes

Former TODO

  • Secondary sign out flow not working. The OAuth logout state is somehow nil

If you are already signed in to the secondary before switching to this branch and restarting the Rails web server, then you may get errors on sign out. The errors look different on an Omnibus Geo deployment. At the moment I think this also happens on master whenever you restart the secondary's Rails web server, but I'm not sure.

Former potentially big problem

  • Some requests are much slower when proxied through secondary
    • If you edit even a code comment in the middleware file and save it, then the next request to the primary makes GDK's rails-web hang (I assume it is reloading). I think this is what I observed earlier. 🤦

Edited by Michael Kozono