Rails generates absolute URLs from the configured external_url. Many of these fundamentally cannot be replaced with a relative URL. These absolute URLs are used to:
Display the path of image URLs in the package registry
Produce links in search results
Produce URLs in GitLab API responses
and more
If a secondary site has a different external_url than the primary site, then a user of the secondary site will sometimes land on the primary site. This is confusing, and loses the benefit of any "acceleration" by the secondary site.
Proposal
Absolute URLs rendered in the UI or by the API should use the same Host that the originating request uses.
SSH port is customizable per site. Secondary sites with separate URLs may need to send that information to the primary so the primary can properly render SSH clone URLs. See !111328 (merged)
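To make the proposal concrete, here is a minimal sketch in plain Ruby of building an absolute URL from the originating request's Host instead of a fixed external_url. It is not GitLab code: the helper `absolute_url_for`, the constant `CONFIGURED_EXTERNAL_URL`, and the example hostnames are all hypothetical. Falling back to the configured URL when no request host is known is an assumption as well.

```ruby
require "uri"

# Hypothetical stand-in for the site's configured external_url.
CONFIGURED_EXTERNAL_URL = "https://primary.example.com"

# Builds an absolute URL for `path`, preferring the Host of the originating
# request. `request_host` is a "host" or "host:port" string (IPv6 literals are
# not handled in this sketch). Without a request host, the configured
# external_url is used unchanged.
def absolute_url_for(path, request_host: nil)
  base = URI.parse(CONFIGURED_EXTERNAL_URL)

  if request_host && !request_host.empty?
    host, port = request_host.split(":", 2)
    base.host = host
    # With no explicit port, use the scheme's default so it is omitted from the output.
    base.port = port ? Integer(port) : base.default_port
  end

  base.path = path
  base.to_s
end

puts absolute_url_for("/group/project", request_host: "secondary.example.com")
puts absolute_url_for("/group/project")
```

The second call shows the case raised under the open questions below: with no request host (for example in a background job), the sketch can only fall back to the configured URL.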
Open questions
What to do about URLs in emails that are generated in the background? There is no "remote Host" in that context.
Even when there is a "remote Host" during email generation, one user may cause notifications to be generated for other users. Each user may use a different site. I propose we ignore this problem for now.
Michael Kozono changed title from Geo secondary proxying: How should we handle the use of absolute URLs in the app? to Geo secondary proxying with different URLs: How should we handle the use of absolute URLs in the app?
Good question. @cat and I were handling the "different URL" use-case as a timeboxed investigation before demoting it to Backlog since there were still a few threads to pull-- and it appears that Cat may have found a way to support the overall use-case as part of implementing #340086 (closed), without rewriting Host like #339260 (closed)! So let's keep this around for now, and we may be able to just close it soon.
Actually since this issue specifically asks "what should we do if we rewrite Host", and since we plan to not rewrite Host, we can close this one already.
@mkozono - hmm, I think we have #341357, which might be almost the same thing, although that one is an explicit idea to get rid of absolute URLs entirely where possible (redirects, for example; the clone URLs and some other things won't be possible, but there we could use something like the extra data we send in !82697 (merged)).
I think it makes sense to keep both open, and keep them separate, i.e. the other issue being about redirects and other URLs we can get rid of, and this one for ones that we actually need to be absolute but should use the secondary URL if different; WDYT?
Michael Kozono changed title from Geo secondary proxying with different URLs: How should we handle the use of absolute URLs in the app? to Geo: Use the remote Host in Absolute URLs generated by the primary during proxied requests
We want Gitlab.config.gitlab.url and related methods (host, port, maybe protocol?, ssh_host, ssh_port, etc.) to output values for the secondary site when the "current request" is proxied by a Geo site. We may be able to use a middleware which modifies these configs per request. Here is an initial attempt at that.
It didn't work as-is to modify the clone URLs. For the HTTP clone URL, I think it is because the routes are already loaded, and the host in URLs is specified in an initializer. For the SSH clone URL, the primary might assume the SSH host is the same, but the primary can't know the port. The secondary could send that info to the primary in the proxied request extra data. Also, the SettingsLogic data may be cached somehow.
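For illustration, the middleware idea could be sketched roughly as below, in plain Ruby with no Rails or Rack dependency. This is not the initial attempt referenced above. The module `RequestBaseUrl`, the class `ProxiedSiteUrlMiddleware`, and the `X-Proxied-Site-Url` header are hypothetical names, and a real implementation would need to take the secondary's URL from a trusted source rather than an arbitrary header. The sketch also does nothing about the obstacles described above: URL helpers whose host was fixed in an initializer would not see the override.

```ruby
# Hypothetical per-request holder for a base URL override.
# Thread.current[] storage is fiber-local in Ruby, so the override does not
# leak between concurrently handled requests.
module RequestBaseUrl
  KEY = :request_base_url_override

  # Sets the override for the duration of the block, then restores the
  # previous value even if the block raises.
  def self.with(url)
    previous = Thread.current[KEY]
    Thread.current[KEY] = url
    yield
  ensure
    Thread.current[KEY] = previous
  end

  # Returns the override if one is active, otherwise the given default.
  def self.current(default)
    Thread.current[KEY] || default
  end
end

# Rack-style middleware: any object that responds to call(env).
class ProxiedSiteUrlMiddleware
  # Hypothetical header a secondary site might set on proxied requests.
  HEADER = "HTTP_X_PROXIED_SITE_URL"

  def initialize(app)
    @app = app
  end

  def call(env)
    override = env[HEADER]
    return @app.call(env) if override.nil? || override.empty?

    RequestBaseUrl.with(override) { @app.call(env) }
  end
end

# Usage: the inner app reads the base URL through RequestBaseUrl.current.
app = ->(_env) { [200, {}, [RequestBaseUrl.current("https://primary.example.com")]] }
stack = ProxiedSiteUrlMiddleware.new(app)
puts stack.call("HTTP_X_PROXIED_SITE_URL" => "https://secondary.example.com")[2].first
puts stack.call({})[2].first
```

Scoping the override to a block, rather than mutating shared settings, keeps one proxied request from changing the URLs rendered for another.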
Unfortunately there are more moving parts than I'd hoped. Let's weight the long term fix a 10, since I only did a small spike and didn't follow up on each thought above. So it looks like it's a good idea to do the short term fix first.
I didn't use "remote Host". Instead, I used Gitlab::Geo.proxied_site.uri. But the obstacles after that are the same. I'll weight this a 10.
I think this is ready for workflow::scheduling. It is triaged and weighted already. The workaround is that some requests will not be accelerated like they could be, but everything is still functional.
Contributions like this are vital to help make GitLab a better product.
We would be grateful for your help in verifying whether your bug report requires further attention from the team. If you think this bug still exists, and is reproducible with the latest stable version of GitLab, please comment on this issue.
This issue has been inactive for more than 12 months now and based on the policy for inactive bugs, will be closed in 7 days.
Thanks for your contributions to make GitLab better!
This issue's description does not seem to have a section for "Implementation Guide".
Please consider adding one, because it makes a big difference for contributors.
This section can be brief but must have clear technical guidance, like:
Hints on lines of code which may need changing
Hints on similar code/patterns that can be leveraged
Suggestions for test coverage
Ideas for breaking up the merge requests into iterative chunks
Links to documentation (within GitLab or external) about implementation or testing guidelines, especially when working with third-party libraries