Verified commit f3c64874, authored by Michael Kozono and committed by GitLab

Clarify Geo proxying docs

parent a705ceb6
2 merge requests: !164749 Enable parallel in test-on-omnibus, !161373 Clarify Geo proxying docs
......@@ -90,7 +90,7 @@ Note the following when promoting a secondary:
error message during this process. For more information, see this
[troubleshooting advice](failover_troubleshooting.md#fixing-errors-during-a-failover-or-when-promoting-a-secondary-to-a-primary-site).
- If you are using separate URLs, you should [point the primary domain DNS at the newly promoted site](#step-4-optional-updating-the-primary-domain-dns-record). Otherwise, runners must be registered again with the newly promoted site, and all Git remotes, bookmarks, and external integrations must be updated.
- If you are using a [location aware public URL](../secondary_proxy/location_aware_external_url.md), the runners should automatically connect to the new primary after the old primary is removed from the DNS entry.
- If you are using [location-aware DNS](../secondary_proxy/index.md#configure-location-aware-dns), the runners should automatically connect to the new primary after the old primary is removed from the DNS entry.
- If you don't expect the runners connected to the previous primary to come back, you should remove them:
- Through the UI:
1. On the left sidebar, at the bottom, select **Admin**.
......
......@@ -36,7 +36,7 @@ Implementing Geo provides the following benefits:
- Reduce from minutes to seconds the time taken for your distributed developers to clone and fetch large repositories and projects.
- Enable all of your developers to contribute ideas and work in parallel, no matter where they are.
- Balance the read-only load between your **primary** and **secondary** sites.
- Balance the read load between your **primary** and **secondary** sites.
In addition, it:
......@@ -76,7 +76,7 @@ Keep in mind that:
- Replicate repositories, LFS Objects, and Attachments (HTTPS + JWT).
- The **primary** site doesn't talk to **secondary** sites to notify for changes (API).
- You can push directly to a **secondary** site (for both HTTP and SSH,
including Git LFS).
including Git LFS), and it will proxy the requests to the **primary** site.
- There are [limitations](#limitations) when using Geo.
### Architecture
......@@ -99,12 +99,16 @@ In this diagram:
From the perspective of a user performing Git operations:
- The **primary** site behaves as a full read-write GitLab instance.
- **Secondary** sites are read-only but proxy Git push operations to the **primary** site. This makes **secondary** sites appear to support push operations themselves.
- **Secondary** sites proxy web UI requests to the primary. This makes the **secondary** sites appear to support full UI read/write operations.
- **Secondary** sites behave as full read-write GitLab instances. **Secondary** sites transparently proxy all operations to the **primary** site, with [some notable exceptions](secondary_proxy/index.md#features-accelerated-by-secondary-geo-sites). In particular, Git fetches are served by the **secondary** site when it is up-to-date.
From the perspective of a user browsing the GitLab UI, or using the API:
- The **primary** site behaves as a full read-write GitLab instance.
- **Secondary** sites behave as full read-write GitLab instances. **Secondary** sites transparently proxy all operations to the **primary** site, with [some notable exceptions](secondary_proxy/index.md#features-accelerated-by-secondary-geo-sites). In particular, web UI assets are served by the **secondary** site.
To simplify the diagram, some necessary components are omitted.
- Git over SSH requires [`gitlab-shell`](https://gitlab.com/gitlab-org/gitlab-shell) and OpenSSH.
- Git over SSH requires [`gitlab-shell`](https://gitlab.com/gitlab-org/gitlab-shell).
- Git over HTTPS requires [`gitlab-workhorse`](https://gitlab.com/gitlab-org/gitlab-workhorse).
A **secondary** site needs two different PostgreSQL databases:
......@@ -218,6 +222,13 @@ This list of limitations only reflects the latest version of GitLab. If you are
- Git clone and fetch requests with the `--depth` option over SSH against a secondary site do not work and hang indefinitely if the secondary site is not up to date at the time the request is initiated. For more information, see [issue 391980](https://gitlab.com/gitlab-org/gitlab/-/issues/391980).
- Git push with options over SSH against a secondary site does not work and terminates the connection. For more information, see [issue 417186](https://gitlab.com/gitlab-org/gitlab/-/issues/417186).
- The Geo secondary site does not accelerate (serve) the clone request for the first stage of the pipeline in most cases. Later stages are not guaranteed to be served by the secondary site either, for example if the Git change is large, bandwidth is small, or pipeline stages are short. In general, it does serve the clone request for subsequent stages. [Issue 446176](https://gitlab.com/gitlab-org/gitlab/-/issues/446176) discusses the reasons for this and proposes an enhancement to increase the chance that Runner clone requests are served from the secondary site.
- When a single Git repository receives pushes at a high-enough rate, the secondary site's local copy can be perpetually out-of-date. This causes all Git fetches of that repository to be forwarded to the primary site. See [GitLab issue #455870](https://gitlab.com/gitlab-org/gitlab/-/issues/455870).
- [Proxying](secondary_proxy/index.md) is implemented only in the GitLab application in the Puma service or Web service, so other services do not benefit from this behavior. You should use a [separate URL](secondary_proxy/index.md#set-up-a-separate-url-for-a-secondary-geo-site) to ensure requests are always sent to the primary. These services include:
- GitLab container registry - [can be configured to use a separate domain](../packages/container_registry.md#configure-container-registry-under-its-own-domain), such as `registry.example.com`. Secondary site container registries are intended only for disaster recovery. Users should not be routed to them, especially not for pushes, because the data is not propagated to the primary site.
- GitLab Pages - should always use a separate domain, as part of [the prerequisites for running GitLab Pages](../pages/index.md#prerequisites).
- With a [unified URL](secondary_proxy/index.md#set-up-a-unified-url-for-geo-sites), Let's Encrypt can't generate certificates unless it can reach both IPs through the same domain. To use TLS certificates with Let's Encrypt, you can manually point the domain to one of the Geo sites, generate the certificate, then copy it to all other sites.
- [Using Geo secondary sites to accelerate runners](secondary_proxy/runners.md) is experimental and is not recommended for production. Progress toward general availability can be tracked in [epic 9779](https://gitlab.com/groups/gitlab-org/-/epics/9779).
- When a [secondary site uses a separate URL](secondary_proxy/index.md#set-up-a-separate-url-for-a-secondary-geo-site) from the primary site, [signing in to the secondary site using SAML](replication/single_sign_on.md#saml-with-separate-url-with-proxying-enabled) is only supported if the SAML Identity Provider (IdP) allows an application to be configured with multiple callback URLs.
### Limitations on replication/verification
......@@ -292,9 +303,9 @@ For information on using Geo in disaster recovery situations to mitigate data-lo
For more information on how to replicate the container registry, see [Container registry for a **secondary** site](replication/container_registry.md).
### Geo secondary proxy
### Set up a unified URL for Geo sites
For more information on using Geo proxying on secondary sites, see [Geo proxying for secondary sites](secondary_proxy/index.md).
For an example of how to set up a single, location-aware URL with AWS Route53 or Google Cloud DNS, see [Set up a unified URL for Geo sites](secondary_proxy/index.md#set-up-a-unified-url-for-geo-sites).
### Single Sign On (SSO)
......@@ -312,10 +323,6 @@ For more information on Geo security, see [Geo security review](replication/secu
For more information on tuning Geo, see [Tuning Geo](replication/tuning.md).
### Set up a location-aware Git URL
For an example of how to set up a location-aware Git remote URL with AWS Route53, see [Location-aware Git remote URL with AWS Route53](replication/location_aware_git_url.md).
### Backfill
When a **secondary** site is set up, it starts replicating missing data from
......
......@@ -11,7 +11,7 @@ DETAILS:
**Offering:** Self-managed
NOTE:
[GitLab Geo supports a location-aware URL including web UI and API traffic.](../secondary_proxy/location_aware_external_url.md)
[GitLab Geo supports location-aware DNS including web UI and API traffic.](../secondary_proxy/index.md#configure-location-aware-dns)
This configuration is recommended over the location-aware Git remote URL
described in this document.
......
......@@ -25,8 +25,8 @@ You only configure SAML on the primary site. Configuring `gitlab_rails['omniauth
How you configure instance-wide SAML differs depending on your secondary site configuration. Determine if your secondary site uses a:
- [Unified URL](../secondary_proxy/index.md#set-up-a-unified-url-for-geo-sites), meaning the `external_url` exactly matches the `external_url` of the primary site.
- [Separate URL](../secondary_proxy/index.md#geo-proxying-with-separate-urls) with proxying enabled. Proxying is enabled by default in GitLab 15.1 and later.
- [Separate URL](../secondary_proxy/index.md#geo-proxying-with-separate-urls) with proxying disabled.
- [Separate URL](../secondary_proxy/index.md#set-up-a-separate-url-for-a-secondary-geo-site) with proxying enabled. Proxying is enabled by default in GitLab 15.1 and later.
- [Separate URL](../secondary_proxy/index.md#set-up-a-separate-url-for-a-secondary-geo-site) with proxying disabled.
### SAML with Unified URL
......
This diff is collapsed.
---
stage: Systems
group: Geo
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://handbook.gitlab.com/handbook/product/ux/technical-writing/#assignments
redirect_to: 'index.md#configure-location-aware-dns'
remove_date: '2024-11-01'
---
# Location-aware public URL
This document was moved to [another location](index.md#configure-location-aware-dns).
DETAILS:
**Tier:** Premium, Ultimate
**Offering:** Self-managed
With [Geo proxying for secondary sites](index.md), you can provide GitLab users
with a single URL that automatically uses the Geo site closest to them.
Users don't need to use different URLs or worry about read-only operations to take
advantage of closer Geo sites as they move.
With [Geo proxying for secondary sites](index.md) web and Git requests are proxied
from **secondary** sites to the **primary**.
## Prerequisites
This example creates a `gitlab.example.com` subdomain that automatically directs
requests:
- From Europe to a **secondary** site.
- From all other locations to the **primary** site.
The URLs to access each node by itself are:
- `primary.example.com` as a Geo **primary** site.
- `secondary.example.com` as a Geo **secondary** site.
For this example, you need:
- A working GitLab **primary** site that is accessible at `gitlab.example.com` _and_ `primary.example.com`.
- A working GitLab **secondary** site.
- A DNS zone managing your domain. Although the following instructions use
[AWS Route53](https://aws.amazon.com/route53/)
and [GCP cloud DNS](https://cloud.google.com/dns/), other services such as
[Cloudflare](https://www.cloudflare.com/) can be used as well.
If you haven't yet set up a Geo _primary_ site and _secondary_ site, see the
[Geo setup instructions](../index.md#setup-instructions).
## AWS Route53
In this example, you use a Route53 Hosted Zone managing your domain for the Route53 setup.
In a Route53 Hosted Zone, traffic policies can be used to set up a variety of
routing configurations. To create a traffic policy:
1. Go to the
[Route53 dashboard](https://console.aws.amazon.com/route53/home) and select
**Traffic policies**.
1. Select **Create traffic policy**.
1. Fill in the **Policy Name** field with `Single Git Host` and select **Next**.
1. Leave **DNS type** as `A: IP Address in IPv4 format`.
1. Select **Connect to**, then select **Geolocation rule**.
1. For the first **Location**:
1. Leave it as `Default`.
1. Select **Connect to**, then select **New endpoint**.
1. Choose **Type** `value` and fill it in with `<your **primary** IP address>`.
1. For the second **Location**:
1. Choose `Europe`.
1. Select **Connect to**, then select **New endpoint**.
1. Choose **Type** `value` and fill it in with `<your **secondary** IP address>`.
![Add traffic policy endpoints](img/single_url_add_traffic_policy_endpoints.png)
1. Select **Create traffic policy**.
1. Fill in **Policy record DNS name** with `gitlab`.
![Create policy records with traffic policy](img/single_url_create_policy_records_with_traffic_policy.png)
1. Select **Create policy records**.
You have successfully set up a single host, like `gitlab.example.com`, which
distributes traffic to your Geo sites by geolocation.
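The routing that this traffic policy produces can be pictured with a short Python sketch. It is illustrative only: the function name, the location labels, and the placeholder IP addresses (taken from documentation-reserved ranges) are assumptions, and Route53 performs this resolution itself at the DNS level.

```python
# Illustrative model of the geolocation rule above; this is not the Route53 API.
# The IP addresses are placeholders from documentation-reserved ranges.
PRIMARY_IP = "203.0.113.10"     # stands in for <your primary IP address>
SECONDARY_IP = "198.51.100.20"  # stands in for <your secondary IP address>

# Location -> endpoint mapping, mirroring the explicit "Europe" Location entry.
GEOLOCATION_RULE = {
    "Europe": SECONDARY_IP,
}


def resolve_gitlab_host(client_location: str) -> str:
    """Return the IP that gitlab.example.com would resolve to for a client."""
    # Any location without an explicit entry falls through to the `Default` endpoint.
    return GEOLOCATION_RULE.get(client_location, PRIMARY_IP)
```

Under these assumptions, a client in Europe resolves `gitlab.example.com` to the secondary site, and clients everywhere else resolve it to the primary site.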
## GCP
In this example, you create a GCP Cloud DNS zone managing your domain.
When creating Geo-Based record sets, GCP applies a nearest match for the source region when the source of the traffic doesn't match any policy items exactly. To create a Geo-Based record set:
1. Select **Network Services** > **Cloud DNS**.
1. Select the Zone configured for your domain.
1. Select **Add Record Set**.
1. Enter the DNS Name for your Location-aware public URL, for example, `gitlab.example.com`.
1. Select the **Routing Policy**: **Geo-Based**.
1. Select **Add Managed RRData**.
1. Select **Source Region**: **us-central1**.
1. Enter your `<**primary** IP address>`.
1. Select **Done**.
1. Select **Add Managed RRData**.
1. Select **Source Region**: **europe-west1**.
1. Enter your `<**secondary** IP address>`.
1. Select **Done**.
1. Select **Create**.
You have successfully set up a single host, like `gitlab.example.com`, which
distributes traffic to your Geo sites using a location-aware URL.
## Enable Geo proxying for secondary sites
After setting up a single URL to use for all Geo sites, continue with the [steps to enable Geo proxying for secondary sites](index.md).
<!-- This redirect file can be deleted after 2024-11-01. -->
......@@ -20,7 +20,7 @@ The jobs that start during the first stage of a pipeline almost always have thei
## Use secondary runners with a Location Aware public URL (Unified URL)
Using a [Location Aware public URL](location_aware_external_url.md), with the feature flag enabled works with no extra configuration. After you install and register a runner in the same location as a secondary site, it automatically talks to the closest site, and only proxies to the primary if the secondary is out of date.
Using [Location-Aware DNS](index.md#configure-location-aware-dns) with the feature flag enabled works with no extra configuration. After you install and register a runner in the same location as a secondary site, it automatically talks to the closest site, and only proxies to the primary if the secondary is out of date.
## Use secondary runners with separate URLs
......@@ -35,7 +35,7 @@ When executing [a planned failover](../disaster_recovery/planned_failover.md), s
### With Location Aware public URL
When using the [Location Aware public URL](location_aware_external_url.md), all runners automatically connect to the closest Geo site.
When using [Location-Aware DNS](index.md#configure-location-aware-dns), all runners automatically connect to the closest Geo site.
When failing over to a new primary:
......
......@@ -63,13 +63,14 @@ you can decrease them.
You can set up a different URL for synchronization between the primary and secondary site.
The **primary** site's Internal URL is used by **secondary** sites to contact it
(to sync repositories, for example). The name Internal URL distinguishes it from
The **primary** site's Internal URL is used by **secondary** sites to contact
it. For example, to sync repositories. The name Internal URL distinguishes it from
[External URL](https://docs.gitlab.com/omnibus/settings/configuration.html#configuring-the-external-url-for-gitlab),
which is used by users. Internal URL does not need to be a private address.
When [Geo secondary proxying](../administration/geo/secondary_proxy/index.md) is enabled,
the primary uses the secondary's internal URL to contact it directly.
The Internal URL of a **secondary** site is used by the **primary** site to
contact it. For example, to retrieve sync or verification tracking metadata for
display in the Admin Area at **Geo > Sites > Project Repositories**.
The internal URL defaults to external URL. To change it:
......@@ -88,18 +89,6 @@ breaking communication between **primary** and **secondary** sites when using
HTTPS, customize your Internal URL to point to a load balancer with TLS
terminated at the load balancer.
## Multiple secondary sites behind a load balancer
**Secondary** sites can use identical external URLs if
a unique `name` is set for each Geo site. The `gitlab.rb` setting
`gitlab_rails['geo_node_name']` must:
- Be set for each GitLab instance that runs `puma`, `sidekiq`, or `geo_logcursor`.
- Match a Geo site name.
The load balancer must use sticky sessions to avoid authentication
failures and cross-site request errors.
<!-- ## Troubleshooting
Include any troubleshooting steps that you can foresee. If you know beforehand what issues
......
......@@ -6,9 +6,17 @@ info: Any user with at least the Maintainer role can merge updates to this conte
# Geo proxying
With Geo proxying, secondaries now proxy web requests through Workhorse to the primary, so users navigating to the
Secondaries proxy nearly all HTTP requests through Workhorse to the primary, so users navigating to the
secondary see a read-write UI, and are able to do all operations that they can do on the primary.
## High-level components
Proxying of GitLab UI and API HTTP requests is handled by the [`gitlab-workhorse`](../../development/architecture.md#gitlab-workhorse) component. Traffic usually sent to the Rails application on the Geo secondary site is proxied to the [internal URL](../../administration/geo/index.md#internal-url) of the primary Geo site instead.
Proxying of Git over HTTP requests is handled by the [`gitlab-workhorse`](../../development/architecture.md#gitlab-workhorse) component, but the decision to proxy or not is handled by the Rails application, taking into account whether the request is push or pull, and whether the desired Git data is up-to-date.
Proxying of Git over SSH traffic is handled by the [`gitlab-shell`](../../development/architecture.md#gitlab-shell) component, but the decision to proxy or not is handled by the Rails application, taking into account whether the request is push or pull, and whether the desired Git data is up-to-date.
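The proxy decision for Git traffic described above can be summarized in a minimal Python sketch. The names `GitOperation` and `handled_by` are hypothetical and do not come from the GitLab codebase; the real decision is made inside the Rails application and may consider more conditions than the two modeled here.

```python
from enum import Enum


class GitOperation(Enum):
    PUSH = "push"
    PULL = "pull"  # clone or fetch


def handled_by(operation: GitOperation, secondary_up_to_date: bool) -> str:
    """Return which site serves a Git request that arrives at a secondary site."""
    if operation is GitOperation.PUSH:
        # Pushes are proxied to the primary site.
        return "primary"
    # Pulls are served locally only when the secondary's copy is up-to-date;
    # otherwise they are forwarded to the primary.
    return "secondary" if secondary_up_to_date else "primary"
```

The same two inputs apply to Git over HTTP and Git over SSH; only the component that carries out the proxying differs (`gitlab-workhorse` or `gitlab-shell`).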
## Request life cycle
### Top-level view
......
......@@ -889,7 +889,7 @@ DETAILS:
**Tier:** Premium, Ultimate
**Offering:** Self-managed
- [Geo proxying](../../administration/geo/secondary_proxy/index.md) was [enabled by default for different URLs](https://gitlab.com/gitlab-org/gitlab/-/issues/346112) in 15.1. This may be a breaking change. If needed, you may [disable Geo proxying](../../administration/geo/secondary_proxy/index.md#disable-geo-proxying). If you are using SAML with different URLs, you must modify your SAML configuration and your Identity Provider configuration. For more information, see the [Geo with Single Sign-On (SSO) documentation](../../administration/geo/replication/single_sign_on.md).
- [Geo proxying](../../administration/geo/secondary_proxy/index.md) was [enabled by default for different URLs](https://gitlab.com/gitlab-org/gitlab/-/issues/346112) in 15.1. This may be a breaking change. If needed, you may [disable Geo proxying](../../administration/geo/secondary_proxy/index.md#disable-secondary-site-http-proxying). If you are using SAML with different URLs, you must modify your SAML configuration and your Identity Provider configuration. For more information, see the [Geo with Single Sign-On (SSO) documentation](../../administration/geo/replication/single_sign_on.md).
- LFS transfers can redirect to the primary from secondary site mid-session. See
[the details and workaround](#lfs-transfers-redirect-to-primary-from-secondary-site-mid-session).
- Incorrect object storage LFS files deletion on Geo secondary sites. See
......
......@@ -1067,7 +1067,7 @@ You are not impacted:
|-------------------------|-------------------------|----------|
| 15.1 - 16.2 | All | 16.3 and later |
Workaround: A possible workaround is to [disable proxying](../../administration/geo/secondary_proxy/index.md#disable-geo-proxying). Note that the secondary site fails to serve LFS files that have not been replicated at the time of cloning.
Workaround: A possible workaround is to [disable proxying](../../administration/geo/secondary_proxy/index.md#disable-secondary-site-http-proxying). Note that the secondary site fails to serve LFS files that have not been replicated at the time of cloning.
## 16.1.0
......