Add Geo check for tracking DB on primary

Problem

After promotion, if you want to set up Geo again (using the promoted site as the primary Geo site) then you must ensure that the promoted site does not have a Geo tracking database configured anymore.

If the primary Geo site does have a Geo tracking database configured, then you cannot log in to a secondary Geo site. When you try to, you get a redirect loop.

Proposal

Add a check in rake gitlab:geo:check which fails if the current site is a primary and it has a Geo tracking database configured.

Implementation Guide

Use a recently added Geo check, !111923 (merged), as an example.

A notable difference is that you must make the check run only on primary Geo sites.
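The logic of the proposed check can be sketched as below. This is a minimal, self-contained illustration and does not use GitLab's real check classes. The class name `TrackingDbOnPrimaryCheck` and its two boolean inputs are hypothetical stand-ins for however the real check determines the site role and whether a Geo tracking database is configured.

```ruby
# Hypothetical sketch only; not the GitLab system check API.
class TrackingDbOnPrimaryCheck
  # primary: whether the current site is a primary Geo site (assumed input).
  # tracking_db_configured: whether a Geo tracking database is configured (assumed input).
  def initialize(primary:, tracking_db_configured:)
    @primary = primary
    @tracking_db_configured = tracking_db_configured
  end

  # The check is only meaningful on primary Geo sites.
  def skip?
    !@primary
  end

  # A primary passes only when no Geo tracking database is configured.
  def check?
    !@tracking_db_configured
  end

  def result
    return :skipped if skip?

    check? ? :passed : :failed
  end
end
```

With these assumptions, a primary with a tracking database configured yields `:failed`, a primary without one yields `:passed`, and a secondary yields `:skipped`.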

Example use-case

Let's say that you run into an issue during a Geo site promotion. While troubleshooting, you delete the gitlab-cluster.json file and configure gitlab.rb as described for setting up a primary Geo site. But you neglect to delete the geo_secondary[...] lines, which configure a Geo tracking database.

More details

(Copied from a discussion in a confidential customer issue)

The request flow in the logs starts with:

  1. Visit the secondary at root path.
  2. As expected, the secondary proxies the request for root path to the primary.
  3. Already here we diverge from the expected behavior: the primary redirects to an absolute path for the secondary's /users/auth/geo/sign_in.

If the user is signed out, then we expect the primary to redirect to the /users/sign_in path.

/users/auth/geo/sign_in is only supposed to be used for requests requiring a user session on the secondary. The only routes that require that are /admin/geo/replication/projects and /admin/geo/replication/designs.
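The expected behavior described above can be summarized in a small sketch. The helper name `expected_sign_in_path` and the exact-match comparison are assumptions made for illustration. The real routing logic lives in GitLab's routes and controllers and is not reproduced here.

```ruby
# Routes that, per the discussion above, require a user session on the secondary.
SECONDARY_SESSION_PATHS = [
  '/admin/geo/replication/projects',
  '/admin/geo/replication/designs'
].freeze

# Hypothetical helper: where a signed-out request is expected to be redirected.
# Exact path matching is a simplification for this sketch.
def expected_sign_in_path(request_path)
  if SECONDARY_SESSION_PATHS.include?(request_path)
    '/users/auth/geo/sign_in'
  else
    '/users/sign_in'
  end
end
```

Under this sketch, a signed-out request for the root path should end up at /users/sign_in, which is not what the logs show in step 3.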

In step 3 above, the primary is behaving like a secondary does here.

I think this might be it:

A secondary site does the redirect here.

This check ::Gitlab::Geo.secondary?(infer_without_database: true) was modified so that Rails does not have to be loaded before routes can be loaded (it was also incompatible with Rails 7).

But technically the check now uses a heuristic rather than the single source of truth.

The result is: If you have a Geo tracking database connection set up in a primary Geo site, then the primary Geo site's sign in/sign out routes will behave like a secondary Geo site.
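To illustrate why a heuristic can misclassify a primary, here is a rough sketch. It assumes the heuristic amounts to "a geo: connection is present in config/database.yml". That is a simplification for illustration, not a copy of the actual GitLab implementation. The method name `inferred_secondary?` and the sample YAML are hypothetical.

```ruby
require 'yaml'

# Hypothetical stand-in for a database-free heuristic: treat the site as a
# secondary whenever a `geo` connection is configured for the environment.
def inferred_secondary?(database_yml, env = 'production')
  config = YAML.safe_load(database_yml) || {}
  (config[env] || {}).key?('geo')
end

# Sample config for a primary that was left with a tracking database connection.
primary_with_leftover_geo = <<~YAML
  production:
    main:
      adapter: postgresql
    geo:
      adapter: postgresql
YAML

# Sample config for a correctly configured primary.
clean_primary = <<~YAML
  production:
    main:
      adapter: postgresql
YAML

puts inferred_secondary?(primary_with_leftover_geo) # true, even though the site is a primary
puts inferred_secondary?(clean_primary)             # false
```

Because the inference never consults the single source of truth, the first config is treated as a secondary regardless of the site's actual role.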

I think that if we disable the Geo tracking database on the primary, then Omnibus won't configure the geo: connection in config/database.yml. So try setting this in the primary's /etc/gitlab/gitlab.rb:

geo_secondary['enable'] = false

Then run gitlab-ctl reconfigure.

Ideally we wouldn't need the heuristic, but it is there to solve some other real problems. Maybe we could solve it in a different way that doesn't have this risk. 🤔

It looks like promoting the Geo site should have disabled the Geo tracking database, but it would have done so in the gitlab-cluster.json file, which overrides /etc/gitlab/gitlab.rb.

Was gitlab-ctl geo promote used?

I recall from Slack that this particular promotion was not at all smooth. Is it the case that the gitlab-cluster.json file was removed at some point and the necessary changes were made manually in /etc/gitlab/gitlab.rb? This is a risk with the gitlab-cluster.json design: /etc/gitlab/gitlab.rb contradicts reality after a Geo site promotion.

It worked when they set geo_secondary['enable'] = false.

Edited by Michael Kozono