Add Geo check for tracking DB on primary
Problem
After promotion, if you want to set up Geo again (using the promoted site as the primary Geo site) then you must ensure that the promoted site does not have a Geo tracking database configured anymore.
If the primary Geo site does have a Geo tracking database configured, then you cannot log in to a secondary Geo site. When you try to, you get a redirect loop.
Proposal
Add a check in rake gitlab:geo:check which fails if the current site is a primary and it has a Geo tracking database configured.
Implementation Guide
Use a recently added Geo check, !111923 (merged), as an example.
A notable difference is that you must make the check run only on primary Geo sites.
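As a rough illustration of the intended behavior, here is a minimal, self-contained Ruby sketch. It is not the real check class and does not follow the structure of !111923. `SiteState` and `TrackingDbOnPrimaryCheck` are hypothetical names, and the two booleans stand in for whatever the codebase already uses to determine the site role and whether a tracking database is configured.

```ruby
# Hypothetical stand-in for the current site's state. In GitLab these values
# would come from existing Geo helpers; here they are passed in explicitly.
SiteState = Struct.new(:primary, :tracking_db_configured, keyword_init: true)

# Hypothetical check: fails only when the site is a primary AND a Geo
# tracking database is configured.
class TrackingDbOnPrimaryCheck
  def initialize(site)
    @site = site
  end

  # The check runs only on primary sites. Secondaries legitimately need
  # a tracking database, so they are skipped.
  def skip?
    !@site.primary
  end

  # Passes when no tracking database is configured.
  def pass?
    !@site.tracking_db_configured
  end

  # Returns :skipped, :passed, or :failed.
  def run
    return :skipped if skip?

    pass? ? :passed : :failed
  end
end

leftover = SiteState.new(primary: true, tracking_db_configured: true)
puts TrackingDbOnPrimaryCheck.new(leftover).run
```

The skip condition is the part that differs from a typical Geo check: the failure only makes sense on a primary.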
Example use-case
Let's say that during a Geo site promotion, you run into some issue. During troubleshooting, you delete the gitlab-cluster.json file and configure gitlab.rb as you would when setting up a primary Geo site. But you neglect to delete the geo_secondary[...] lines, which configure a Geo tracking database.
More details
(Copied from a discussion in a confidential customer issue)
The request flow in the logs starts with:
1. Visit the secondary at root path.
2. As expected, the secondary proxies the request for root path to the primary.
3. Already here we diverge: The primary redirects to an absolute path for the secondary's /users/auth/geo/sign_in.
If the user is signed out, then we expect the primary to redirect to the /users/sign_in path.
/users/auth/geo/sign_in is only supposed to be used for requests requiring a user session on the secondary. The only routes that require that are /admin/geo/replication/projects and /admin/geo/replication/designs.
In step 3 above, the primary is behaving like a secondary does here.
I think this might be it:
A secondary site does the redirect here.
This check, ::Gitlab::Geo.secondary?(infer_without_database: true), was modified so that Rails does not have to be loaded before routes can be loaded (it was also incompatible with Rails 7). But technically the check now uses a heuristic rather than the single source of truth.
The result is: If you have a Geo tracking database connection set up in a primary Geo site, then the primary Geo site's sign in/sign out routes will behave like a secondary Geo site.
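To make the failure mode concrete, here is a minimal Ruby sketch of a heuristic of this kind. It is not GitLab's actual implementation of `infer_without_database`. It assumes the heuristic amounts to "a geo: connection exists in config/database.yml", and the helper name `inferred_secondary?` and the database names are illustrative.

```ruby
require 'yaml'

# Hypothetical helper: infers "this is a secondary" purely from the database
# configuration, without loading Rails or querying the main database.
def inferred_secondary?(database_yml, env: 'production')
  config = YAML.safe_load(database_yml) || {}
  env_config = config.fetch(env, {})

  # Assumption: presence of a `geo` connection is treated as "secondary".
  env_config.is_a?(Hash) && env_config.key?('geo')
end

# A promoted primary that still has a leftover tracking database connection.
PRIMARY_WITH_LEFTOVER_GEO = <<~YAML
  production:
    main:
      adapter: postgresql
      database: gitlabhq_production
    geo:
      adapter: postgresql
      database: gitlabhq_geo_production
YAML

# A primary with no tracking database connection.
CLEAN_PRIMARY = <<~YAML
  production:
    main:
      adapter: postgresql
      database: gitlabhq_production
YAML

puts inferred_secondary?(PRIMARY_WITH_LEFTOVER_GEO)
puts inferred_secondary?(CLEAN_PRIMARY)
```

Under this assumption, the first configuration is misclassified as a secondary even though the site is a primary, which matches the sign-in redirect behavior described above.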
I think maybe if we disable the Geo tracking database on the primary, then Omnibus won't configure the geo: connection in config/database.yml. So try setting this in the primary's /etc/gitlab/gitlab.rb:
geo_secondary['enable'] = false
And then run gitlab-ctl reconfigure.
Ideally we wouldn't need the heuristic, but it is there to solve some other real problems. Maybe we could solve it in a different way that doesn't have this risk.
🤔 It looks like promoting the Geo site should have disabled the Geo tracking database, but it would have done so in the gitlab-cluster.json file, which overrides /etc/gitlab/gitlab.rb.
Was gitlab-ctl geo promote used? I recall in Slack that this particular promotion was not at all smooth. Is it the case that the gitlab-cluster.json file was removed at some point and the necessary changes were done manually in /etc/gitlab/gitlab.rb? This is a risk with the gitlab-cluster.json design. /etc/gitlab/gitlab.rb contradicts reality after a Geo site promotion.
It worked when they set geo_secondary['enable'] = false.