Geo secondaries can use a different object storage endpoint to the primary
Description
Geo (+ Geo DR) provides resilience for GitLab against the destruction of any one geographic region.
Currently, if uploaded files, LFS objects, etc. are stored on disk on the primary, they are copied to the secondary, which provides this resilience.
If the primary stores these files in an external object storage service (a Minio server or S3, perhaps for scalability reasons), Geo currently does nothing with them. They continue to exist in a single geographic region, and the secondary is expected to be configured to contact the same object storage service as the primary to retrieve them.
Having Geo secondaries notice when the primary uploads a file to an object storage service, and automatically replicate it to an object storage service in the secondary's geographic region, is an outstanding feature proposal encapsulated in these issues:
- https://gitlab.com/gitlab-org/gitlab-ee/issues/2388
- https://gitlab.com/gitlab-org/gitlab-ee/issues/415
(Do we need one for attachments as well?)
However, there is another use case, where the object storage service performs the replication itself, without Geo needing to do anything. The primary uploads the file to the object store in its geographic region and, as if by magic from Geo's point of view, the file is available in the secondary's geographic region.
Proposal
We need to support and document this "external synchronization of object stores" model.
The current behaviour of Geo is correct for these circumstances, so the amount of new code we need is minimal.
The secondaries need to be configured to access a different object storage endpoint - since this is done in gitlab.yml, this is already possible, but it is undocumented.
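As a rough illustration of what the documentation could show, here is a sketch of the LFS object storage section of gitlab.yml on each node. The key names follow my understanding of the `object_store` layout in gitlab.yml.example and should be checked against the installed version. The bucket names, regions, and credential placeholders are made up for the example, and the sketch assumes the storage service already replicates the primary's bucket into the secondary's bucket.

```yaml
# gitlab.yml on the PRIMARY (bucket name and region are placeholders)
production:
  lfs:
    enabled: true
    object_store:
      enabled: true
      remote_directory: lfs-objects-primary   # hypothetical bucket name
      connection:
        provider: AWS
        aws_access_key_id: PRIMARY_ACCESS_KEY_ID
        aws_secret_access_key: PRIMARY_SECRET_ACCESS_KEY
        region: us-east-1                     # primary's geographic region
---
# gitlab.yml on the SECONDARY: same structure, different endpoint.
# The externally replicated bucket lives in the secondary's own region.
production:
  lfs:
    enabled: true
    object_store:
      enabled: true
      remote_directory: lfs-objects-secondary # hypothetical replica bucket
      connection:
        provider: AWS
        aws_access_key_id: SECONDARY_ACCESS_KEY_ID
        aws_secret_access_key: SECONDARY_SECRET_ACCESS_KEY
        region: eu-west-1                     # secondary's geographic region
```

The two fragments belong to separate files, one per node. They appear together here only to make the difference visible. Other object types stored in object storage would presumably need the same per-node treatment in their own sections.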
We should document how to set up external replication for the most common services (S3, Ceph, perhaps Minio? Any others?)
Links / references
This has some parallels with the Elasticsearch + Geo issue: https://gitlab.com/gitlab-org/gitlab-ee/issues/1186
Documentation blurb
Overview
What is it? Why should someone use this feature? What is the underlying (business) problem? How do you use this feature?
Use cases
Who is this for? Provide one or more use cases.
Feature checklist
Make sure these are completed before closing the issue, with a link to the relevant commit.
- Feature assurance
- Documentation
- Added to features.yml