Discussion: Object Storage based Geo
Background
As the Geo object storage doc shows (https://docs.gitlab.com/ee/administration/geo/replication/object_storage.html), Object Storage is already very compatible with Geo. You can use it for Artifacts, Uploads, and LFS Objects, and "it just works", with Geo basically not involved. Can we use this fact to our advantage?
Object Storage works especially well for very large enterprises. GitLab.com itself has migrated as much data as possible to Object Storage because it's:
- Scalable
- Highly available
- Durable
- Secure
- Flexible
Besides Artifacts, Uploads, and LFS Objects, other classes of data can be stored either locally or in Object Storage, and Geo doesn't handle them at the moment: Container Registry, External Diffs, and Maven Packages.
Pages can only be stored locally at the moment, though it is likely to end up subsumed into Artifacts one way or another, so we can probably ignore it here.
The last major class of data outside of the relational database is Git Repositories. We already have an issue for putting these in Object Storage: &479. Regardless, it doesn't block this proposal for all other classes of data.
Proposal
- Determine whether Docker Registry gracefully handles the case where a meta-file exists but the blob doesn't (because object storage synced the meta-file first, and the blob is taking a long time)
- Here's the first unsolved part (thanks to @brodock for pointing it out): S3 and GCS should work, but this proposal hinges on the existence of an on-premise, open source, distributed, replicated (with low latency), object storage solution.
- Product discovery: Survey the landscape of open source object storage solutions, especially to find one that supports replication across regions/zones (not just HA)
- Product discovery: If no existing solution supports replication, then investigate if it's worth building replication of object storage ourselves (rather than replication of each type of GitLab resource)
- Package this object storage solution with GitLab for customers who require a private cloud and don't already have one, or for minimal Geo installations (though it would be awesome if this wasn't a Geo-specific change).
- Rearrange setup to use object storage. The packaged one by default, otherwise you can use whatever cloud you want.
- Delete tons of Geo code
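To make the first item concrete, here is a minimal sketch of the race it describes: a meta-file has replicated to the secondary, but the blob it points to hasn't arrived yet. Everything here is hypothetical and for illustration only. The `resolve_blob` helper, the key names, and the "meta-file contains the blob key" layout are assumptions and do not reflect the actual Docker Registry storage layout; a plain dict stands in for the secondary's bucket.

```python
from typing import Dict, Optional


def resolve_blob(store: Dict[str, bytes], meta_key: str) -> Optional[bytes]:
    """Return the blob referenced by a meta-file, or None if it isn't readable yet.

    Hypothetical layout (an assumption, not the real registry format):
    the meta-file's content is simply the key of the blob it refers to.
    """
    meta = store.get(meta_key)
    if meta is None:
        # The meta-file itself has not replicated yet.
        return None
    blob_key = meta.decode("utf-8").strip()
    # The meta-file may arrive before the blob; treat that as "not ready"
    # instead of raising an error.
    return store.get(blob_key)


# Simulated secondary bucket where the meta-file arrived before the blob.
secondary = {"meta/app-latest": b"blobs/sha256-abc"}
print(resolve_blob(secondary, "meta/app-latest"))  # None: blob not there yet

# Later, object storage replication catches up.
secondary["blobs/sha256-abc"] = b"layer-data"
print(resolve_blob(secondary, "meta/app-latest"))  # b'layer-data'
```

The open question in the item above is whether the registry behaves like the `None` branch here (a retryable "not found") or fails in a less graceful way.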
Why?
- Syncing is hard. It takes a lot of work and maintenance, and there are many bugs yet to be found.
- Customers get more value. Geo's custom file syncing can't compete in many ways with e.g. GCS or S3.
- Geo can focus on improving the experience at a higher level, rather than solving and maintaining sync logic.
- Geo already must be compatible with Object Storage. We're just not supporting it as well as we should be at the moment.
- Geo doesn't need to add syncing of Container Registry, External Diffs, and Maven Packages.
- Potentially easier to implement writes on secondaries?
Downsides
- Another dependency.
- More configuration to handle.
- Syncing is less visible/granular.
  - Though there's no reason we can't have secondaries check for whether something has appeared yet. We already have a tracking database.
  - And your cloud storage solution may provide more metrics/analytics than we do currently.
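As a rough sketch of the "secondaries check for whether something has appeared yet" idea: given the keys the tracking database expects, ask the secondary's object storage which ones are still missing. The `missing_on_secondary` function and the `exists` callback are hypothetical names; in practice `exists` would wrap whatever existence check the chosen object storage client offers, and the key list would come from the tracking database.

```python
from typing import Callable, Iterable, List


def missing_on_secondary(
    tracked_keys: Iterable[str],
    exists: Callable[[str], bool],
) -> List[str]:
    """Return the tracked keys that are not yet visible on the secondary.

    `exists` is an assumed callback that answers "is this key present in
    the secondary's bucket?"; it is not tied to any particular client.
    """
    return [key for key in tracked_keys if not exists(key)]


# A set stands in for the secondary's bucket in this example.
secondary_bucket = {"uploads/1.png", "lfs/aa11"}
tracked = ["uploads/1.png", "lfs/aa11", "artifacts/42.zip"]

print(missing_on_secondary(tracked, secondary_bucket.__contains__))
# ['artifacts/42.zip']
```

This keeps sync progress visible per object without Geo doing the transfer itself.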
@geo-team This may have been discussed in the past but I didn't see it. I am sure there is more to consider here. WDYT?