Draft: Geo: Accelerate more data types by serving from Geo secondary site

Problem to solve

Customers use Geo replication to accelerate access to GitLab data by setting up secondary sites geographically closer to their users.

Currently we accelerate Git read requests by serving them from the closest secondary site. Geo replicates almost all of the data types generated by GitLab. Customers can therefore benefit from Geo serving more data types from the local site instead of proxying each request to the primary site.

Intended users

Proposal

To deliver the most value for our customers, we will focus on large, self-contained data that is most frequently accessed. This allows each request to be accelerated independently, without depending on objects that may require the request to be proxied to the primary.

We will transparently accelerate read requests for the data types below by serving them from the closest secondary site.

  • Container registry
  • CI pipeline artifacts
  • CI job artifacts
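
As a rough illustration of the routing decision proposed above, the following Python sketch shows one way a secondary site could decide between serving a request locally and proxying it to the primary. It is not GitLab's implementation: the names `ACCELERATED_TYPES`, `Request`, and `route_request`, the data type labels, and the `replicated_locally` flag are all hypothetical, and the sketch assumes the secondary can tell whether its copy of the requested object has been synced.

```python
from dataclasses import dataclass

# Hypothetical labels for the data types listed in this proposal;
# these are not real GitLab identifiers.
ACCELERATED_TYPES = {
    "container_registry",
    "ci_pipeline_artifact",
    "ci_job_artifact",
}


@dataclass(frozen=True)
class Request:
    data_type: str
    is_read: bool
    # Assumption: the secondary knows whether its local copy is synced.
    replicated_locally: bool


def route_request(req: Request) -> str:
    """Return 'secondary' if the request can be served locally, else 'primary'."""
    if req.is_read and req.data_type in ACCELERATED_TYPES and req.replicated_locally:
        return "secondary"
    # Writes, unsupported data types, and not-yet-synced objects are proxied.
    return "primary"
```

In this sketch, anything that is not a read of an accelerated, already-replicated object falls back to the primary, which keeps the acceleration transparent to the user.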

Documentation

We need to update https://docs.gitlab.com/ee/administration/geo/secondary_proxy/

Testing

We will need to test the performance, consistency and reliability of the data served by the secondary site to ensure it meets the needs of the jobs that customers are attempting to perform.


What does success look like, and how can we measure that?

  • We want to see data access requests being served by the secondaries, reducing load on the primary site and providing a faster experience for users based at remote sites.
  • We want to build in visibility into which specific data types are being proxied from the secondary site(s) to the primary. The goal is to identify the data types that would provide the most value to our customers if accelerated.
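
One possible shape for the second measurement is a per-data-type count of proxied requests, sketched below in Python. The `ProxyStats` class and its method names are hypothetical and only illustrate the idea; a real implementation would presumably feed an existing metrics system rather than an in-memory counter.

```python
from collections import Counter


class ProxyStats:
    """Hypothetical in-memory tally of requests proxied to the primary."""

    def __init__(self) -> None:
        self._proxied: Counter = Counter()

    def record_proxied(self, data_type: str) -> None:
        # Called each time a secondary proxies a request to the primary.
        self._proxied[data_type] += 1

    def top_candidates(self, n: int = 3) -> list:
        # Most frequently proxied data types, as (data_type, count) pairs.
        return self._proxied.most_common(n)
```

The data types that appear at the top of `top_candidates()` would be the ones proxied most often, and so the first ones to evaluate for acceleration.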

What is the type of buyer?

  • Premium
  • Ultimate

Links / references

Edited by Sampath Ranasinghe