Investigate the performance of the dependency proxy for packages
🔥 Problem
Compare the performance of the following scenarios:
1. A CI job pulling all packages from an official registry directly vs. through a dependency proxy (with a warm cache).
2. A local machine (located far from the gitlab.com object storage) pulling all packages from an official registry directly vs. through a dependency proxy (with a warm cache).
If usage (2.) (non-CI) is larger than usage (1.) (CI), then a CDN approach (as in https://about.gitlab.com/blog/2022/10/25/gitlab-com-artifacts-cdn-change/) is valuable, as it will help users retrieve packages from closer locations. However, from my understanding, this brings no benefit for usage (1.).
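Each of the two comparisons above could be timed with a small harness like the one below. This is only a sketch: `fetch` and `time_downloads` are hypothetical helper names, the URL lists are placeholders to fill in, and the median over a few runs is just one reasonable way to smooth out network noise.

```python
import statistics
import time
import urllib.request


def fetch(url):
    """Download the URL fully and return the number of bytes read."""
    with urllib.request.urlopen(url) as response:
        return len(response.read())


def time_downloads(download, urls, runs=3):
    """Return the median wall-clock seconds needed to download all URLs.

    `download` is any callable taking a URL (for example `fetch` above).
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        for url in urls:
            download(url)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Hypothetical usage: fill in the same package list for both paths.
# direct_urls = [...]  # package URLs on the official registry
# proxy_urls = [...]   # the same packages through the dependency proxy
# print(time_downloads(fetch, direct_urls), time_downloads(fetch, proxy_urls))
```

Running the proxy measurement once before timing it would keep the cache warm, as the scenarios require.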
🚒 Solution
1. Investigate the dependency proxy usage. In particular, analyze the CI usage vs. the non-CI usage.
   - We can use GCP IP ranges here to detect requests coming from GCP. See https://www.gstatic.com/ipranges/goog.json.
2. Depending on the results of (1.), investigate whether using a CDN is worthwhile here.
   - If that's the case, estimate the amount of work involved.
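The IP-range check from step (1.) could be sketched as follows. It assumes the ranges file is a JSON object with a `prefixes` list whose entries carry either an `ipv4Prefix` or an `ipv6Prefix` key. The function names and the sample ranges (reserved documentation ranges, not real Google ones) are hypothetical, and treating "inside the ranges" as "CI traffic" is itself an approximation.

```python
import ipaddress
import json
from collections import Counter

# Made-up sample in the assumed shape of the ranges file.
# These are reserved documentation ranges, not real Google prefixes.
SAMPLE_RANGES = json.loads("""
{
  "prefixes": [
    {"ipv4Prefix": "192.0.2.0/24"},
    {"ipv6Prefix": "2001:db8::/32"}
  ]
}
""")


def load_networks(ranges):
    """Turn the parsed ranges document into a list of ip_network objects."""
    networks = []
    for entry in ranges.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def is_in_ranges(ip, networks):
    """Return True if the IP address falls inside any of the networks."""
    address = ipaddress.ip_address(ip)
    return any(
        address.version == net.version and address in net for net in networks
    )


def count_by_origin(ips, networks):
    """Count request IPs as 'gcp' (inside the ranges) or 'other'."""
    return Counter(
        "gcp" if is_in_ranges(ip, networks) else "other" for ip in ips
    )


networks = load_networks(SAMPLE_RANGES)
request_ips = ["192.0.2.10", "198.51.100.7", "2001:db8::1"]
print(count_by_origin(request_ips, networks))
```

In a real analysis, `request_ips` would come from the dependency proxy request logs and `SAMPLE_RANGES` would be replaced by the downloaded ranges file.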
🔮 Other considerations
The same analysis path/logic could be applied to the dependency proxy for container images.
Edited by David Fernandez