Improve permissions evaluation by caching it
🥗 Context
In &14137 (closed), we're working towards the first step of Virtual Registries. In short, with Virtual Registries the GitLab instance acts as a pull-through proxy between a package manager client and an "upstream" registry.
Our first iteration covers the Maven package format.
Maven client ($ mvn for example) <-> GitLab instance <-> Upstream Registry (like Maven Central)
The main goal is that, at some point, users can handle multiple upstreams behind a single virtual registry. In addition, files pulled through the GitLab instance are cached in object storage. This way, pulling the same file multiple times does not require pulling it from upstream again.
We have been looking at the performance of requests that hit the cache. For such requests, we authenticate the user, locate the virtual registry and the correct cache entry, and serve the related file from object storage.
As described in issue #504278 (closed), we observed a pretty high cpu_s timing.
Upon digging further, it appeared that the permission evaluation can take a non-trivial amount of time. At this point, our assumption is that the (virtual) registry policy delegates some of its logic to the group policy, and that policy comes with several parts and conditions to evaluate.
⚔️ Designing a solution
We could have looked at the (group) policy to see whether there were improvements to make, but that policy is quite central and any change to it would be delicate to make and deploy.
Instead, we went for a different route.
When Maven clients pull dependencies through virtual registries, they do not request a single file. In Maven's world, 1 dependency can lead to up to 12 requests. In other words, a single Maven command (like $ mvn compile) can end up triggering hundreds if not thousands of requests to the virtual registry API.
One thing to note here is that the Maven client will use the exact same credentials for all these requests. In other words, we will have the same current_user and the same (virtual) registry object (and also the same permission checked).
From the above: do we really need to evaluate authorization for all these requests? Each evaluation has the exact same inputs and the exact same output. This screams for one thing: caching. That's right, the idea here is to cache the permissions evaluation across requests.
Please note that, by default, the declarative policy framework caches permission evaluations within the same request. Thus, executing can?(current_user, :read_virtual_registry, registry) multiple times will not incur a performance hit because only the first call is evaluated. Subsequent calls use the cached result. However, here we're working across multiple different requests, so that cache is not useful for our case. Thus, we need to add another cache that lives across several requests.
In technical terms, we will use the usual Rails.cache to cache the results for 5.minutes. We might need to tweak that value but we think that it's a good starting point. We don't want to cache the permissions check for too long because, if a user's access is revoked, that is how long they would still hold the :read_virtual_registry permission.
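A minimal sketch of the idea, not the code in this MR. TinyCache is a hypothetical in-memory stand-in that only mirrors the fetch(key, expires_in:) { ... } shape of Rails.cache so the example runs without Rails. The cached_can? helper, the key layout and the ids are illustrative assumptions, and the lambda stands in for the real can? call.

```ruby
# Hypothetical stand-in for Rails.cache: a hash with per-entry expiry.
# The clock is injectable so expiry can be demonstrated without sleeping.
class TinyCache
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @store = {}
  end

  # Returns the cached value for key, or runs the block and stores its result.
  def fetch(key, expires_in:)
    entry = @store[key]
    return entry[:value] if entry && @clock.call < entry[:expires_at]

    value = yield
    @store[key] = { value: value, expires_at: @clock.call + expires_in }
    value
  end
end

PERMISSION_TTL = 5 * 60 # seconds, mirroring the 5.minutes starting point

# Hypothetical helper: the key combines everything that defines the check
# (ability, user, registry), so different users never share a result.
def cached_can?(cache, user_id:, registry_id:, ability:, &evaluate)
  key = ['virtual_registries', 'permission', ability, user_id, registry_id].join(':')
  cache.fetch(key, expires_in: PERMISSION_TTL, &evaluate)
end

now = 0.0
cache = TinyCache.new(clock: -> { now })
evaluations = 0
slow_policy_check = -> { evaluations += 1; true } # stands in for can?(...)

# Three "requests" with the same user, registry and ability.
3.times do
  cached_can?(cache, user_id: 1, registry_id: 42, ability: :read_virtual_registry, &slow_policy_check)
end
puts evaluations # => 1

# Once the TTL has elapsed, the policy is evaluated again.
now += PERMISSION_TTL + 1
cached_can?(cache, user_id: 1, registry_id: 42, ability: :read_virtual_registry, &slow_policy_check)
puts evaluations # => 2
```

The sketch also shows the trade-off mentioned above: between two evaluations, a denied or revoked permission is only noticed once the entry expires.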
🤔 What does this MR do and why?
- Cache the can? call in the handle file request service.
- Update the related spec.
The entire Maven virtual registry is behind a WIP feature flag. This feature is not released yet. Thus, we don't need a changelog trailer here.
📖 References
Please include cross links to any resources that are relevant to this MR. This will give reviewers and future readers helpful context to give an efficient review of the changes introduced.
🏎️ MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
🌈 Screenshots or screen recordings
No UI changes.
⚙️ How to set up and validate locally
🔧 Setup
- Enable the feature flag: Feature.enable(:virtual_registry_maven).
- Have a PAT and a root group (any visibility) ready.
- Have a runner ready.
- For the virtual registry settings, we don't have a UI or API (yet), so we need to create them in a Rails console:

      r = ::VirtualRegistries::Packages::Maven::Registry.create!(group: <root_group>)
      u = ::VirtualRegistries::Packages::Maven::Upstream.create!(group: <root_group>, url: 'https://repo1.maven.org/maven2')
      VirtualRegistries::Packages::Maven::RegistryUpstream.create!(group: <root_group>, registry: r, upstream: u)

- Fork this project.
- In pom.xml, remove the <repositories></repositories> tag and its contents.
- Add a settings.xml file with (replace <r.id> with the registry id):

      <settings>
        <mirrors>
          <mirror>
            <id>gitlab-maven</id>
            <name>GitLab proxy of central repo</name>
            <url>http://gdk.test:8000/api/v4/virtual_registries/packages/maven/<r.id></url>
            <mirrorOf>central</mirrorOf>
          </mirror>
        </mirrors>
        <servers>
          <server>
            <id>gitlab-maven</id>
            <configuration>
              <httpHeaders>
                <property>
                  <name>Job-Token</name>
                  <value>${CI_JOB_TOKEN}</value>
                </property>
              </httpHeaders>
            </configuration>
          </server>
        </servers>
      </settings>

- Add a .gitlab-ci.yml file with:

      compile:
        image: maven:latest
        script:
          - 'mvn compile -s settings.xml'

The idea here is to have a pipeline that will pull files through the Maven virtual registry. We will use the execution time reported by the mvn command as a way to measure the improvement.
🌞 Warming the cache
Run the pipeline once. This will pull the files (close to 1000) through the virtual registry, filling the cache while doing so.
You can verify the cache with (in a Rails console):
u.cached_responses.count # => 943
1️⃣ With master branch
Running the pipeline using master, the mvn command reports this execution time:
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:08 min
[INFO] Finished at: 2024-11-19T08:09:51Z
[INFO] ------------------------------------------------------------------------
2️⃣ With this MR
Running a pipeline using this MR, the mvn command reports this execution time:
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 45.230 s
[INFO] Finished at: 2024-11-19T08:31:57Z
[INFO] ------------------------------------------------------------------------
That's a ~33% improvement (68 s down to about 45 s).
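As a quick check of that figure, the two Total time values reported by mvn can be converted to seconds and compared. The maven_seconds helper below is a hypothetical name and only handles the two formats that appear in the logs above. These are single runs, so the result is indicative rather than a benchmark.

```ruby
# Convert the two "Total time" formats seen in the logs above into seconds.
def maven_seconds(text)
  case text
  when /\A(\d+):(\d+) min\z/
    Regexp.last_match(1).to_i * 60 + Regexp.last_match(2).to_i
  when /\A([\d.]+) s\z/
    Regexp.last_match(1).to_f
  else
    raise ArgumentError, "unrecognised duration: #{text}"
  end
end

before = maven_seconds('01:08 min') # master branch run
after  = maven_seconds('45.230 s')  # run with this MR

reduction = (before - after) / before * 100
puts format('%.1f%% less wall-clock time', reduction) # prints "33.5% less wall-clock time"
```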
Obviously, things will be different on gitlab.com but we should still see an improvement.