Introduce a local cache entry model for Maven virtual registries
🧦 Context
In the Maven virtual registry world, we pull files from an upstream through the GitLab virtual registry. While doing so, we also cache the requested file so that subsequent requests are served solely by GitLab and not by the upstream.
The data model is as follows:
Registry <-n:1- RegistryUpstream -1:n-> Upstream <-n:1- CacheEntry
Basically, a Registry can have multiple Upstreams (through a join model RegistryUpstream) and an Upstream has many cache entries.
Up to now, an Upstream was a URL with optional credentials that defined how to access a remote upstream.
With Maven virtual registries: local upstreams (#548558), we want to introduce the concept of a local upstream. Instead of looking for files in remote upstreams, we look at the GitLab Maven package registry. In short, we point to a local project or local group and we inspect the (Maven) packages available in that project or group. To handle this, we need to update the existing logic to inspect local upstreams. Before that, we need to define what a local upstream requires.
⚔️ Design choices
In !206725 (merged), the Upstream model will support a local mode through specific url values (global ids pointing to projects or groups).
Now, at the cache entry level, the problem is that we need to record that a local Upstream has a requested file and that it is backed by a given PackageFile (a model from the package registry). It is important to persist this information so that when we get another request for the same relative path, we can quickly find out that a local upstream can fulfill the request. Without that information, we would "walk" the list of upstreams on every request, which can be a costly operation.
Thus, the challenge is how to store a PackageFile id in the existing cache entry table. Well, the short answer is: we can't. The existing table has many columns that only make sense for cache entries located on remote upstreams. Thus, it is not a wise idea to try to store a package file id in the existing table.
Our solution here is: introduce a new dedicated table that will have the correct columns to store that a local upstream has a package file id that can fulfill a relative path.
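To make the intent concrete, here is a minimal in-memory sketch of what such a record carries and how it answers "can a local upstream serve this relative path?" without inspecting packages again. This is plain Ruby, not the actual model: `LocalCacheEntry`, `LocalCacheIndex`, the column names, and the sample ids are all hypothetical stand-ins chosen for illustration.

```ruby
# Hypothetical stand-in for a row of the new table (column names are assumptions).
LocalCacheEntry = Struct.new(
  :group_id, :upstream_id, :relative_path, :package_file_id,
  keyword_init: true
)

# Hypothetical in-memory index keyed by (upstream_id, relative_path).
class LocalCacheIndex
  def initialize
    @entries = {}
  end

  # Record that the upstream can serve relative_path from the given package file.
  def record(entry)
    @entries[[entry.upstream_id, entry.relative_path]] = entry
  end

  # Return the first entry, in upstream order, that knows this path, or nil.
  # Each check is a key lookup; no package inspection is needed.
  def find(upstream_ids, relative_path)
    upstream_ids.each do |upstream_id|
      entry = @entries[[upstream_id, relative_path]]
      return entry if entry
    end
    nil
  end
end

index = LocalCacheIndex.new
index.record(
  LocalCacheEntry.new(
    group_id: 1,
    upstream_id: 10,
    relative_path: "com/acme/app/1.0/app-1.0.jar",
    package_file_id: 42
  )
)

hit = index.find([9, 10], "com/acme/app/1.0/app-1.0.jar")
miss = index.find([9, 10], "com/acme/app/1.0/app-1.0.pom")
```

Here `hit` points at package file 42 through upstream 10, while `miss` is `nil`, which is the case where the upstreams would still have to be walked.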
As for the cache entry tables themselves, they are the core of the virtual registry cache system. In #473144 (comment 2199015293), we decided to use partitioning for the existing (remote) cache entries. The partition key is the relative_path so that when we receive a request for a file (the most frequent request in virtual registries), we can leverage it to quickly locate the cache entry.
For local upstreams, users could be aggregating large amounts of package files under a single upstream. As such, we decided to use the exact same approach: partition the table, with the relative_path as the partition key to speed up access by relative_path.
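As a rough illustration of why keying partitions on relative_path helps, the sketch below routes a path to a single partition, so a lookup by path only needs to touch that partition. The partitioning strategy and partition count are not stated here, so hash partitioning, the count of 16, and the `partition_for` helper are assumptions, and CRC32 is only a stand-in for whatever hash function the database uses.

```ruby
require "zlib"

# Assumed partition count, for illustration only.
PARTITION_COUNT = 16

# Map a relative path to a partition index in 0...partitions.
# CRC32 is a stand-in hash, not the database's own function.
def partition_for(relative_path, partitions: PARTITION_COUNT)
  Zlib.crc32(relative_path) % partitions
end

path = "com/acme/app/1.0/app-1.0.jar"
first_lookup = partition_for(path)
second_lookup = partition_for(path)
```

The same path always maps to the same partition, which is what lets a request for a file skip every other partition.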
At some point, it would be ideal to rename virtual_registries_packages_maven_cache_entries to virtual_registries_packages_maven_remote_cache_entries to be extra clear. This will bring changes to the models and API. As such, I don't want to handle this here.
🗒️ Potential main queries
We don't have the follow-up MRs that will actually use this new model yet, but the expected queries are as follows (selected from !174985 (merged)):
- Insert a new record into the table. Pretty straightforward.
- Search for a record given a relative_path. Similar to this one (without the status column since we don't have it).
Destruction is mainly handled by cascading deletes and not directly by users. This means that a record in this table is only destroyed when the related (top level) group, the upstream or the package file is destroyed. For this part, since we don't have anything particular to do here (contrary to remote upstreams where we need to destroy a file on object storage), we're using the usual database cascading delete.
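The cascading behavior described above can be pictured with a small in-memory sketch: removing a parent removes every entry that references it, with no extra cleanup step. `EntryRow` and `EntryTable` are hypothetical names, and this plain Ruby only mimics the effect of a database cascading delete; it is not how the database implements it.

```ruby
# Hypothetical row shape; column names are assumptions.
EntryRow = Struct.new(:group_id, :upstream_id, :package_file_id, :relative_path)

# Hypothetical stand-in for the table, mimicking cascading deletes.
class EntryTable
  attr_reader :rows

  def initialize(rows)
    @rows = rows.dup
  end

  # Drop every row whose foreign key column points at the destroyed parent.
  def cascade_delete(column, parent_id)
    @rows = @rows.reject { |row| row[column] == parent_id }
  end
end

table = EntryTable.new([
  EntryRow.new(1, 10, 100, "com/acme/a/1.0/a-1.0.jar"),
  EntryRow.new(1, 10, 101, "com/acme/a/1.0/a-1.0.pom"),
  EntryRow.new(1, 11, 102, "com/acme/b/2.0/b-2.0.jar")
])

table.cascade_delete(:package_file_id, 100) # a package file is destroyed
after_file = table.rows.size

table.cascade_delete(:upstream_id, 10)      # an upstream is destroyed
after_upstream = table.rows.size

table.cascade_delete(:group_id, 1)          # the top level group is destroyed
after_group = table.rows.size
```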
🗒️ Implementation plan
Since this change goes quite deep into the existing logic, we're going to split it into multiple MRs:
- Upstream changes. !206725 (merged)
- Local::Cache::Entry model and database changes. 👈 This is this MR.
- Update the services layer logic.
- Manage the local target destruction logic, e.g. what happens when a project or group targeted by a local upstream is destroyed.
- Update the APIs logic. This is the client that manages upstreams (CRUD operations).
- Update the documentation.
🤔 What does this MR do and why?
- Introduce the ::VirtualRegistries::Packages::Maven::Local::Cache::Entry model and its table.
  - A few basic model validations are also introduced.
- Add the related specs.
In the follow-up MRs, we will add the necessary scopes and additional utility functions to the model.
📚 References
- Maven virtual registries: local upstreams backe... (#566217) • David Fernandez • 18.6 • On track.
- Maven virtual registries: local upstreams (#548558) • Bonnie Tsang • 18.6.
- Maven Virtual Registry - Road to General Availa... (&15089) • Tim Rizzi, Crystal Poole+ • 18.6 • On track.
🖥️ Screenshots or screen recordings
No UI changes
🧑🔬 How to set up and validate locally
There is not much you can do with the model alone since there is no business code that interacts with it (yet).
We can still play around in a Rails console.
top_level_group = Group.top_level.sample
upstream = ::VirtualRegistries::Packages::Maven::Upstream.create!(group: top_level_group, url: "https://gitlab.com/maven1", name: "testing local cache entries")
e = ::VirtualRegistries::Packages::Maven::Local::Cache::Entry.new(group: top_level_group, upstream: upstream)
e.valid?
=> false
e.errors.to_a
=> ["Package file must exist", "Relative path can't be blank"]
e.relative_path = "foo/bar"
e.package_file = Packages::PackageFile.last
e.valid?
=> true
🏁 MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.