Virtual registries: cut ties with the Package Registry?
🔭 Context
When we started thinking about the Dependency proxy for packages, we investigated the first format: Maven.
At that time, we tried to understand all the requests that the dependency proxy would handle and we understood that only package files would go through the dependency proxy.
Then we got to the caching aspect of it to mimic the same feature on the dependency proxy for container images.
At this point, my reasoning went like this:
Well, we only have package files flying around and we need a place to store them. We already have such a place: the package registry. Could we re-use the package registry?
At that time, I thought that integrating with the package registry would bring several benefits:
- UI already done.
- Cleanup policies already present.
- If a package is uploaded as usual to the package registry, it would be taken into account for the dependency proxy cache.
Now, I think I can safely say that this brought some challenges like:
- The permission models are not the same in the Package Registry (you need to be `developer+` to write) and in the Dependency Proxy (you just need to be `guest` to access the dependency proxy).
  - This means that a `guest` pulling a package through the Dependency Proxy will never create a cache for that package. This fact almost defeats the actual purpose of the dependency proxy.
  - In other words, we had to take into account the Package Registry permissions.
- It is not clear on the package registry which packages come from the Dependency Proxy and which ones come from "usual" publications to the Package Registry.
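The permission mismatch can be sketched as follows. This is a minimal illustration and not the actual implementation: the function name, the role values and the in-memory cache are assumptions made for the example.

```python
from enum import IntEnum


class Role(IntEnum):
    # Ordered so that higher roles include lower ones (illustrative values).
    GUEST = 10
    DEVELOPER = 30


# Minimum roles as described above; the constant names are hypothetical.
MIN_ROLE_TO_PULL_FROM_PROXY = Role.GUEST
MIN_ROLE_TO_WRITE_TO_REGISTRY = Role.DEVELOPER


def pull_through_proxy(role: Role, cache: set, package: str) -> str:
    """Return where the package was served from; fill the cache if allowed."""
    if role < MIN_ROLE_TO_PULL_FROM_PROXY:
        raise PermissionError("not allowed to use the dependency proxy")
    if package in cache:
        return "cache"
    # Cache miss: the file comes from upstream. Storing it in the
    # Package Registry requires the registry's own write permission.
    if role >= MIN_ROLE_TO_WRITE_TO_REGISTRY:
        cache.add(package)
    return "upstream"


cache = set()
print(pull_through_proxy(Role.GUEST, cache, "junit:4.13"))      # served from upstream
print(pull_through_proxy(Role.GUEST, cache, "junit:4.13"))      # still upstream: nothing was cached
print(pull_through_proxy(Role.DEVELOPER, cache, "junit:4.13"))  # upstream, and now cached
print(pull_through_proxy(Role.GUEST, cache, "junit:4.13"))      # served from cache
```

A project where only guests pull packages would hit the upstream on every request.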
A bigger problem came with package formats that have metadata endpoints (discussed here). These files from upstreams are not always tied to a package because they can work at different levels: the registry instance, the package (name only) or the package with a version.
On the other hand, the Package Registry only works with files that are linked to a given package (name + version).
You can imagine what happens: we're trying to fit a triangle into a square hole. Sure, we could update the Package Registry to handle those metadata files (that's actually what we want to do in Improve Package Registry metadata generation (&9835 - closed)).
Sadly, this is not so easy. As stated in #456983, the dependency proxy will need to do more than just store those metadata files: it will need to modify them (update references so that they point to the dependency proxy URLs).
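To make the "modify them" part concrete, here is a small sketch of such a rewrite on a simplified NPM-style metadata document. The host names and the function name are hypothetical, and a real metadata file contains many more fields.

```python
import json

# Hypothetical hosts, for illustration only.
UPSTREAM_BASE = "https://registry.example.org"
PROXY_BASE = "https://gitlab.example.com/dependency_proxy/npm"


def rewrite_references(metadata: dict) -> dict:
    """Return a copy of the metadata whose file URLs point at the proxy."""
    rewritten = json.loads(json.dumps(metadata))  # simple deep copy
    for version in rewritten.get("versions", {}).values():
        url = version["dist"]["tarball"]
        if url.startswith(UPSTREAM_BASE):
            version["dist"]["tarball"] = PROXY_BASE + url[len(UPSTREAM_BASE):]
    return rewritten


upstream_metadata = {
    "name": "left-pad",
    "versions": {
        "1.3.0": {"dist": {"tarball": UPSTREAM_BASE + "/left-pad/-/left-pad-1.3.0.tgz"}},
    },
}

proxied = rewrite_references(upstream_metadata)
print(proxied["versions"]["1.3.0"]["dist"]["tarball"])
```

The file that ends up stored is no longer byte-for-byte what the upstream served, which is the root of the problems listed in the solutions.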
Long story short, I introduced a design flaw in the Maven investigation and I now think that this is not the way to integrate with the Package Registry. A more proper way would be to wait for the Virtual Registry feature (an upgrade of the Dependency Proxy where multiple upstreams are handled). In that feature, the Package Registry can become a (local) upstream of the Virtual Registry/Dependency Proxy. This way, we re-use the packages present in the Package Registry but we never write to it from the Virtual Registry/Dependency Proxy.
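A rough sketch of that idea, with the Package Registry as a read-only first upstream. All names here are hypothetical and the upstreams are plain in-memory dictionaries; whether files coming from a local upstream should be cached at all is left open.

```python
from typing import Callable, Optional

# An upstream is modelled as a read-only lookup: path -> content or None.
Upstream = Callable[[str], Optional[bytes]]


def make_upstream(files: dict) -> Upstream:
    return lambda path: files.get(path)


class VirtualRegistry:
    def __init__(self, upstreams: list):
        self.upstreams = upstreams  # checked in order
        self.cache = {}             # the only place this class writes to

    def fetch(self, path: str) -> Optional[bytes]:
        if path in self.cache:
            return self.cache[path]
        for upstream in self.upstreams:
            content = upstream(path)
            if content is not None:
                self.cache[path] = content
                return content
        return None


package_registry = {"com/acme/my-lib-1.0.jar": b"published locally"}
remote = {"junit/junit-4.13.jar": b"from remote upstream"}

registry = VirtualRegistry([make_upstream(package_registry), make_upstream(remote)])
print(registry.fetch("com/acme/my-lib-1.0.jar"))
print(registry.fetch("junit/junit-4.13.jar"))
print(sorted(package_registry))  # the Package Registry content is untouched
```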
Let's see some of the possible solutions.
1️⃣ Continue using the Package Registry as the cache location for the Dependency proxy for packages
Solution: Continue the path that we started with the Maven dependency proxy.
Upsides:
- UI to browse the files already done.
- Cleanup policies available.
- Storage quota already taken into account.
- Upload endpoints implemented and can be re-used.
Downsides:
- Not all package format registries support metadata files (only NPM at the time of this writing).
- Even if it is supported: I don't think it is correct to store metadata files with dependency proxy references in the Package Registry. Clients of the Package Registry will start using the Dependency Proxy even if that's not what users wanted.
- The technical requirements around metadata files are not exactly the same.
  - Example: the Dependency Proxy needs to store the `ETag` of upstream files. The `ETag` is usually the fingerprint of the file's contents. As I said above, the Dependency Proxy will need to store modified metadata files, which means that the `ETag` fingerprint will be different. As such, the Dependency Proxy needs a way to store the upstream `ETag`. This concept simply doesn't exist in the Package Registry (as there, we almost never modify files).
- This is a strong tie with the Package Registry. If the Package Registry is disabled, the Dependency Proxy will not work.
- Related: the Package Registry works at the project level (when writing). This means that the Dependency Proxy feature will be "locked" at the project level.
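The `ETag` downside can be illustrated with a short sketch. It assumes, for the sake of the example, that the upstream derives its `ETag` from a hash of the file contents; the class and function names are hypothetical.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(content: bytes) -> str:
    # Stand-in for however an upstream derives its ETag (assumed: a content hash).
    return hashlib.sha256(content).hexdigest()


@dataclass
class CachedMetadata:
    content: bytes      # the modified file that the proxy serves
    upstream_etag: str  # the ETag the upstream sent for the unmodified file

    def revalidation_headers(self) -> dict:
        # Conditional request: ask the upstream whether its copy has changed.
        return {"If-None-Match": f'"{self.upstream_etag}"'}


upstream_body = b'{"tarball": "https://registry.example.org/pkg.tgz"}'
modified_body = upstream_body.replace(b"registry.example.org", b"proxy.example.com")

entry = CachedMetadata(content=modified_body, upstream_etag=fingerprint(upstream_body))

# The fingerprint of the stored file no longer matches the upstream ETag,
# so the upstream value cannot be recomputed from the cached file.
print(fingerprint(entry.content) == entry.upstream_etag)
print(entry.revalidation_headers())
```

Without a dedicated field for the upstream value, the proxy has nothing valid to send when it checks whether the cached metadata is stale.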
2️⃣ Introduce a dedicated cache location for the Dependency proxy for packages
Solution: Introduce a new cache location where we can store files in a more generic way (e.g. not necessarily tied to a package).
Upsides:
- The dependency proxy cache entries can be for any file, be it package related or more general like registry related.
  - We can store as many additional fields about the file as necessary (storing the `ETag` from the upstream is required to handle some metadata files/responses).
  - We can safely do whatever we want with the files, including modifying them for the dependency proxy (metadata files/responses).
- We can link the cache entry with the upstream source object (this is a required aspect for handling multiple upstreams).
- Standalone feature that has a single link to an existing object like a project. If there is a single link, it's easier to move the entire feature to a different level, like group.
- We can implement support for a package format that is not (yet) supported in the package registry.
- Opportunity: if we invest a bit more implementation time, we can prepare things for multiple upstreams. That would mean introducing the `Upstream` object/model.
- Opportunities:
  - Dedicated cleanup policies. We don't have a clear view here but what is better? Having a single set of cleanup policies that rule the package registry + dependency proxy or have two distinct sets of cleanup policies?
  - Dedicated dependency firewall rules. This is hard to say for now but we know for sure that we can't have all the dependency proxy firewall features from the package registry in the dependency proxy.
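As a rough idea of what a dedicated cache location could look like, here is a minimal in-memory sketch. The field names, the example paths and the `Cache` class are hypothetical; the real thing would be database tables plus an uploader, as listed in the downsides.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Upstream:
    id: int
    url: str


@dataclass
class CacheEntry:
    upstream: Upstream                   # link to the upstream source object
    relative_path: str                   # any file, not necessarily a package file
    content: bytes
    upstream_etag: Optional[str] = None  # extra field that the Package Registry lacks


@dataclass
class Cache:
    entries: dict = field(default_factory=dict)

    def store(self, entry: CacheEntry) -> None:
        self.entries[(entry.upstream.id, entry.relative_path)] = entry

    def find(self, upstream: Upstream, relative_path: str) -> Optional[CacheEntry]:
        return self.entries.get((upstream.id, relative_path))


upstream = Upstream(id=1, url="https://registry.example.org")
cache = Cache()
# Entries at the three levels mentioned earlier: registry, package, package + version.
cache.store(CacheEntry(upstream, "index.json", b"{}", upstream_etag="abc"))
cache.store(CacheEntry(upstream, "left-pad", b"{}", upstream_etag="def"))
cache.store(CacheEntry(upstream, "left-pad/-/left-pad-1.3.0.tgz", b"\x1f\x8b"))

print(cache.find(upstream, "left-pad").upstream_etag)
print(cache.find(upstream, "unknown"))
```

Keying entries by upstream and path, rather than by package name and version, is what lets registry-level and package-level files live next to package files.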
Downsides:
- New implementation to put in place: database tables and a new uploader.
- New UI to put in place to display the cache or even browse it.
- We would have two approaches as we work on the dependency proxy, and at some point the Maven dependency proxy will need to be migrated. That might be a data migration or a "start from scratch" process: these are cache entries, so not having them will not break user workflows, although it will slow them down.
- We would need to implement the link with object storage quotas. Depending on the approach, this could be small.
📰 Expected result
- Discuss and choose the appropriate solution (1️⃣ or 2️⃣ or something else).
  - Take into account the impact on:
    - the dependency firewall features for the dependency proxy (for packages).
    - the dependency proxy evolution to handle multiple upstreams (virtual registries).
- Provide an implementation/progression plan for the NPM dependency proxy.