Maven virtual registries: local upstreams
🔥 Problem statement
Until now, users could only build a list of remote upstreams in the Maven virtual registry. By remote, we mean a target URL (with optional credentials) that points to a system outside the GitLab instance.
There is also a need to support local upstreams. By local, we mean a project or group that lives in the same GitLab instance as the Maven virtual registry.
This is not blocked today: users can point a remote upstream at the project-level or group-level endpoint of the GitLab Maven package registry, and it works. However, this is suboptimal because the virtual registry caches the requested files. Files coming from GitLab projects on the same instance are therefore stored twice: once in the package registry and once in the virtual registry cache. Since object storage is not free, this duplication has a real cost.
🚒 Solution
The proposed solution is to categorize upstreams so that the backend knows whether it is dealing with a remote or a local upstream.
Since polymorphic associations are not recommended, the simplest approach is to work at the upstream level and add an optional foreign key to a project or group.
Thus we would have (overview):
- Update VirtualRegistries::Packages::Maven::Upstream to have a project_id or group_id column (optional foreign key to a project or group).
  - We need a validation ensuring that the project_id or group_id references one of the projects or groups contained in the (top-level) parent group.
  - The project_id (or group_id) column should be set when url is not set, and vice versa (they are mutually exclusive).
  - username and password can be set only when url is set.
  - It is not clear at this point whether we need a kind column to quickly select local and remote upstreams, since we can already select on url IS NULL or project_id IS NULL. Specific indexes might be required.
- Update the handle file request service to support files that come from a local upstream.
  - Such a file does not need to be cached at all and should be served directly from the related package file.
- Update the check upstream service to support local upstreams. One possible approach:
  - Check all local upstreams.
  - If the package is found on a local upstream, decide whether a remote upstream still needs to be checked: if yes, check it; if not, return that local upstream.
  - If the package is not found on any local upstream, ignore all local upstreams in the upstream list and walk the remote upstreams (similar to what we do today).
  - The idea is that we can check file existence on multiple local upstreams in a single database query, so we should leverage that to reduce the number of network calls made to remote upstreams.
  - Accessing local upstreams will always be faster than accessing remote upstreams.
- Update the upstream APIs to allow:
  - creating an upstream with a project_id or group_id.
  - exposing the project_id or group_id field when returning an upstream.
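The model constraints in the overview can be sketched as a plain validation routine. This is a minimal Python sketch, not GitLab's Rails model: the Upstream class is a hypothetical stand-in for VirtualRegistries::Packages::Maven::Upstream, allowed_project_ids and allowed_group_ids are assumed to hold the IDs found under the top-level parent group, and it assumes that project_id and group_id are also mutually exclusive with each other.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


# Hypothetical stand-in for VirtualRegistries::Packages::Maven::Upstream.
# The rules below illustrate the proposal; they are not GitLab's model code.
@dataclass
class Upstream:
    url: Optional[str] = None
    username: Optional[str] = None
    password: Optional[str] = None
    project_id: Optional[int] = None
    group_id: Optional[int] = None

    @property
    def local(self) -> bool:
        # Local upstreams have no URL (the "url IS NULL" selection).
        return self.url is None

    def validate(self, allowed_project_ids: Set[int], allowed_group_ids: Set[int]) -> List[str]:
        errors: List[str] = []
        targets = [self.url, self.project_id, self.group_id]
        # Assumption: exactly one of url, project_id or group_id is set.
        if sum(t is not None for t in targets) != 1:
            errors.append("exactly one of url, project_id or group_id must be set")
        # Credentials are only allowed on remote upstreams.
        if self.url is None and (self.username is not None or self.password is not None):
            errors.append("username and password can only be set with url")
        # Local targets must live under the top-level parent group.
        if self.project_id is not None and self.project_id not in allowed_project_ids:
            errors.append("project is outside the top-level group")
        if self.group_id is not None and self.group_id not in allowed_group_ids:
            errors.append("group is outside the top-level group")
        return errors
```

For example, Upstream(project_id=1, username="u").validate({1}, set()) returns only the credentials error. Returning a list of messages instead of stopping at the first failure keeps every violation visible at once, much like model validations do.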
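The local branch of the handle file request service can be sketched as follows. This is a minimal Python sketch under stated assumptions: package_files is a hypothetical in-memory map standing in for the package registry's existing files, fetch_remote is a hypothetical callable standing in for the network call to a remote upstream, and only project-level local upstreams are modeled (a group-level upstream would need a lookup across the group's projects).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


# Hypothetical, simplified upstream: url is None for a local (project) upstream.
@dataclass(frozen=True)
class Upstream:
    id: int
    url: Optional[str] = None
    project_id: Optional[int] = None

    @property
    def local(self) -> bool:
        return self.url is None


class FileRequestHandler:
    def __init__(
        self,
        package_files: Dict[Tuple[int, str], bytes],
        fetch_remote: Callable[[str, str], bytes],
    ) -> None:
        # (project_id, path) -> content: stands in for existing package files.
        self.package_files = package_files
        # (url, path) -> content: stands in for a network fetch.
        self.fetch_remote = fetch_remote
        # Virtual registry cache keyed by (upstream id, path); remote files only.
        self.cache: Dict[Tuple[int, str], bytes] = {}

    def handle(self, upstream: Upstream, path: str) -> Optional[bytes]:
        if upstream.local:
            # Serve the package file directly: no copy is written to the cache.
            return self.package_files.get((upstream.project_id, path))
        key = (upstream.id, path)
        if key not in self.cache:
            # Remote file: fetch it once, then serve later requests from the cache.
            self.cache[key] = self.fetch_remote(upstream.url, path)
        return self.cache[key]
```

The point of the branch is that the local path never touches self.cache, so a file from a project on the same instance is stored only once.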
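The check upstream approach described in the overview can be sketched as a single ordered walk. This minimal Python sketch makes two assumptions not stated in the issue: upstreams carry a position that defines their priority, and "do we need to check a remote upstream?" is read as "is a remote upstream ranked ahead of the matching local one?". find_local_hits (standing in for the single database query) and remote_has_file (standing in for a network request) are hypothetical callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass(frozen=True)
class Upstream:
    id: int
    position: int  # assumed priority order: lower positions are checked first
    url: Optional[str] = None  # None means a local upstream

    @property
    def local(self) -> bool:
        return self.url is None


def check_upstreams(
    upstreams: List[Upstream],
    path: str,
    find_local_hits: Callable[[List[int], str], Set[int]],
    remote_has_file: Callable[[Upstream, str], bool],
) -> Optional[Upstream]:
    ordered = sorted(upstreams, key=lambda u: u.position)
    local_ids = [u.id for u in ordered if u.local]
    # One batched lookup answers the question for every local upstream at once
    # (a single database query in the real service).
    local_hits = find_local_hits(local_ids, path) if local_ids else set()
    for upstream in ordered:
        if upstream.local:
            # No extra I/O: the batched lookup already gave the answer.
            if upstream.id in local_hits:
                return upstream
        elif remote_has_file(upstream, path):
            # The remote call above only runs while no earlier upstream matched.
            return upstream
    return None
```

With this ordering, a local hit ranked ahead of every remote upstream is returned without any network call, and a miss on all local upstreams falls back to today's walk over the remote upstreams.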
Remaining points to define:
- implementation plan: MRs, aspects.
- permissions.
Design Requirements
UI/UX Design Needs
- Design an intuitive interface for selecting GitLab projects as local upstreams
- Create clear visual differentiation between local and remote upstreams in the list view
- Design status indicators showing the health/availability of local upstream projects
- Develop a UI flow for testing connections to local upstreams
Mockups Needed
- Project/group selector interface with search/filter functionality
- Updated upstream list view showing both remote and local upstreams
- Detail view for local upstream configuration
- Connection test interface with appropriate success/failure states
User Experience Considerations
- Platform engineers should be able to easily switch between configuring local and remote upstreams
- Users should understand the storage benefits of local vs. remote upstreams
- The relationship between the virtual registry and local project/group should be clearly visualized
- Navigation between the virtual registry and referenced local projects/groups should be seamless
Analytics/Monitoring View
- Create a visualization for local upstream usage patterns