Investigate how to expand the GitLab Package Registry to support remote and virtual repositories
Problem to solve
A typical software project relies on a variety of dependencies, which we call packages. Packages can be built and maintained internally or sourced from a public repository. Based on our user research, we’ve learned that most projects use a 50/50 mix of public and private packages. When installing packages, the order in which repositories are searched matters: downloading or using an incorrect package, or the wrong version of a package, can introduce breaking changes and security vulnerabilities into a project's pipelines.
Sidney wants to rely solely on GitLab as a universal package manager to reduce costs and drive operational efficiency. However, GitLab supports only privately hosted package repositories, which account for only half of their team's use cases. In addition, the naming conventions enforced by the GitLab Package Registry make it impossible for organizations with many teams and many developers to use GitLab’s offering.
Target audience
Proposal
Investigate how we can expand the Package Registry to support the creation, usage, and management of remote and virtual repositories. The output of this issue should be a planned implementation strategy and a set of issues added to the epic "Expand the GitLab Package Registry by adding the ability to connect and setup remote and virtual repositories".
User flow
Sidney would like to configure GitLab to function as a universal package manager for their team. They read that GitLab lets them add remote and virtual package repositories, which will help them manage all of their group's dependencies (internal and external) in one central place.
- Sidney opens the GitLab app and navigates to their group.
- They navigate to Packages & Registries > Dependency Proxy.
- They have not used this feature before and they see an empty state page with another great GitLab graphic and a link to the documentation.
- They navigate to https://docs.gitlab.com/ee/user/packages/dependency_proxy/
- They read about how to add and configure remote and virtual repositories using either the API or the UI.
- They switch back to GitLab and decide to play around in the UI.
- They click a button to add a new repository.
- They are prompted to select the format and type.
- They choose to create an npm remote repository.
- They enter a name for the repository.
- They see that they need to enter the URL where the repository is hosted, and they paste in the address of the public npm registry, https://registry.npmjs.org.
- They define a maximum package age which will define how long to cache artifacts before rechecking the remote repository.
- They see the option to define storage and that it's set to `default`, which uses the storage defined when GitLab was installed.
- They see that a box is checked by default to enforce content validation, restricting the repository to files appropriate for npm.
- They see that a box is checked enabling a feature to cache packages for future use.
- They see an option for a cache expiration policy and that it has been set to 30 days.
- They decide to leave it set at 30 days.
- They see an option to enable authentication and enter a username and password.
- They see HTTP Request settings and check the defaults to ensure they make sense.
- They see that GitLab has defined defaults for the number of connection retries and for connection timeouts.
- They leave the defaults set.
- They hit create.
- They view and copy a code snippet that they can run to quickly configure npm to point to that repository, so they can test that it works.
- They run `npm install` and it works.
- They create another remote repository, repeating the above steps.
- Sidney decides to create a virtual repository to group the previously created remote repositories.
- Sidney clicks a button to create a new repository and, when prompted, selects npm and virtual.
- They enter a name for the repository.
- They leave the storage and content validation set to default.
- They see an option to add any existing npm repositories to it, and that they can set the order of those repositories.
- They are happy with the ordering and they create the repository.
- They view and copy a code snippet that they can run to quickly configure npm to point to that repository, so they can test that it works.
- Using the new virtual endpoint, they install a package that exists in the second repository but not the first. It works!
- Sidney would like to add some of their already created private npm repositories to the virtual repository. They do so and define an ordered list of hosted and remote repositories behind one virtual endpoint.
- Sidney shares that endpoint with their development team and says: "When you are installing packages using GitLab CI, you can now simply use `http://gitlab.example.com/repository/npm-all/` and it will automatically resolve the correct package, in the following order:...."
- Sidney navigates back to the Dependency Proxy and views all of their repositories in one place.
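The "code snippet" Sidney copies after creating a repository could be generated from the repository's endpoint. This is a minimal sketch, assuming the `/repository/<name>/` URL layout used in the `npm-all` example above; the helper names `repository_endpoint` and `npm_setup_snippet` are hypothetical. Only `npm config set registry <url>` is standard npm configuration.

```python
from urllib.parse import urljoin


def repository_endpoint(base_url: str, repo_name: str) -> str:
    """Build the (hypothetical) endpoint for a package repository.

    Assumes repositories are exposed under /repository/<name>/, as in the
    npm-all example; the final URL layout is still to be decided.
    """
    # A trailing slash makes urljoin append the path instead of replacing the last segment.
    base = base_url if base_url.endswith("/") else base_url + "/"
    return urljoin(base, f"repository/{repo_name}/")


def npm_setup_snippet(endpoint: str) -> str:
    """Return the commands a user could copy to point npm at the endpoint and test it."""
    return "\n".join(
        [
            f"npm config set registry {endpoint}",
            "npm install",
        ]
    )


endpoint = repository_endpoint("http://gitlab.example.com", "npm-all")
print(npm_setup_snippet(endpoint))
```

`npm config set` writes to the user-level `.npmrc` by default; a team that wants the setting per project could instead commit a `registry=<endpoint>` line to the project's own `.npmrc`.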
Further details
Hosted vs. Remote vs. Virtual repositories
- A hosted package repository is one that is hosted within your GitLab instance. Currently, we support only hosted repositories. A group is likely to have several hosted repositories: for example, project-level or feature-level repositories, or, for Maven, the common `dev`, `snapshots`, and `releases` repositories.
- A remote package repository is any repository hosted outside of your GitLab instance. The most common examples are https://www.npmjs.com/ and https://mvnrepository.com/repos/central, but remote repositories may also be hosted on an S3 instance or within another product.
- A virtual package repository groups an ordered list of remote and hosted repositories and exposes them through a single endpoint, such as `http://gitlab.example.com/repository/npm-all/`.
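The core behavior of a virtual repository is an ordered lookup: members are tried in the configured order and the first one that has the requested package wins. The sketch below is a minimal model of that rule, assuming each member can be reduced to an in-memory index of package names and versions; a real remote member would make an HTTP request instead. The `Repository` and `resolve` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Repository:
    """A hosted or remote member, reduced to an in-memory name -> versions index."""
    key: str
    packages: Dict[str, Set[str]] = field(default_factory=dict)

    def has(self, name: str, version: str) -> bool:
        return version in self.packages.get(name, set())


def resolve(members: List[Repository], name: str, version: str) -> Optional[str]:
    """Return the key of the first member, in configured order, that has name@version."""
    for repo in members:
        if repo.has(name, version):
            return repo.key  # first match wins; later members are never queried
    return None  # the package is in none of the members


hosted = Repository("hosted-repo1", {"my-lib": {"1.0.0"}})
remote = Repository("remote-repo1", {"my-lib": {"1.0.0"}, "lodash": {"4.17.21"}})
print(resolve([hosted, remote], "my-lib", "1.0.0"))    # hosted-repo1
print(resolve([hosted, remote], "lodash", "4.17.21"))  # remote-repo1
```

Because `my-lib@1.0.0` exists in both members, the hosted copy shadows the remote one. This is why the order a user sets in the virtual repository matters for both correctness and security.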
Considerations
Instance vs. Group-level vs. Project-level
We recently added request forwarding for npm packages at the instance level. So, if you try to install an npm package from your GitLab-hosted repository and it's not there, we will search npmjs.org.
However, I'm not sure remote and virtual repositories make sense at the instance level:
- The Dependency Proxy is at the group level, and I expect this feature will be built with it.
- We don't want to hide this functionality in the instance level admin screens.
How does this impact publishing
Although this feature does impact publishing packages, in that you create the repositories you publish to, it won't change much. The big change is that you will be able to install packages using the URL of your virtual repository.
Which formats to support
We are basing our prioritization decision on the popularity of each format. The current priority order is:
- npm
- Maven
- NuGet
- PyPI
- Conan
Design questions
- Where should a user go to add, update, and delete package repositories?
- What information do they need to include when adding a new repository? How does this differ from format to format?
- How does the user group repositories in a virtual repository?
- Following the GitLab principle of convention over configuration how can we ensure users have a good default experience and that we help them to limit configuration?
Technical considerations
Network requests + time outs
A virtual package repository can include hosted, remote, or other virtual repositories. We need a limit here for ~performance reasons: the number of remote repositories we request must be capped, because they are queried in sequence when `npm` or `yarn` runs `install`. In other words, we're making network requests within a network request, and there is a hard timeout of 60s on production, so we should aim for a lower time.
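Because a virtual repository can contain other virtual repositories, resolving it means expanding nested members into one ordered list, rejecting cycles, and enforcing the cap on remote repositories before any request is made. This is a minimal sketch of that expansion; the `flatten` function, the `ResolutionError` type, and the `max_remotes` value are hypothetical, since this issue does not set the limit.

```python
from typing import Dict, List, Set


class ResolutionError(Exception):
    """Raised when a virtual repository cannot be expanded safely."""


def flatten(
    key: str,
    virtual_members: Dict[str, List[str]],
    repo_types: Dict[str, str],
    max_remotes: int,
) -> List[str]:
    """Expand a virtual repository into an ordered list of hosted/remote keys.

    virtual_members maps each virtual key to its ordered member keys;
    repo_types maps every key to "hosted", "remote" or "virtual".
    Raises ResolutionError on a cycle or when max_remotes is exceeded.
    """
    result: List[str] = []
    seen: Set[str] = set()

    def visit(current: str, path: Set[str]) -> None:
        # path holds the virtual repositories on the current branch, so a
        # repeat means the nesting loops back on itself.
        if current in path:
            raise ResolutionError(f"cycle detected at {current}")
        if repo_types[current] != "virtual":
            if current not in seen:  # keep only the first occurrence
                seen.add(current)
                result.append(current)
            return
        for member in virtual_members[current]:
            visit(member, path | {current})

    visit(key, set())
    remotes = sum(1 for k in result if repo_types[k] == "remote")
    if remotes > max_remotes:
        raise ResolutionError(f"{remotes} remote repositories exceed the limit of {max_remotes}")
    return result
```

Checking the cap at configuration time as well as at request time would let the UI reject a virtual repository that could never resolve within the timeout.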
On a positive note, the cache will definitely help here (hitting the cache instead of the remote repository is a gain), but we still need to support the worst-case scenario, where the cache is empty and not used at all.
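One way to handle both the cache and the empty-cache worst case is to check the cache first, then try remotes in order against a single overall time budget that stays below the 60s hard timeout. This is a minimal sketch under that assumption: the `fetch_with_budget` name and the budget value are hypothetical, and each `fetch` callable stands in for an HTTP request that is expected to honor the timeout it is given.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# A fetch takes (package, timeout_seconds) and returns the artifact or None.
Fetch = Callable[[str, float], Optional[bytes]]


def fetch_with_budget(
    package: str,
    cache: Dict[str, bytes],
    remotes: List[Tuple[str, Fetch]],
    budget_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[bytes]:
    """Serve from cache if possible, else try remotes in order within budget_s seconds.

    Each fetch receives the time left as its timeout, so a well-behaved fetch
    keeps the total under the budget even when the cache is empty.
    """
    if package in cache:
        return cache[package]  # cache hit: no network request at all
    deadline = clock() + budget_s
    for _name, fetch in remotes:
        remaining = deadline - clock()
        if remaining <= 0:
            break  # give up rather than exceed the overall budget
        data = fetch(package, remaining)
        if data is not None:
            cache[package] = data  # populate the cache for the next request
            return data
    return None
```

The `clock` parameter exists so the behavior can be tested with a fake clock; in production it would default to `time.monotonic`, which is unaffected by wall-clock changes.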
GraphQL
We will consider implementing this in GraphQL to allow greater flexibility and performance in the UI.
API
The API will allow you to:
- Create and configure a package repository.
- Update the configurations of an existing repository.
- List your group's repositories.
- Delete a repository.
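Whether the API ends up as REST, GraphQL, or both, the four operations above act on the same group-scoped collection. This is a minimal in-memory sketch of those semantics, assuming repositories are identified by a unique `key` within a group; `PackageRepository`, `GroupRepositories`, and their methods are hypothetical names, and the format list is the one from the attribute samples in this issue.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List

PACKAGE_TYPES = {"maven", "nuget", "npm", "composer", "pypi", "go", "conan"}
REPO_TYPES = {"hosted", "remote", "virtual"}


@dataclass(frozen=True)
class PackageRepository:
    key: str
    type: str
    package_type: str
    description: str = ""


def _validate(repo: PackageRepository) -> None:
    if repo.type not in REPO_TYPES:
        raise ValueError(f"unsupported repository type {repo.type!r}")
    if repo.package_type not in PACKAGE_TYPES:
        raise ValueError(f"unsupported package type {repo.package_type!r}")


@dataclass
class GroupRepositories:
    """In-memory stand-in for the repositories owned by one group."""
    repos: Dict[str, PackageRepository] = field(default_factory=dict)

    def create(self, repo: PackageRepository) -> PackageRepository:
        _validate(repo)
        if repo.key in self.repos:
            raise ValueError(f"repository {repo.key!r} already exists")
        self.repos[repo.key] = repo
        return repo

    def update(self, key: str, **changes: str) -> PackageRepository:
        if "key" in changes:
            raise ValueError("the key of an existing repository cannot change")
        # replace() returns a copy of the frozen dataclass with the given fields changed.
        updated = replace(self.repos[key], **changes)
        _validate(updated)
        self.repos[key] = updated
        return updated

    def list_all(self) -> List[PackageRepository]:
        return sorted(self.repos.values(), key=lambda r: r.key)

    def delete(self, key: str) -> None:
        del self.repos[key]  # raises KeyError if the repository does not exist
```

Keeping `key` immutable matters because virtual repositories refer to their members by key; renaming would silently break those references.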
Attributes
Hosted repository attributes (sample)
There are likely more attributes that we should consider, but starting with:
- "key": "hosted-repo1",
- "type": "hosted",
- "packageType": "maven" | "nuget" | "npm" | "composer" | "pypi" | "go" | "conan",
- "description": "The local repository public description",
- "notes": "Notes"
Other attributes we should consider
- Allow duplicate uploads
- Enforce semantic versioning
- Any additional format-specific settings, such as how we version Maven snapshots.
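The first two settings above are checks applied when a package is uploaded to a hosted repository. This is a minimal sketch of how they could interact; the `HostedRepository` class, its field names, and its defaults are hypothetical. The regular expression is the one published with the Semantic Versioning 2.0.0 specification.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Set

# Semantic Versioning 2.0.0: MAJOR.MINOR.PATCH, optional pre-release and build metadata.
SEMVER = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
)


@dataclass
class HostedRepository:
    key: str
    allow_duplicate_uploads: bool = False  # hypothetical default
    enforce_semver: bool = True            # hypothetical default
    versions: Dict[str, Set[str]] = field(default_factory=dict)

    def upload(self, name: str, version: str) -> None:
        """Accept or reject an upload according to the repository settings."""
        if self.enforce_semver and not SEMVER.match(version):
            raise ValueError(f"{version!r} is not a semantic version")
        existing = self.versions.setdefault(name, set())
        if version in existing and not self.allow_duplicate_uploads:
            raise ValueError(f"{name}@{version} already exists")
        existing.add(version)
```

Maven versions such as `1.0-SNAPSHOT` are not semantic versions, so a Maven hosted repository would likely leave `enforce_semver` off; this is one reason format-specific settings are listed separately.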
Remote repository attributes (sample)
- "key": "remote-repo1",
- "type": "remote",
- "packageType": "maven" | "nuget" | "npm" | "composer" | "pypi" | "go" | "conan",
- "url": "https://registry.npmjs.org/",
- "description": "Description",
- "notes": "Notes",
- "username": "user",
- "password": "pass"
Other attributes we should consider
- Socket timeouts in ms
- Set public registries as default remote repositories
- Store packages locally (caching)
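A remote repository combines connection settings (URL, credentials, socket timeout) with caching settings (whether to store packages locally and how long before rechecking upstream, the "maximum package age" from the user flow). This is a minimal sketch, assuming hypothetical defaults and the hypothetical payload keys `socketTimeoutMs` and `storeLocally`; the other keys follow the sample above. The password is deliberately left out of the serialized output.

```python
import json
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class RemoteRepository:
    key: str
    package_type: str
    url: str
    username: Optional[str] = None
    password: Optional[str] = None
    socket_timeout_ms: int = 15_000        # hypothetical default
    store_locally: bool = True             # cache packages on first download
    max_package_age_s: int = 24 * 60 * 60  # hypothetical: recheck upstream after a day

    def __post_init__(self) -> None:
        parsed = urlparse(self.url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"invalid remote URL: {self.url!r}")
        if self.socket_timeout_ms <= 0:
            raise ValueError("socket timeout must be positive")

    def is_stale(self, cached_at: float, now: float) -> bool:
        """True if a cached artifact is older than max_package_age_s and should be rechecked."""
        return now - cached_at > self.max_package_age_s

    def to_payload(self) -> str:
        """Serialize the settings as JSON; the password is omitted from the output."""
        body = {
            "key": self.key,
            "type": "remote",
            "packageType": self.package_type,
            "url": self.url,
            "username": self.username,
            "socketTimeoutMs": self.socket_timeout_ms,
            "storeLocally": self.store_locally,
        }
        return json.dumps(body)
```

Validating the URL when the repository is created, rather than on first install, surfaces a mistyped address in the UI instead of as a failed `npm install`.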
Virtual repository attributes (sample)
- "key": "virtual-repo1",
- "type": "virtual",
- "packageType": "maven" | "nuget" | "npm" | "composer" | "pypi" | "go" | "conan",
- "repositories": ["hosted-repo1", "remote-repo1", "remote-repo2", "virtual-repo2"],
- "description": "Description",
- "notes": "Some internal notes"
Other attributes we should consider
- Exclude packages matching a given criteria (Dependency Firewall)
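A virtual repository needs two checks beyond its ordered member list: every member must exist and share the virtual repository's `packageType`, and, for the dependency firewall idea, some package names may be blocked outright. This is a minimal sketch, assuming exclusions are expressed as glob patterns matched with Python's `fnmatchcase`; that choice, the `excludes` field, and the class name are hypothetical interpretations of "a given criteria".

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Dict, List


@dataclass
class VirtualRepository:
    key: str
    package_type: str
    repositories: List[str]                            # ordered member keys
    excludes: List[str] = field(default_factory=list)  # hypothetical glob patterns

    def validate(self, known: Dict[str, str]) -> None:
        """Check members against known, which maps existing repository keys to their packageType."""
        for member in self.repositories:
            if member not in known:
                raise ValueError(f"unknown repository {member!r}")
            if known[member] != self.package_type:
                raise ValueError(f"{member!r} is {known[member]}, expected {self.package_type}")
        if len(set(self.repositories)) != len(self.repositories):
            raise ValueError("a repository may appear only once")

    def is_excluded(self, package_name: str) -> bool:
        """Dependency-firewall style check: block names matching any exclude pattern."""
        return any(fnmatchcase(package_name, pattern) for pattern in self.excludes)
```

Applying `is_excluded` before any member is queried means a blocked package is never fetched from a remote, even when the cache is empty.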
Future iterations will include
- Use policies to enforce compliance and security regulations
- View usage data
- Cache frequently used packages
Permissions
- The ability to CRUD package repositories will be limited to Maintainers and Owners
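Restricting management to Maintainers and Owners reduces to a threshold on the user's role. This is a minimal sketch; the numeric values match GitLab's documented access levels (Guest 10 through Owner 50), while `MIN_MANAGE_ROLE` and `can_manage_package_repositories` are hypothetical names.

```python
from enum import IntEnum


class Role(IntEnum):
    """GitLab access levels; the numeric values follow GitLab's API."""
    GUEST = 10
    REPORTER = 20
    DEVELOPER = 30
    MAINTAINER = 40
    OWNER = 50


# Hypothetical constant: the lowest role allowed to manage package repositories.
MIN_MANAGE_ROLE = Role.MAINTAINER


def can_manage_package_repositories(role: Role) -> bool:
    """True for Maintainers and Owners, who may create, read, update and delete repositories."""
    return role >= MIN_MANAGE_ROLE
```

Because `IntEnum` members compare as integers, keeping the threshold in one constant makes it easy to change if, for example, Developers later need read-only access.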
Documentation
What does success look like, and how can we measure that?
Success means we have given users a way to create package repositories that meet all of their use cases.
Measure
- Count # of repositories created, updated, listed and deleted (segment by repository type and format)
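The measurement above is a count keyed on three dimensions: the action, the repository type, and the format. This is a minimal sketch of that aggregation, assuming events arrive as `(action, type, format)` tuples; the tuple shape and the `count_events` name are hypothetical, and how events are actually emitted is out of scope here.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[str, str, str]  # (action, repository type, format)

ACTIONS = {"created", "updated", "listed", "deleted"}


def count_events(events: Iterable[Event]) -> Counter:
    """Count repository events segmented by action, repository type and format."""
    counts: Counter = Counter()
    for action, repo_type, fmt in events:
        if action not in ACTIONS:
            raise ValueError(f"unknown action {action!r}")
        counts[(action, repo_type, fmt)] += 1
    return counts


counts = count_events([
    ("created", "remote", "npm"),
    ("created", "remote", "npm"),
    ("created", "virtual", "npm"),
    ("deleted", "hosted", "maven"),
])
print(counts[("created", "remote", "npm")])  # 2
```

Keeping all three dimensions in the key allows any segmentation (for example, all `created` events per format) to be derived later by summing, without re-collecting data.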