NPM dependency proxy tgz download file endpoint
☂ Context
This is the very first step of npm virtual registry (&3608). With that epic, we aim to bring the dependency proxy for NPM packages.
This feature is very similar to the existing Maven dependency proxy.
In short, the GitLab instance sits between the package manager client and a package registry:
Package Manager clients (npm, yarn, ...) <-> GitLab <-> External Package Registry (such as npmjs.org)
The GitLab instance will see packages flying through and we will take this opportunity to cache them (in the package registry). This way, next time the same file is requested, we don't need to pull it from the external package registry (which could even be down).
This is the first step of the NPM dependency proxy. We are going to start with the download endpoint, that is, the URL that is accessed to pull the actual files. See NPM dependency proxy: implement the download tg... (#435644 - closed).
One note here: we can't implement caching (yet) for this interaction because it needs a custom upload endpoint, which is tracked in NPM dependency proxy: implement the upload endp... (#441267).
So in short, this MR implements the following logic:
- Does the requested file exist in the package registry?
  - If yes, return it.
  - If no, can the user write to the package registry?
    - If no, stream the file from the upstream registry.
    - (The "if yes" branch here requires the custom upload endpoint, hence is not part of this MR.)
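The decision tree above can be sketched as a small pure function. This is only an illustration: the method name, the keyword arguments and the returned symbols are hypothetical and are not taken from the actual GitLab code.

```ruby
# Hypothetical sketch of the download decision logic described above.
# None of these names come from the GitLab codebase.
#
# cached:    true when the requested file already exists in the package registry
# can_write: true when the current user can write to the package registry
def proxy_download_action(cached:, can_write:)
  # The file is already in the package registry: return it directly.
  return :serve_from_registry if cached

  # Not cached and the user cannot write: stream from the upstream registry.
  return :stream_from_upstream unless can_write

  # Not cached and the user can write: this branch needs the custom
  # upload endpoint, which is not part of this MR.
  :upload_to_registry
end
```

With this sketch, a reporter requesting an uncached file maps to `:stream_from_upstream`, while any user requesting a cached file maps to `:serve_from_registry`.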
🤔 What does this MR do and why?
- Add the NPM dependency proxy download tgz file endpoint.
- Re-use the helper functions from the Maven dependency proxy as much as possible.
- Add the related requests.
- Feature specs were left for a follow up as this MR is already quite large.
- This being the very first step, introduce the wip feature flag that gates the entire NPM dependency proxy.
🏁 MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
🦄 Screenshots or screen recordings
Hum, no changes to the GitLab UI.
⚙ How to set up and validate locally
We basically have a few use cases to go through. So let's set up what we will need.
- Enable the related feature flag packages_dependency_proxy_npm.
- Create a project (any visibility) and get:
  - a maintainer user (root will do) and a related PAT (with api scope).
  - a reporter user and a related PAT (with api scope).
We're going to target https://gitlab.com/issue-reproduce/packages/npm/npm-package/-/packages for the upstream registry. You can target a private project on gitlab.com but, in that case, you will need a PAT to access it.
In a rails console:
DependencyProxy::Packages::Setting.create!(project_id: <project_id>, enabled: true, npm_external_registry_url: "https://gitlab.com/api/v4/projects/15833924/packages/npm")
Alright, now we are going to use curl to simulate the tgz download requests that NPM clients will make.
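For reference, here is a minimal Ruby sketch of how the download URL used in the curl commands is put together: the project id, then the package name, a `-` separator, and the file name. The helper name and its arguments are hypothetical. The sketch only builds the request object and sends nothing.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of GitLab): builds the GET request that the
# curl commands in this description perform, without sending it.
def tgz_download_request(base_url:, project_id:, package_name:, file_name:, token:)
  uri = URI(
    "#{base_url}/api/v4/projects/#{project_id}" \
    "/dependency_proxy/packages/npm/#{package_name}/-/#{file_name}"
  )

  request = Net::HTTP::Get.new(uri)
  # Same header as curl's --header "Authorization: Bearer <pat>".
  request['Authorization'] = "Bearer #{token}"

  [uri, request]
end
```

To actually send it against a local instance, something like `Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }` should do, assuming the instance is reachable over plain HTTP as in the GDK URLs used here.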
1️⃣ File not cached
With a reporter:
$ curl --header "Authorization: Bearer <reporter pat>" "http://gdk.test:8000/api/v4/projects/<project id>/dependency_proxy/packages/npm/@10io/bananas/-/@10io/bananas-1.3.7.tgz"
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
curl received the .tgz file and is unable to properly display it (by default). So this is working as expected.
With a maintainer:
$ curl --header "Authorization: Bearer <maintainer pat>" "http://gdk.test:8000/api/v4/projects/<project id>/dependency_proxy/packages/npm/@10io/bananas/-/@10io/bananas-1.3.7.tgz"
curl: (18) transfer closed with 292 bytes remaining to read
What happens here is that the dependency proxy tries to upload the tgz file from the upstream registry to the GitLab instance. However, this requires a custom upload endpoint that is not implemented yet, which is why curl ends up with the error above.
2️⃣ File cached
Let's cache a similarly named file. In a rails console:
# stub file upload
def fixture_file_upload(*args, **kwargs)
Rack::Test::UploadedFile.new(*args, **kwargs)
end
FactoryBot.create(:npm_package, project: Project.find(<project_id>), name: '@10io/bananas', version: '1.3.7')
Packages::PackageFile.last.update!(file_name: '@10io/bananas-1.3.7.tgz', file_md5: '321da73828aea9fca18d4b9525408459', file: CarrierWaveStringFile.new_file(file_content: 'test', filename: '@10io/bananas-1.3.7.tgz', content_type: 'application/gzip'))
(Note that we set the content of the file to test.)
Let's try with a reporter:
$ curl --header "Authorization: Bearer <reporter pat>" "http://gdk.test:8000/api/v4/projects/<project id>/dependency_proxy/packages/npm/@10io/bananas/-/@10io/bananas-1.3.7.tgz"
test
The file (well, the cached one) is returned.
Let's try with a maintainer:
$ curl --header "Authorization: Bearer <maintainer pat>" "http://gdk.test:8000/api/v4/projects/<project id>/dependency_proxy/packages/npm/@10io/bananas/-/@10io/bananas-1.3.7.tgz"
test
The file content is returned (test). Notice that the upload endpoint is not necessary because the cache is already filled, so the file is returned directly.
The code changes are behaving the way we want.