
NPM dependency proxy tgz download file endpoint

David Fernandez requested to merge 435644-tgz-download-endpoint into master

Context

This is the very first step of the npm virtual registry epic (&3608), which aims to bring the dependency proxy to NPM packages.

This feature is very similar to the existing Maven dependency proxy.

In short, the GitLab instance sits between the package manager client and a package registry:

Package Manager clients (npm, yarn, ...) <-> GitLab <-> External Package Registry (such as npmjs.org)

The GitLab instance will see packages flying through and we will take this opportunity to cache them (in the package registry). This way, next time the same file is requested, we don't need to pull it from the external package registry (which could even be down).

We are going to start with the download endpoint: the URL that clients access to pull the actual files. See NPM dependency proxy: implement the download tg... (#435644 - closed).

One note here: we can't implement caching (yet) for this interaction, as it needs a custom upload endpoint. That work is tracked in NPM dependency proxy: implement the upload endp... (#441267).

So in short, this MR implements the following logic:

  • Does the requested file exist in the package registry?
    • If yes, return it.
    • If no, can the user write to the package registry?
      • If no, stream the file from the upstream registry.
      • (the if yes branch here requires the custom upload endpoint, hence is not part of this MR).
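The decision tree above can be sketched as a small, framework-free Ruby function. This is only an illustration: the method name, keyword arguments, and returned symbols are hypothetical and do not come from the GitLab codebase.

```ruby
# Hypothetical sketch of the download decision flow described in this MR.
# None of these names exist in the GitLab codebase.
def tgz_download_action(file_cached:, user_can_write:)
  # Cache hit: serve the file straight from the package registry.
  return :serve_cached_file if file_cached

  # Cache miss and the user cannot write: stream from upstream only.
  return :stream_from_upstream unless user_can_write

  # Cache miss and the user can write: stream and cache at the same time.
  # This branch needs the custom upload endpoint (#441267), so it is
  # not part of this MR.
  :stream_and_upload_to_cache
end
```

In the validation steps further down, the reporter with an empty cache maps to `:stream_from_upstream`, and both users with a filled cache map to `:serve_cached_file`.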

🤔 What does this MR do and why?

  • Add the NPM dependency proxy download tgz file endpoint.
    • Reuse the helper functions from the Maven dependency proxy as much as possible.
  • Add the related request specs.
    • Feature specs were left for a follow-up as this MR is already quite large.
  • This being the very first step, introduce the wip feature flag that gates the entire NPM dependency proxy.

🏁 MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

🦄 Screenshots or screen recordings

Hmm, no changes to the GitLab UI.

How to set up and validate locally

We basically have a few use cases to go through. So let's set up what we will need.

  1. Enable the related feature flag packages_dependency_proxy_npm.
  2. Create a project (any visibility) and get:
    • a maintainer user (root will do) and a related PAT (with api scope).
    • a reporter user and a related PAT (with api scope).

We're going to target https://gitlab.com/issue-reproduce/packages/npm/npm-package/-/packages for the upstream registry. You can target a private project on gitlab.com, but in that case you will need a PAT to access it.

In a rails console:

DependencyProxy::Packages::Setting.create!(project_id: <project_id>, enabled: true, npm_external_registry_url: "https://gitlab.com/api/v4/projects/15833924/packages/npm")

Alright, now we are going to use $ curl to simulate the tgz download requests that NPM clients will make.
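The requests in the cases that follow all share one URL shape and one authorization header. As a rough sketch, the same request could be prepared in plain Ruby as shown here. Both helper names are hypothetical (they are not part of this MR), and the project id `1` is a placeholder.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds the tgz download URL in the shape used by
# the curl commands in this description. The package name appears twice,
# once as the path prefix and once as the file name prefix.
def tgz_download_url(base_url:, project_id:, package_name:, version:)
  "#{base_url}/api/v4/projects/#{project_id}/dependency_proxy/packages/npm/" \
    "#{package_name}/-/#{package_name}-#{version}.tgz"
end

# Hypothetical helper: prepares (but does not send) the authenticated GET,
# mirroring curl's --header "Authorization: Bearer <pat>".
def tgz_download_request(url, token)
  request = Net::HTTP::Get.new(URI(url))
  request["Authorization"] = "Bearer #{token}"
  request
end

url = tgz_download_url(
  base_url: "http://gdk.test:8000",
  project_id: 1, # placeholder project id
  package_name: "@10io/bananas",
  version: "1.3.7"
)
request = tgz_download_request(url, "<reporter pat>")
```

The sketch stops short of sending anything. To actually perform the download, the request could be passed to `Net::HTTP.start(host, port) { |http| http.request(request) }` against a running instance.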

1️⃣ File not cached

With a reporter:

$ curl --header "Authorization: Bearer <reporter pat>" "http://gdk.test:8000/api/v4/projects/<project id>/dependency_proxy/packages/npm/@10io/bananas/-/@10io/bananas-1.3.7.tgz"
Warning: Binary output can mess up your terminal. Use "--output -" to tell 
Warning: curl to output it to your terminal anyway, or consider "--output 
Warning: <FILE>" to save to a file.

$ curl received the .tgz file and is unable to properly display it (by default). So this is working as expected 🎉

With a maintainer:

$ curl --header "Authorization: Bearer <maintainer pat>" "http://gdk.test:8000/api/v4/projects/<project id>/dependency_proxy/packages/npm/@10io/bananas/-/@10io/bananas-1.3.7.tgz" 
curl: (18) transfer closed with 292 bytes remaining to read

What happens here is that the dependency proxy tries to upload the tgz file from the upstream registry to the GitLab instance. However, this requires a custom upload endpoint that is not implemented yet, so the request ends in 💥 and the connection with $ curl is closed early.

2️⃣ File cached

Let's cache a similarly named file. In a rails console:

# stub file upload
def fixture_file_upload(*args, **kwargs)
  Rack::Test::UploadedFile.new(*args, **kwargs)
end

FactoryBot.create(:npm_package, project: Project.find(<project_id>), name: '@10io/bananas', version: '1.3.7')

Packages::PackageFile.last.update!(file_name: '@10io/bananas-1.3.7.tgz', file_md5: '321da73828aea9fca18d4b9525408459', file: CarrierWaveStringFile.new_file(file_content: 'test', filename: '@10io/bananas-1.3.7.tgz', content_type: 'application/gzip'))

(note that we set the content of the file to test)

Let's try with a reporter:

$ curl --header "Authorization: Bearer <reporter pat>" "http://gdk.test:8000/api/v4/projects/<project id>/dependency_proxy/packages/npm/@10io/bananas/-/@10io/bananas-1.3.7.tgz"
test

The file (well, the cached copy) is returned 🎉

Let's try with a maintainer:

$ curl --header "Authorization: Bearer <maintainer pat>" "http://gdk.test:8000/api/v4/projects/<project id>/dependency_proxy/packages/npm/@10io/bananas/-/@10io/bananas-1.3.7.tgz"
test

The file content is returned (test). Notice that the upload endpoint is not necessary because the cache is already filled so it's returned directly. 🎉

The code changes are behaving the way we want

Edited by David Fernandez
