
Dependency Proxy uses workhorse for manifest pulls

Steve Abrams requested to merge 335560-dp-manifest-workhorse-uploads into master

🐠 Context

The Dependency Proxy acts as a pull-through cache for Docker Hub images. Images are made of two types of files: blobs and manifests. The process of downloading and storing these files is being moved from Rails to workhorse.

!71890 (merged) moved the logic for pulling blob files to workhorse and built out the mechanisms necessary to make such requests. This MR moves the manifest downloads to workhorse.

Here is the sequence of how these files are fetched and cached (stored):

    Client->>Workhorse: GET /v2/*group_id/dependency_proxy/containers/*image/manifests/*tag
    Workhorse->>Rails: GET /v2/*group_id/dependency_proxy/containers/*image/manifests/*tag
    Rails->>Rails: Check DB. Is manifest persisted in cache?
    alt In Cache
        Rails->>Workhorse: Respond with send-url injector
        Workhorse->>Client: Send the file to the client
    else Not In Cache
        Rails->>Rails: Generate auth token and download URL for the manifest in upstream registry
        Rails->>Workhorse: Respond with send-dependency injector
        Workhorse->>External Registry: Request the manifest
        External Registry->>Workhorse: Download the manifest
        Workhorse->>Rails: POST /v2/*group_id/dependency_proxy/containers/*image/manifests/*tag/upload/authorize
        Rails->>Workhorse: Respond with upload instructions
        Workhorse->>Client: Send the manifest file to the client with original headers
        Workhorse->>Object Storage: Save the manifest file with some of its header values
        Workhorse->>Rails: POST /v2/*group_id/dependency_proxy/containers/*image/manifests/*tag/upload to finalize the upload

(Thanks to @igor.drozdov for creating this fantastic sequence diagram)

🐙 What does this MR do and why?

  1. We introduce these features behind a feature flag: dependency_proxy_manifest_workhorse. Rollout issue: #344216 (closed)
  2. We add two new routes. These routes handle the workhorse accelerated upload of the manifest files that are pulled from the external registry (Docker Hub).
  3. We update the workhorse code to pass the headers received from the external registry to both the user and Rails when it receives the file. When dealing with manifests, it is important to preserve these headers: the Docker client expects them, and Rails stores them so we can provide them to the Docker client when we serve a cached manifest.
  4. Note that much of the logic implemented in the controller's manifest_via_workhorse action comes from DependencyProxy::FindOrCreateManifestService, which will be removed after this feature flag is rolled out.

🐘 Database

This MR adds a new class method DependencyProxy::Manifest.find_by_file_name_or_digest. The query is extracted from the existing DependencyProxy::Manifest.find_or_initialize_by_file_name_or_digest and is itself unchanged, so I did not think this warranted a database review. I am happy to request one if it is deemed necessary.
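As an in-memory analogy for what the lookup does, the Go sketch below finds a cached manifest by file name and falls back to the digest. Assumptions: the `Manifest` struct, the example file names, and the file-name-first ordering are illustrative only; the actual ActiveRecord query may express the match differently.

```go
package main

import "fmt"

// Manifest mirrors the two lookup columns (hypothetical struct for illustration).
type Manifest struct {
	FileName string
	Digest   string
}

// findByFileNameOrDigest returns a cached manifest matching the file name,
// falling back to the content digest. It returns nil when neither matches.
func findByFileNameOrDigest(cache []Manifest, fileName, digest string) *Manifest {
	// First pass: exact file name match.
	for i := range cache {
		if cache[i].FileName == fileName {
			return &cache[i]
		}
	}
	// Second pass: match on digest, ignoring a blank digest.
	if digest != "" {
		for i := range cache {
			if cache[i].Digest == digest {
				return &cache[i]
			}
		}
	}
	return nil
}

func main() {
	cache := []Manifest{
		{FileName: "alpine:latest.json", Digest: "sha256:aaa"},
		{FileName: "alpine:3.14.json", Digest: "sha256:bbb"},
	}
	if m := findByFileNameOrDigest(cache, "alpine:edge.json", "sha256:bbb"); m != nil {
		fmt.Println("found", m.FileName)
	}
}
```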

🎬 : Screenshots or screen recordings

These changes happen on the backend, so there is not much to be seen outside of the logs, but this is what a successful image pull looks like using this feature:

→ docker pull gdk.test:3001/asdfasdfasdf/dependency_proxy/containers/alpine:latest
latest: Pulling from asdfasdfasdf/dependency_proxy/containers/alpine
Digest: sha256:69704ef328d05a9f806b6b8502915e6a0a4faa4d72018dc42343f511490daf8a
Status: Image is up to date for gdk.test:3001/asdfasdfasdf/dependency_proxy/containers/alpine:latest
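The `Digest:` line above is a content digest: the SHA-256 of the exact manifest bytes the registry served, written as `sha256:<hex>`. That is why workhorse stores the manifest as received rather than re-serializing it. A small Go sketch of the computation (`manifestDigest` is a hypothetical helper):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// manifestDigest returns a content digest in the "sha256:<hex>" form shown in
// the docker pull output, computed over the raw manifest bytes.
func manifestDigest(raw []byte) string {
	sum := sha256.Sum256(raw)
	return "sha256:" + hex.EncodeToString(sum[:])
}

func main() {
	// Any change to the bytes, even whitespace, produces a different digest.
	fmt.Println(manifestDigest([]byte(`{"schemaVersion":2}`)))
	fmt.Println(manifestDigest([]byte(`{"schemaVersion": 2}`)))
}
```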

💻 How to set up and validate locally

  1. Follow these docs to set up the Dependency Proxy on your GDK.
  2. Apply the workhorse updates by running make gitlab-workhorse-setup && gdk restart gitlab-workhorse in your GDK root directory.
  3. Create a group and navigate to Packages & Registries -> Dependency Proxy to find the image prefix.
  4. Log into the Dependency Proxy using a PAT:
    docker login gdk.test:3000
    username: root
    password: <personal_access_token>
  5. Enable the feature flag in the Rails console:
    Feature.enable(:dependency_proxy_manifest_workhorse)
  6. Start viewing the Rails logs.
  7. Pull an image through the dependency proxy:
    # use your image prefix, it should look like
    docker pull gdk.test:3000/<group_path>/dependency_proxy/containers/alpine:latest
  8. The image pull should be successful.
  9. Looking at the Rails logs, you should see requests for:
    Started POST "/v2/<group_path>/dependency_proxy/containers/alpine/manifests/latest/upload/authorize"
    Started POST "/v2/<group_path>/dependency_proxy/containers/alpine/manifests/latest/upload"
    These requests are only made when workhorse handles the pull and upload. You should also see the manifest record being inserted.
  10. Use docker images to find the IMAGE ID of the image you pulled and then remove it from your local machine's cache:
    docker rmi -f 14119a10abf4
  11. Pull the image again. This time, the Rails logs should not show the two /upload requests, since the manifest will be served from the cache. You also should not see any manifests being inserted, but you should see an UPDATE for the existing one.
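The log check in the validation steps can be sketched in Go: build the upload paths for a pull and look for both `Started POST` lines. The helpers `manifestRoutes` and `workhorseHandledPull` are hypothetical, and the sketch assumes the log lines have the `Started POST "<path>"` shape shown above.

```go
package main

import (
	"fmt"
	"strings"
)

// manifestRoutes builds the manifest, upload/authorize, and upload paths for
// one image pull, matching the paths shown in the Rails logs.
func manifestRoutes(groupPath, image, tag string) (manifest, authorize, upload string) {
	manifest = fmt.Sprintf("/v2/%s/dependency_proxy/containers/%s/manifests/%s", groupPath, image, tag)
	upload = manifest + "/upload"
	authorize = upload + "/authorize"
	return manifest, authorize, upload
}

// workhorseHandledPull reports whether both upload requests appear in the
// log lines, which only happens when workhorse fetched and uploaded the manifest.
func workhorseHandledPull(logLines []string, groupPath, image, tag string) bool {
	_, authorize, upload := manifestRoutes(groupPath, image, tag)
	sawAuthorize, sawUpload := false, false
	for _, line := range logLines {
		// Including the closing quote keeps ".../upload" from matching ".../upload/authorize".
		if strings.Contains(line, `Started POST "`+authorize+`"`) {
			sawAuthorize = true
		}
		if strings.Contains(line, `Started POST "`+upload+`"`) {
			sawUpload = true
		}
	}
	return sawAuthorize && sawUpload
}

func main() {
	firstPull := []string{
		`Started POST "/v2/my-group/dependency_proxy/containers/alpine/manifests/latest/upload/authorize" for 127.0.0.1`,
		`Started POST "/v2/my-group/dependency_proxy/containers/alpine/manifests/latest/upload" for 127.0.0.1`,
	}
	fmt.Println(workhorseHandledPull(firstPull, "my-group", "alpine", "latest"))
}
```

On the second (cached) pull, the same check should return false, since neither /upload request is made.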

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #335560 (closed)

Edited by Steve Abrams
