Skip to content

Store dependency proxy manifest content-type

🏛 Context

The Dependency Proxy allows users to pull images from DockerHub through GitLab. GitLab then stores these images at the group level and will use the cached image the next time a user pulls an image.

We have two problems to fix here:

Problem 1: headers

When we pull an image, two different files are downloaded: a manifest, and blob files. The manifest request uses the Accept header to determine which format of the manifest it should return. Then the response includes a Content-Type header describing what format of manifest was returned. Currently, the Dependency Proxy does not specify any Accept headers when pulling from DockerHub, and does not set any Content-Type headers when sending the manifest to the user's local client. In Docker version 20.x, the CLI has been updated to set default headers, which GitLab ignores. This leads to a mismatch, causing the CLI client to reject the incoming manifest.

In order to fix this, we need to save the headers being returned when GitLab pulls the manifest from DockerHub, and then set those same headers when passing the manifest down to the user. In other words, we need to proxy the headers as well as the file.

Problem 2: 'pull-by-digest'

A second problem is the new version of Docker expects to be able to request images by tag name

docker pull alpine:latest

as well as being able to pull by digest, where the digest is the specific sha value that a given tag might be pointing to:

docker pull alpine@sha256:d0710affa17fad5f466a70159cc458227bd25d4afb39514ef662ead3e6c99515

If the digest is currently pointing to that tag, then when we look for a matching manifest, both of those commands should return the same file from the same dependency_proxy_manifests record.

The problem is, right now, GitLab only looks up these records by file_name. This means when requested by digest, no matching manifest will be found and we will pull it from DockerHub even though we might already have it in the database.

To fix this we need to update our searching logic to look for the manifest by file_name, and if not found, look for it by digest.

🔎 What does this MR do?

  1. Adds content_type column to dependency_proxy_manifests to store the content type of the cached manifest file.
  2. Adds a new index index_dependency_proxy_manifests_on_group_id_and_digest to allow faster searching by digest when a request is a "pull-by-digest".
  3. Updates the Dependency Proxy controller to return the response headers that match what DockerHub returned with the manifest
  4. Updates the Accept headers used when making requests to DockerHub
  5. Updates the manifest services to save the content-type to the new column when creating and updating manifest records.

🐘 Database

Migrations

Up Migrations
== 20210128140157 AddContentTypeToDependencyProxyManifests: migrating =========
-- add_column(:dependency_proxy_manifests, :content_type, :text)
   -> 0.0022s
== 20210128140157 AddContentTypeToDependencyProxyManifests: migrated (0.0023s)

== 20210128140232 AddTextLimitToDependencyProxyManifestsContentType: migrating -- transaction_open?() -> 0.0000s -- current_schema() -> 0.0003s -- execute("ALTER TABLE dependency_proxy_manifests\nADD CONSTRAINT check_167a9a8a91\nCHECK ( char_length(content_type) <= 255 )\nNOT VALID;\n") -> 0.0014s -- current_schema() -> 0.0003s -- execute("SET statement_timeout TO 0") -> 0.0003s -- execute("ALTER TABLE dependency_proxy_manifests VALIDATE CONSTRAINT check_167a9a8a91;") -> 0.0018s -- execute("RESET ALL") -> 0.0003s == 20210128140232 AddTextLimitToDependencyProxyManifestsContentType: migrated (0.0143s)

== 20210128155939 AddIndexDependencyProxyManifestsOnGroupIdAndDigest: migrating -- transaction_open?() -> 0.0000s -- index_exists?(:dependency_proxy_manifests, [:group_id, :digest], {:name=>"index_dependency_proxy_manifests_on_group_id_and_digest", :algorithm=>:concurrently}) -> 0.0033s -- add_index(:dependency_proxy_manifests, [:group_id, :digest], {:name=>"index_dependency_proxy_manifests_on_group_id_and_digest", :algorithm=>:concurrently}) -> 0.0052s == 20210128155939 AddIndexDependencyProxyManifestsOnGroupIdAndDigest: migrated (0.0091s)

Down Migrations
== 20210128155939 AddIndexDependencyProxyManifestsOnGroupIdAndDigest: reverting
-- transaction_open?()
   -> 0.0000s
-- indexes(:dependency_proxy_manifests)
   -> 0.0026s
-- execute("SET statement_timeout TO 0")
   -> 0.0002s
-- remove_index(:dependency_proxy_manifests, {:algorithm=>:concurrently, :name=>"index_dependency_proxy_manifests_on_group_id_and_digest"})
   -> 0.0061s
-- execute("RESET ALL")
   -> 0.0002s
== 20210128155939 AddIndexDependencyProxyManifestsOnGroupIdAndDigest: reverted (0.0096s)

== 20210128140232 AddTextLimitToDependencyProxyManifestsContentType: reverting -- execute("ALTER TABLE dependency_proxy_manifests\nDROP CONSTRAINT IF EXISTS check_167a9a8a91\n") -> 0.0012s == 20210128140232 AddTextLimitToDependencyProxyManifestsContentType: reverted (0.0085s)

== 20210128140157 AddContentTypeToDependencyProxyManifests: reverting ========= -- remove_column(:dependency_proxy_manifests, :content_type, :text) -> 0.0006s == 20210128140157 AddContentTypeToDependencyProxyManifests: reverted (0.0042s)

Queries

.find_or_initialize_by_file_name_or_digest

This class method will perform one or two queries depending if the first one finds a manifest. Both are run in the context of a group: @group.dependency_proxy_manifests.find_or_initialize_by_file_name_or_digest(file_name: 'foo', digest: 'bar')

1st query: find_by(file_name: file_name)

SELECT "dependency_proxy_manifests".* 
FROM "dependency_proxy_manifests" 
WHERE "dependency_proxy_manifests"."group_id" = 114 
    AND "dependency_proxy_manifests"."file_name" = 'alpine:latest.json'

Explain: https://postgres.ai/console/gitlab/gitlab-production-tunnel/sessions/1882/commands/6374

2nd query: find_by(digest: digest)

SELECT "dependency_proxy_manifests".* 
FROM "dependency_proxy_manifests" 
WHERE "dependency_proxy_manifests"."group_id" = 114 
    AND "dependency_proxy_manifests"."digest" = 'sha256:d0710affa17fad5f466a70159cc458227bd25d4afb39514ef662ead3e6c99515'

Explain: https://postgres.ai/console/gitlab/gitlab-production-tunnel/sessions/1882/commands/6376

Index creation

In #database-lab:

exec CREATE INDEX index_dependency_proxy_manifests_on_group_id_and_digest ON dependency_proxy_manifests USING btree (group_id, digest);
The query has been executed. Duration: 77.000 ms

📸 Screenshots (strongly suggested)

Using Docker 20.10

Before Updates:

$ docker pull gdk.test:3001/my-private-group/dependency_proxy/containers/alpine:latest

Error response from daemon: received unexpected HTTP status: 500 Internal Server Error

After Updates:

$ docker pull gdk.test:3001/my-private-group/dependency_proxy/containers/alpine@sha256:d0710affa17fad5f466a70159cc458227bd25d4afb39514ef662ead3e6c99515

gdk.test:3001/my-private-group/dependency_proxy/containers/alpine@sha256:d0710affa17fad5f466a70159cc458227bd25d4afb39514ef662ead3e6c99515: Pulling from my-private-group/dependency_proxy/containers/alpine
596ba82af5aa: Pull complete
Digest: sha256:d0710affa17fad5f466a70159cc458227bd25d4afb39514ef662ead3e6c99515
Status: Downloaded newer image for gdk.test:3001/my-private-group/dependency_proxy/containers/alpine@sha256:d0710affa17fad5f466a70159cc458227bd25d4afb39514ef662ead3e6c99515
gdk.test:3001/my-private-group/dependency_proxy/containers/alpine@sha256:d0710affa17fad5f466a70159cc458227bd25d4afb39514ef662ead3e6c99515

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team

Related to #290944 (closed), #299566 (closed)

Edited by Steve Abrams

Merge request reports

Loading