Modify Container Virtual Registry HandleFileRequestService

Context

This is a series of MRs to implement Container image virtual registry: push/pull end... (#549131)

The POC that integrates everything, and from which these MRs are being extracted from, is in !209585 (closed)

What does this MR do and why?

  • Modifies VirtualRegistries::Container::HandleFileRequestService to support searching for a cache entry by path or by digest.
  • Replaces the index on virtual_registries_container_cache_entries(group_id) with an index on `virtual_registries_container_cache_entries(group_id, upstream_etag) to support the new join query needed by the service

Database Review

db:gitlabcom-database-testing output

Search by tag
SELECT
    "virtual_registries_container_cache_entries".*
FROM
    "virtual_registries_container_cache_entries"
WHERE
    "virtual_registries_container_cache_entries"."status" = 0
    AND "virtual_registries_container_cache_entries"."upstream_id" IN (
        SELECT
            "virtual_registries_container_upstreams"."id"
        FROM
            "virtual_registries_container_upstreams"
            INNER JOIN "virtual_registries_container_registry_upstreams" ON "virtual_registries_container_upstreams"."id" = "virtual_registries_container_registry_upstreams"."upstream_id"
        WHERE
            "virtual_registries_container_registry_upstreams"."registry_id" = 5
        ORDER BY
            "virtual_registries_container_registry_upstreams"."position" ASC)
        AND "virtual_registries_container_cache_entries"."group_id" = 22
        AND "virtual_registries_container_cache_entries"."relative_path" = 'hello-world/manifests/latest'
    LIMIT 1

Query plan from local GDK

 Limit  (cost=0.30..13.20 rows=1 width=290) (actual time=0.034..0.035 rows=1 loops=1)
   ->  Nested Loop Semi Join  (cost=0.30..13.20 rows=1 width=290) (actual time=0.033..0.033 rows=1 loops=1)
         Join Filter: (virtual_registries_container_upstreams.id = virtual_registries_container_cache_entries.upstream_id)
         ->  Seq Scan on virtual_registries_container_cache_entries_00 virtual_registries_container_cache_entries  (cost=0.00..1.02 rows=1 width=290) (actual time=0.016..0.016 rows=1 loops=1)
               Filter: ((status = 0) AND (group_id = 22) AND (relative_path = 'hello-world/manifests/latest'::text))
         ->  Nested Loop  (cost=0.30..12.07 rows=5 width=10) (actual time=0.014..0.014 rows=1 loops=1)
               ->  Index Scan using constraint_vreg_container_reg_upst_on_unique_reg_pos on virtual_registries_container_registry_upstreams  (cost=0.15..5.24 rows=5 width=10) (actual time=0.006..0.006 rows=1 loops=1)
                     Index Cond: (registry_id = 5)
               ->  Index Only Scan using virtual_registries_container_upstreams_pkey on virtual_registries_container_upstreams  (cost=0.15..1.36 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=1)
                     Index Cond: (id = virtual_registries_container_registry_upstreams.upstream_id)
                     Heap Fetches: 1
 Planning Time: 15.141 ms
 Execution Time: 0.069 ms
(13 rows)
Search by digest
SELECT
    "virtual_registries_container_cache_entries".*
FROM
    "virtual_registries_container_cache_entries"
WHERE
    "virtual_registries_container_cache_entries"."status" = 0
    AND "virtual_registries_container_cache_entries"."upstream_id" IN (
        SELECT
            "virtual_registries_container_upstreams"."id"
        FROM
            "virtual_registries_container_upstreams"
            INNER JOIN "virtual_registries_container_registry_upstreams" ON "virtual_registries_container_upstreams"."id" = "virtual_registries_container_registry_upstreams"."upstream_id"
        WHERE
            "virtual_registries_container_registry_upstreams"."registry_id" = 5
        ORDER BY
            "virtual_registries_container_registry_upstreams"."position" ASC)
        AND "virtual_registries_container_cache_entries"."group_id" = 22
        AND "virtual_registries_container_cache_entries"."upstream_etag" = 'sha256:0c0473b2781ff136160d27c53706e6e593b0a7ded422170058d17101a5b92ff5'
    LIMIT 1

Query plan from local GDK, with new index

 Limit  (cost=0.30..7.76 rows=1 width=290) (actual time=0.088..0.090 rows=1 loops=1)
   ->  Nested Loop Semi Join  (cost=0.30..37.59 rows=5 width=290) (actual time=0.087..0.088 rows=1 loops=1)
         Join Filter: ("ANY_subquery".id = virtual_registries_container_cache_entries.upstream_id)
         ->  Append  (cost=0.00..24.26 rows=16 width=290) (actual time=0.019..0.020 rows=1 loops=1)
               ->  Seq Scan on virtual_registries_container_cache_entries_00 virtual_registries_container_cache_entries_1  (cost=0.00..1.02 rows=1 width=290) (actual time=0.018..0.019 rows=1 loops=1)
                     Filter: ((status = 0) AND (group_id = 22) AND (upstream_etag = 'sha256:0c0473b2781ff136160d27c53706e6e593b0a7ded422170058d17101a5b92ff5'::text))
               ->  Index Scan using index_f48efb3442 on virtual_registries_container_cache_entries_01 virtual_registries_container_cache_entries_2  (cost=0.12..2.15 rows=1 width=290) (never executed)
                     Filter: ((group_id = 22) AND (upstream_etag = 'sha256:0c0473b2781ff136160d27c53706e6e593b0a7ded422170058d17101a5b92ff5'::text))
               ->  Seq Scan on virtual_registries_container_cache_entries_02 virtual_registries_container_cache_entries_3  (cost=0.00..1.02 rows=1 width=290) (never executed)
                     Filter: ((status = 0) AND (group_id = 22) AND (upstream_etag = 'sha256:0c0473b2781ff136160d27c53706e6e593b0a7ded422170058d17101a5b92ff5'::text))
               ->  Seq Scan on virtual_registries_container_cache_entries_03 virtual_registries_container_cache_entries_4  (cost=0.00..1.03 rows=1 width=290) (never executed)
                     Filter: ((status = 0) AND (group_id = 22) AND (upstream_etag = 'sha256:0c0473b2781ff136160d27c53706e6e593b0a7ded422170058d17101a5b92ff5'::text))
               ->  Index Scan using index_b88fe7a7c4 on virtual_registries_container_cache_entries_04 virtual_registries_container_cache_entries_5  (cost=0.12..2.15 rows=1 width=290) (never executed)
                     Filter: ((group_id = 22) AND (upstream_etag = 'sha256:0c0473b2781ff136160d27c53706e6e593b0a7ded422170058d17101a5b92ff5'::text))
               ->  Seq Scan on virtual_registries_container_cache_entries_05 virtual_registries_container_cache_entries_6  (cost=0.00..1.02 rows=1 width=290) (never executed)
                     Filter: ((status = 0) AND (group_id = 22) AND (upstream_etag = 'sha256:0c0473b2781ff136160d27c53706e6e593b0a7ded422170058d17101a5b92ff5'::text))
               ->  Seq Scan on virtual_registries_container_cache_entries_06 virtual_registries_container_cache_entries_7  (cost=0.00..1.00 rows=1 width=290) (never executed)
                     Filter: ((status = 0) AND (group_id = 22) AND (upstream_etag = 'sha256:0c0473b2781ff136160d27c53706e6e593b0a7ded422170058d17101a5b92ff5'::text))
               ->  Index Scan using index_6c08054aaa on virtual_registries_container_cache_entries_07 virtual_registries_container_cache_entries_8  (cost=0.12..2.15 rows=1 width=290) (never executed)
                     Filter: ((group_id = 22) AND (upstream_etag = 'sha256:0c0473b2781ff136160d27c53706e6e593b0a7ded422170058d17101a5b92ff5'::text))
               ->  Index Scan using index_c676c698e9 on virtual_registries_container_cache_entries_08 virtual_registries_container_cache_entries_9  (cost=0.14..2.17 rows=1 width=290) (never executed)
                     Filter: ((group_id = 22) AND (upstream_etag = 'sha256:0c0473b2781ff136160d27c53706e6e593b0a7ded422170058d17101a5b92ff5'::text))
               ->  Index Scan using index_08b889e694 on virtual_registries_container_cache_entries_09 virtual_registries_container_cache_entries_10  (cost=0.12..2.15 rows=1 width=290) (never executed)
                     Filter: ((group_id = 22) AND (upstream_etag = 'sha256:0c0473b2781ff136160d27c53706e6e593b0a7ded422170058d17101a5b92ff5'::text))
               ->  Index Scan using index_b1835b5295 on virtual_registries_container_cache_entries_10 virtual_registries_container_cache_entries_11  (cost=0.12..2.15 rows=1 width=290) (never executed)
                     Filter: ((group_id = 22) AND (upstream_etag = 'sha256:0c0473b2781ff136160d27c53706e6e593b0a7ded422170058d17101a5b92ff5'::text))
               ->  Seq Scan on virtual_registries_container_cache_entries_11 virtual_registries_container_cache_entries_12  (cost=0.00..1.02 rows=1 width=290) (never executed)
                     Filter: ((status = 0) AND (group_id = 22) AND (upstream_etag = 'sha256:0c0473b2781ff136160d27c53706e6e593b0a7ded422170058d17101a5b92ff5'::text))
               ->  Seq Scan on virtual_registries_container_cache_entries_12 virtual_registries_container_cache_entries_13  (cost=0.00..1.02 rows=1 width=290) (never executed)
                     Filter: ((status = 0) AND (group_id = 22) AND (upstream_etag = 'sha256:0c0473b2781ff136160d27c53706e6e593b0a7ded422170058d17101a5b92ff5'::text))
               ->  Seq Scan on virtual_registries_container_cache_entries_13 virtual_registries_container_cache_entries_14  (cost=0.00..1.00 rows=1 width=290) (never executed)
                     Filter: ((status = 0) AND (group_id = 22) AND (upstream_etag = 'sha256:0c0473b2781ff136160d27c53706e6e593b0a7ded422170058d17101a5b92ff5'::text))
               ->  Index Scan using index_27371c023a on virtual_registries_container_cache_entries_14 virtual_registries_container_cache_entries_15  (cost=0.14..2.17 rows=1 width=290) (never executed)
                     Filter: ((group_id = 22) AND (upstream_etag = 'sha256:0c0473b2781ff136160d27c53706e6e593b0a7ded422170058d17101a5b92ff5'::text))
               ->  Seq Scan on virtual_registries_container_cache_entries_15 virtual_registries_container_cache_entries_16  (cost=0.00..1.00 rows=1 width=290) (never executed)
                     Filter: ((status = 0) AND (group_id = 22) AND (upstream_etag = 'sha256:0c0473b2781ff136160d27c53706e6e593b0a7ded422170058d17101a5b92ff5'::text))
         ->  Materialize  (cost=0.30..12.15 rows=5 width=8) (actual time=0.064..0.065 rows=1 loops=1)
               ->  Subquery Scan on "ANY_subquery"  (cost=0.30..12.12 rows=5 width=8) (actual time=0.045..0.045 rows=1 loops=1)
                     ->  Nested Loop  (cost=0.30..12.07 rows=5 width=10) (actual time=0.044..0.045 rows=1 loops=1)
                           ->  Index Scan using constraint_vreg_container_reg_upst_on_unique_reg_pos on virtual_registries_container_registry_upstreams  (cost=0.15..5.24 rows=5 width=10) (actual time=0.014..0.014 rows=1 loops=1)
                                 Index Cond: (registry_id = 5)
                           ->  Index Only Scan using virtual_registries_container_upstreams_pkey on virtual_registries_container_upstreams  (cost=0.15..1.36 rows=1 width=8) (actual time=0.028..0.028 rows=1 loops=1)
                                 Index Cond: (id = virtual_registries_container_registry_upstreams.upstream_id)
                                 Heap Fetches: 1
 Planning Time: 4.644 ms
 Execution Time: 0.423 ms
(46 rows)

References

Screenshots or screen recordings

NA

How to set up and validate locally

The HandleFileRequestService is not yet used by any endpoints. We can verify that it loads a cache entry by either path or digest by creating a cache entry record, and then testing the service from the Rails console.

# Get a handle to an upstream
registry = VirtualRegistries::Container::Registry.last
upstream = registry.upstreams.first

# Create a temporary file for the cache entry
temp_file = Tempfile.new(['cache_entry', '.txt'])
temp_file.write('some test content for the cache entry')
temp_file.rewind

# Create a cache entry. Note the values for upstream_etag and relative_path
upstream_etag = "sha256:#{'a' * 64}"
upstream.cache_entries.create(
  upstream_etag: "#{upstream_etag}",
  upstream_checked_at: Time.zone.now,
  content_type: 'text/plain',
  file_md5: 'd8e8fca2dc0f896fd7cb4cb0031ba249',
  file_sha1: '4e1243bd22c66e76c2ba9eddc1f91394e57f9f83',
  group: upstream.group,
  relative_path: "hello-world/manifests/latest",
  size: 1.kilobyte,
  file: temp_file
)

current_user = User.first

# Search by path
search_term = 'hello-world/manifests/latest'
VirtualRegistries::Container::HandleFileRequestService.new(registry: registry, current_user: current_user, params: { path: search_term }).execute

# Search by a path that does not match a record
search_term = 'hello-world/manifests/latestzzz'
VirtualRegistries::Container::HandleFileRequestService.new(registry: registry, current_user: current_user, params: { path: search_term }).execute

# Search by digest (upstream_etag)
search_term = "hello-world/manifests/#{upstream_etag}"
VirtualRegistries::Container::HandleFileRequestService.new(registry: registry, current_user: current_user, params: { path: search_term }).execute

# Search by a digest that does not match a record
search_term = "hello-world/manifests/sha256:#{'b' * 64}"
VirtualRegistries::Container::HandleFileRequestService.new(registry: registry, current_user: current_user, params: { path: search_term }).execute

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #549131

Edited by Radamanthus Batnag

Merge request reports

Loading