Add Cargo sparse index endpoint

What does this MR do and why?

Adds GET /api/v4/projects/:id/packages/cargo/{prefix}/{name}, the Cargo registry sparse index endpoint. It returns newline-delimited JSON (NDJSON), one line per published version of a crate, most recently published first. The Cargo CLI fetches these files to resolve dependencies (for example during cargo install) and polls them after cargo publish until the new version appears.

This is MR 2 of the Cargo MVC plan laid out in this issue comment. MR 1 (the download endpoint, !236631) merged earlier this week. Remaining work: the upload authorize and upload publish endpoints, plus GA hardening.

What's in this MR

  • Four explicit routes for the four prefix shapes from the Cargo registry index spec — 1/{name}, 2/{name}, 3/{first}/{name}, {first-two}/{next-two}/{name} — declared inside the existing :id/packages/cargo namespace.
  • Packages::Cargo::MetadataFinder — collects installable Packages::Cargo::Metadatum rows for a project + normalized name, ordered by package_id DESC (most recently published first), capped at 500 versions. No pagination; the Cargo CLI doesn't support it.
  • New :read_cargo_package granular permission (mirrors :read_ruby_gem) plus its entry in the packages_and_registry/package assignable bundle so granular tokens scoped to read_package can use the new endpoint.
  • Everything stays behind the existing package_registry_cargo_support WIP feature flag (default off).
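
The four prefix shapes follow directly from the length of the crate name. A minimal Ruby sketch of that mapping, assuming the name is already normalized to lowercase; sparse_index_path is a hypothetical helper for illustration, not a method added in this MR:

```ruby
# Hypothetical helper: maps a normalized (lowercase) crate name to its
# sparse index path, following the Cargo registry index layout.
def sparse_index_path(name)
  case name.length
  when 0
    raise ArgumentError, "crate name must not be empty"
  when 1
    "1/#{name}"                              # 1/{name}
  when 2
    "2/#{name}"                              # 2/{name}
  when 3
    "3/#{name[0]}/#{name}"                   # 3/{first}/{name}
  else
    "#{name[0, 2]}/#{name[2, 2]}/#{name}"    # {first-two}/{next-two}/{name}
  end
end

%w[a ab abc my-crate download].each do |n|
  puts sparse_index_path(n)
end
```

Running it prints 1/a, 2/ab, 3/a/abc, my/-c/my-crate and do/wn/download, which line up with the URLs used in the local validation steps.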

Design notes worth flagging

  • Route-ordering subtlety. The 4+ char route :prefix_1/:prefix_2/:package_name has the same three-segment shape as the existing :package_name/:package_version/download route. I pinned each prefix segment to exactly two normalized-name characters (/[a-z0-9-]{2}/) so the index route can't shadow the download route. A download URL like my-crate/1.0.0/download won't match the index route because my-crate is 8 characters, and even a two-character crate name fails on the second segment, since a semver version such as 1.0.0 always contains dots. A real index URL like do/wn/download (the sparse index entry for a crate literally named "download") matches the index route first, as intended.
  • Response framing. The class sets default_format :json, which would JSON-encode any returned string (wrapping the body in quotes and escaping its newlines). I added env['api.format'] = :binary alongside content_type 'text/plain' so the NDJSON body passes through verbatim. This is the same pattern the Debian distribution endpoint uses.
  • Ordering choice. package_id DESC rather than semver. The Cargo CLI doesn't depend on line order, and package_id DESC approximates reverse publish order, so when the 500 cap applies the most recently published versions are kept (matching the limit_recent convention used by Conan/NuGet/Helm). Publishing 1.5.0 after 2.0.0 yields [1.5.0, 2.0.0]: publish order, not semantic-version order.
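
To make the route-shadowing argument concrete, here is a rough model of the two colliding routes as anchored regexes tried in declaration order. The patterns and route_for are hypothetical approximations of the constraints described above, not the Grape route definitions from this MR (the real download route may constrain :package_version further):

```ruby
# Hypothetical approximations of the two three-segment routes.
# Index (4+ char names): two 2-char prefix segments, then the crate name.
INDEX_4PLUS = %r{\A(?<prefix_1>[a-z0-9-]{2})/(?<prefix_2>[a-z0-9-]{2})/(?<package_name>[^/]+)\z}
# Download: any name, any version, then the literal "download" segment.
DOWNLOAD = %r{\A(?<package_name>[^/]+)/(?<package_version>[^/]+)/download\z}

# Routes are tried in declaration order; the first match wins.
ROUTES = [[:index, INDEX_4PLUS], [:download, DOWNLOAD]].freeze

def route_for(path)
  ROUTES.find { |_name, pattern| pattern.match?(path) }&.first
end

puts route_for("my-crate/1.0.0/download")  # download: "my-crate" is not 2 chars
puts route_for("ab/1.0.0/download")        # download: "1.0.0" contains dots
puts route_for("download/1.0.0/download")  # download: "download" is not 2 chars
puts route_for("do/wn/download")           # index: the crate named "download"
```

Note that do/wn/download would also satisfy the loose DOWNLOAD pattern here, which is why declaring the index route first matters in this model.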

Database

The sparse index finder returns one crate's versions within a project, ordered by publish order (package_id DESC — most recently published first) and capped at 500. Note this is publish order, not semantic-version order: publishing 1.5.0 after 2.0.0 yields [1.5.0, 2.0.0]. Ordering is descending so that when the 500 cap applies, the most recently published versions are kept.

Query:

SELECT packages_cargo_metadata.*
FROM packages_cargo_metadata
INNER JOIN packages_packages
  ON packages_packages.id = packages_cargo_metadata.package_id
WHERE packages_cargo_metadata.project_id = $1
  AND packages_cargo_metadata.normalized_name = $2
  AND packages_packages.package_type = 15        -- cargo
  AND packages_packages.status IN (0, 1, 5)      -- default, hidden, deprecated
ORDER BY packages_cargo_metadata.package_id DESC
LIMIT 500;
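
The query's filter, order, and limit can be mirrored over in-memory rows to sanity-check the publish-order semantics. This is a hypothetical sketch: Row and index_rows are illustrative names, and the constants are copied from the SQL above rather than read from GitLab's enums:

```ruby
# Hypothetical stand-in for a packages_cargo_metadata row joined to packages_packages.
Row = Struct.new(:package_id, :project_id, :normalized_name, :version,
                 :package_type, :status, keyword_init: true)

CARGO_PACKAGE_TYPE   = 15          # copied from the query above
INSTALLABLE_STATUSES = [0, 1, 5]   # default, hidden, deprecated
VERSION_LIMIT        = 500

def index_rows(rows, project_id:, normalized_name:, limit: VERSION_LIMIT)
  # WHERE: one crate in one project, cargo type, installable status
  matching = rows.select do |r|
    r.project_id == project_id &&
      r.normalized_name == normalized_name &&
      r.package_type == CARGO_PACKAGE_TYPE &&
      INSTALLABLE_STATUSES.include?(r.status)
  end
  # ORDER BY package_id DESC: most recently published first (not semver order)
  newest_first = matching.sort_by { |r| -r.package_id }
  # LIMIT: keep only the `limit` most recently published versions
  newest_first.first(limit)
end

rows = [
  Row.new(package_id: 10, project_id: 1, normalized_name: "my-crate", version: "2.0.0", package_type: 15, status: 0),
  Row.new(package_id: 11, project_id: 1, normalized_name: "my-crate", version: "1.5.0", package_type: 15, status: 0),
  Row.new(package_id: 12, project_id: 1, normalized_name: "my-crate", version: "3.0.0", package_type: 15, status: 2),
  Row.new(package_id: 13, project_id: 1, normalized_name: "other",    version: "0.1.0", package_type: 15, status: 0),
]

p index_rows(rows, project_id: 1, normalized_name: "my-crate").map(&:version)
# => ["1.5.0", "2.0.0"]
```

The 3.0.0 row is excluded because its status is outside (0, 1, 5). With limit: 1 the same rows yield only ["1.5.0"]: the cap drops 2.0.0 even though it is the higher semantic version.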

Plan (seeded with 600 versions for the target crate plus ~50k rows of noise across 5k crates): https://postgres.ai/console/gitlab/gitlab-production-main/sessions/52138/commands/153585

Plan
 Limit  (cost=2447.33..2447.33 rows=1 width=57) (actual time=2.020..2.071 rows=500 loops=1)
   Buffers: shared hit=3025
   ->  Sort  (cost=2446.74..2446.74 rows=1 width=57) (actual time=2.018..2.038 rows=500 loops=1)
         Sort Key: packages_cargo_metadata.package_id DESC
         Sort Method: quicksort  Memory: 104kB
         Buffers: shared hit=3025
         ->  Nested Loop  (cost=0.98..2446.73 rows=1 width=57) (actual time=0.143..1.770 rows=600 loops=1)
               Buffers: shared hit=3022
               ->  Index Scan using index_cargo_metadata_on_project_normalized_name_version on packages_cargo_metadata  (cost=0.41..294.98 rows=600 width=57) (actual time=0.118..0.248 rows=600 loops=1)
                     Index Cond: ((project_id = $1) AND (normalized_name = 'my-crate'::text))
                     Buffers: shared hit=22
               ->  Index Scan using packages_packages_pkey on packages_packages  (cost=0.56..3.59 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=600)
                     Index Cond: (id = packages_cargo_metadata.package_id)
                     Filter: ((package_type = 15) AND (status = ANY ('{0,1,5}'::integer[])))
                     Rows Removed by Filter: 0
                     Buffers: shared hit=3000
 Execution Time: 2.226 ms

The ORDER BY adds a sort node, but it only sorts the rows matching (project_id, normalized_name), i.e. one crate's versions, not the whole table. With ~50k rows of noise seeded, the index scan narrows to the 600 matching rows before the sort (quicksort, Memory: 104kB), and the LIMIT then keeps the top 500. The sort input is therefore bounded by versions-per-crate and does not grow with table size. Execution is ~2 ms, and every buffer access is a shared hit (no disk reads).

How to set up and validate locally

  1. Enable the feature flag for a test project:

    Feature.enable(:package_registry_cargo_support, Project.find(<id>))
  2. Seed a couple of versions of a crate from the Rails console using factories (MR 2 has no upload endpoint yet):

    include FactoryBot::Syntax::Methods  # expose create(...) in the console (assumes factory_bot is loaded)
    project = Project.find(<id>)
    pkg1 = create(:cargo_package, name: 'my-crate', version: '1.0.0', project: project)
    pkg2 = create(:cargo_package, name: 'my-crate', version: '2.0.0', project: project)
    create(:cargo_metadatum, package: pkg1)
    create(:cargo_metadatum, package: pkg2)
  3. Hit the sparse index for the 4+ char prefix shape:

    curl -i -H 'Authorization: Bearer <PAT>' \
      http://gdk.test:3000/api/v4/projects/<id>/packages/cargo/my/-c/my-crate

    Expected: Content-Type: text/plain and a body of two NDJSON lines, most recently published first (2.0.0, then 1.0.0), each parsing to an object that matches the cargo_package_index_content schema.

  4. Smoke-check the other prefix shapes by seeding crates named a, ab, abc and hitting 1/a, 2/ab, 3/a/abc.

  5. Verify 404 for an unknown crate name; verify FF off → 404; verify private project + token without project access → 404.
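
For step 3, a quick way to check the body beyond eyeballing it is to split it on newlines and parse each line as JSON. The sketch below builds a sample body in-process instead of calling the endpoint (placeholder values, only a minimal subset of the Cargo index fields; parse_index is a hypothetical helper), and also shows why the response-framing override matters: JSON-encoding the same string collapses it into a single quoted line.

```ruby
require "json"

# Placeholder index entries using a minimal subset of Cargo index fields.
entries = [
  { name: "my-crate", vers: "2.0.0", deps: [], cksum: "0" * 64, features: {}, yanked: false },
  { name: "my-crate", vers: "1.0.0", deps: [], cksum: "0" * 64, features: {}, yanked: false },
]

# NDJSON framing: one compact JSON object per line.
ndjson_body = entries.map { |e| JSON.generate(e) }.join("\n")

# What a JSON formatter would do to that string: wrap it in quotes and
# escape the newlines, yielding one JSON string value instead of lines.
json_encoded_body = ndjson_body.to_json

# Parse an NDJSON body into one Hash per non-blank line.
def parse_index(body)
  body.each_line.map(&:strip).reject(&:empty?).map { |line| JSON.parse(line) }
end

versions = parse_index(ndjson_body).map { |h| h["vers"] }
puts versions.inspect                # ["2.0.0", "1.0.0"]
puts json_encoded_body.lines.count   # 1: the newlines are escaped as \n
```

Against the real endpoint, save the body with curl -o (without -i, so headers are not mixed in) and pass File.read of that file to parse_index.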

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist.

Edited by Tim Rizzi
