Add RubyGems spec file generation worker

Summary

Implements RubyGems spec index file generation for the package registry.

This MR builds on the merged packages_rubygems_spec_files table/model/uploader by adding:

  • Packages::Rubygems::CreateSpecFilesService to generate specs.4.8.gz, latest_specs.4.8.gz, and prerelease_specs.4.8.gz
  • Packages::Rubygems::CreateSpecFilesWorker to regenerate spec files asynchronously per project
  • hooks from RubyGems package upload processing and package destruction flows
  • specs for service behavior, worker behavior, and enqueue hooks

Implementation Details

The service builds RubyGems-compatible spec index contents as gzip-compressed Marshal dumps of [name, version, platform] tuples.

  • specs.4.8.gz includes released versions only
  • latest_specs.4.8.gz includes the latest released version per gem name using Gem::Version comparison
  • prerelease_specs.4.8.gz includes prerelease versions only
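The three-way split above can be sketched in plain Ruby. This is a minimal sketch, not the service's code. build_spec_indexes and gzip_marshal are hypothetical names, and it makes several assumptions. Rows are taken to be the [name, version, platform] triples plucked by Q1. A nil platform (no metadatum row from the LEFT JOIN) is assumed to mean "ruby". Versions are assumed to be wrapped in Gem::Version inside the dumped tuples. "Latest" keeps every platform built for the highest released version of each name.

```ruby
require "zlib"
require "rubygems"

# Hypothetical helper: gzip-compressed Marshal dump, as described for the *.4.8.gz files.
def gzip_marshal(obj)
  Zlib.gzip(Marshal.dump(obj))
end

# Hypothetical helper: rows are [name, version_string, platform] triples.
# platform may be nil when the LEFT JOIN finds no rubygems metadatum.
def build_spec_indexes(rows)
  tuples = rows.map do |name, version, platform|
    # Assumption: a missing platform means a pure-Ruby gem.
    [name, Gem::Version.new(version), platform || "ruby"]
  end

  # Split on Gem::Version#prerelease? (e.g. "1.0.0.pre" is a prerelease).
  prerelease, released = tuples.partition { |_, version, _| version.prerelease? }

  # Latest released version per gem name, compared as Gem::Version (not as strings).
  # Every platform built for that newest version is kept.
  latest = released.group_by(&:first).flat_map do |_, entries|
    newest = entries.map { |entry| entry[1] }.max
    entries.select { |entry| entry[1] == newest }
  end

  # Sorting is only here to make the output deterministic.
  {
    "specs.4.8.gz" => gzip_marshal(released.sort),
    "latest_specs.4.8.gz" => gzip_marshal(latest.sort),
    "prerelease_specs.4.8.gz" => gzip_marshal(prerelease.sort)
  }
end
```

Comparing with Gem::Version rather than strings matters for latest_specs: as strings, "1.10.0" sorts before "1.9.0", while Gem::Version orders them correctly.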

The worker is idempotent and uses Sidekiq deduplication (deduplicate :until_executed, if_deduplicated: :reschedule_once), so concurrent upload/delete events for the same project collapse into at most one active regeneration plus one follow-up regeneration.
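To make the deduplication behaviour concrete, here is a toy in-memory model of :until_executed with if_deduplicated: :reschedule_once as described above. It is an illustration only, not GitLab's deduplication middleware. DedupModel and its methods are hypothetical. The model assumes that a duplicate arriving while a job for the same project is queued or running is dropped, and that it marks exactly one follow-up run to be enqueued once the active job finishes.

```ruby
# Toy model of "until_executed + reschedule_once" deduplication keyed by project.
# Hypothetical; not GitLab's Sidekiq middleware.
class DedupModel
  attr_reader :executions

  def initialize
    @active = {}              # project_id => true while a job is queued or running
    @reschedule = {}          # project_id => true if a duplicate was dropped meanwhile
    @executions = Hash.new(0) # project_id => completed runs
  end

  # Returns true if a job was enqueued, false if it was deduplicated.
  def enqueue(project_id)
    if @active[project_id]
      @reschedule[project_id] = true # remember at most one follow-up
      false
    else
      @active[project_id] = true
      true
    end
  end

  # Simulates the active job for project_id finishing.
  def complete(project_id)
    return unless @active.delete(project_id)

    @executions[project_id] += 1
    enqueue(project_id) if @reschedule.delete(project_id)
  end
end
```

Under this model, three enqueues in a burst yield two executions: the original and a single follow-up that regenerates from the final package state.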

Database

No migrations or schema changes — only new query patterns against existing tables (packages_packages, packages_rubygems_metadata, packages_rubygems_spec_files).

This MR adds:

  • a new scope installable_for_project on Packages::Rubygems::Package (for_projects(project).installable.has_version)
  • a find_or_build class method on Packages::Rubygems::SpecFile (wraps find_or_initialize_by), which the service follows with update!

Affected code paths

  • app/models/packages/rubygems/package.rb: new installable_for_project scope and sync_rubygems_spec_files instance method
  • app/models/packages/rubygems/spec_file.rb: new find_or_build(project_id:, file_name:) class method
  • app/services/packages/rubygems/create_spec_files_service.rb#update_spec_file: calls SpecFile.find_or_build, then update!

Queries

installable_for_project resolves installable to status IN (0, 1, 5) (default / hidden / deprecated) and has_version to version IS NOT NULL. The service reads name/version/platform in one left_joins(:rubygems_metadatum) + pluck, batched through each_batch — there is no separate metadata preload query.
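The installable_for_project conditions can be mirrored as an in-memory predicate, which is handy for reasoning about which rows Q1 returns. This is a sketch under assumptions. PackageRow and installable_for_project here are hypothetical plain-Ruby stand-ins, not the ActiveRecord scope. The status codes are the ones quoted above (0 default, 1 hidden, 5 deprecated), and package_type 10 is taken from the Q1 SQL.

```ruby
# Hypothetical plain-Ruby stand-in for a packages_packages row.
PackageRow = Struct.new(:id, :project_id, :package_type, :name, :version, :status, keyword_init: true)

RUBYGEMS_PACKAGE_TYPE = 10              # from the Q1 SQL
INSTALLABLE_STATUSES = [0, 1, 5].freeze # default / hidden / deprecated, per the text

# Mirrors for_projects(project).installable.has_version on rubygems packages.
def installable_for_project(rows, project_id)
  rows.select do |row|
    row.package_type == RUBYGEMS_PACKAGE_TYPE &&
      row.project_id == project_id &&
      INSTALLABLE_STATUSES.include?(row.status) &&
      !row.version.nil?
  end.sort_by(&:id) # Q1 orders by id
end
```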

Q1 — installable packages + metadata (batched LEFT JOIN)


SELECT "packages_packages"."name", "packages_packages"."version", "packages_rubygems_metadata"."platform"
FROM "packages_packages"
LEFT OUTER JOIN "packages_rubygems_metadata"
  ON "packages_rubygems_metadata"."package_id" = "packages_packages"."id"
WHERE "packages_packages"."package_type" = 10
  AND "packages_packages"."project_id" = 278964
  AND "packages_packages"."status" IN (0, 1, 5)
  AND "packages_packages"."version" IS NOT NULL
ORDER BY "packages_packages"."id" ASC
LIMIT 1000;

Plan (seeded 1000 installable rubygems packages + matching metadata for the project, on DLE clone):

Limit  (cost=3.96..3.97 rows=1 width=59) (actual time=1.371..1.465 rows=1000 loops=1)
  Buffers: shared hit=54
  ->  Sort  (cost=3.96..3.97 rows=1 width=59) (actual time=1.370..1.408 rows=1000 loops=1)
        Sort Key: packages_packages.id
        Sort Method: quicksort  Memory: 95kB
        Buffers: shared hit=54
        ->  Merge Right Join  (cost=3.87..3.95 rows=1 width=59) (actual time=0.660..1.142 rows=1000 loops=1)
              Merge Cond: (packages_rubygems_metadata.package_id = packages_packages.id)
              Buffers: shared hit=54
              ->  Index Scan using packages_rubygems_metadata_pkey on public.packages_rubygems_metadata
                    (cost=0.28..60.29 rows=1001 width=13) (actual time=0.014..0.163 rows=1001 loops=1)
                    Buffers: shared hit=14
              ->  Sort  (cost=3.60..3.60 rows=1 width=54) (actual time=0.641..0.686 rows=1000 loops=1)
                    Sort Key: packages_packages.id
                    Sort Method: quicksort  Memory: 72kB
                    Buffers: shared hit=40
                    ->  Index Scan using index_packages_packages_on_project_id_and_package_type on public.packages_packages
                          (cost=0.56..3.59 rows=1 width=54) (actual time=0.062..0.456 rows=1000 loops=1)
                          Index Cond: ((packages_packages.project_id = 278964) AND (packages_packages.package_type = 10))
                          Filter: ((packages_packages.version IS NOT NULL) AND (packages_packages.status = ANY ('{0,1,5}'::integer[])))
                          Rows Removed by Filter: 0
                          Buffers: shared hit=40

Planning 3.620 ms, execution 1.598 ms, 54 shared buffer hits (0 reads). index_packages_packages_on_project_id_and_package_type narrows the scan to one project and package type, and the status/version conditions are applied as a cheap filter (0 rows removed). The metadata side is read through packages_rubygems_metadata_pkey and merge-joined on package_id. each_batch bounds each slice by id.
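The id-bounded slicing can be illustrated with a simplified keyset loop. each_id_batch is a hypothetical helper, not GitLab's EachBatch module. Each slice is an id range (lower <= id < upper), and the next slice starts at the previous slice's upper bound, so per-slice work stays bounded by the batch size regardless of how many packages the project has.

```ruby
# Hypothetical helper: yields [slice_ids, lower, upper], where the slice holds
# every id with lower <= id < upper (upper is nil for the last slice).
def each_id_batch(ids, of: 1000)
  sorted = ids.sort
  lower = sorted.first
  while lower
    # In SQL terms: WHERE id >= lower ORDER BY id LIMIT of + 1
    window = sorted.select { |id| id >= lower }.first(of + 1)
    upper = window[of] # first id of the next slice, or nil on the last one
    yield window.first(of), lower, upper
    lower = upper
  end
end
```

For 2,500 packages and a batch size of 1000, this yields three slices of 1000, 1000, and 500 ids.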

Q2 — find_or_build SELECT

SELECT "packages_rubygems_spec_files".*
FROM "packages_rubygems_spec_files"
WHERE "packages_rubygems_spec_files"."project_id" = 900500
  AND "packages_rubygems_spec_files"."file_name" = 'specs.4.8.gz'
LIMIT 1;

Plan (seeded ~3000 rows, 1000 projects × 3 spec files, on DLE clone):

Limit  (cost=0.28..3.30 rows=1 width=127) (actual time=0.032..0.032 rows=1 loops=1)
  Buffers: shared hit=6
  ->  Index Scan using index_packages_rubygems_spec_files_on_project_id_and_file_name on public.packages_rubygems_spec_files
        (cost=0.28..3.30 rows=1 width=127) (actual time=0.030..0.031 rows=1 loops=1)
        Index Cond: ((project_id = 900500) AND (file_name = 'specs.4.8.gz'::text))
        Buffers: shared hit=6

Planning 0.787 ms, execution 0.079 ms.

Q3 — uniqueness validation on UPDATE

SELECT 1 AS one
FROM "packages_rubygems_spec_files"
WHERE "packages_rubygems_spec_files"."file_name" = 'specs.4.8.gz'
  AND "packages_rubygems_spec_files"."project_id" = 900500
  AND "packages_rubygems_spec_files"."id" != 999999999
LIMIT 1;

Plan:

Limit  (cost=0.28..3.30 rows=1 width=4) (actual time=0.033..0.033 rows=1 loops=1)
  Buffers: shared hit=6
  ->  Index Scan using index_packages_rubygems_spec_files_on_project_id_and_file_name on public.packages_rubygems_spec_files
        (cost=0.28..3.30 rows=1 width=4) (actual time=0.032..0.032 rows=1 loops=1)
        Index Cond: ((project_id = 900500) AND (file_name = 'specs.4.8.gz'::text))

Planning 0.696 ms, execution 0.068 ms. Because the index is unique on (project_id, file_name), it returns at most one row, and the id <> 999999999 condition is a cheap residual filter on that row.

Q4 — update! (UPDATE path)

UPDATE "packages_rubygems_spec_files"
SET "file" = 'fake.gz', "size" = 100, "updated_at" = NOW()
WHERE "packages_rubygems_spec_files"."id" = 1;

Uses packages_rubygems_spec_files_pkey (PK on id) for a single-row lookup. The UPDATE doesn't modify any indexed column, so PostgreSQL can apply it as a HOT (heap-only tuple) update when the heap page has free space, in which case no index entries need to be written.

Q5 — update! (INSERT path, first sync for a project)

INSERT INTO "packages_rubygems_spec_files"
  (project_id, file_name, file, object_storage_key, size, file_store, status, created_at, updated_at)
VALUES (278964, 'specs.4.8.gz', 'fake.gz', 'fake-key', 100, 1, 0, NOW(), NOW());

Standard single-row INSERT: one heap tuple plus entries in the PK index and the unique (project_id, file_name) index. Runs at most once per project per spec file (the first time a rubygems package is published in that project).
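The write path chosen by find_or_build + update! can be illustrated with an in-memory stand-in keyed like the unique (project_id, file_name) index. SpecFileStore and sync are hypothetical names. The sketch only shows that the first sync of each spec file takes the INSERT path (Q5) and later syncs take the UPDATE path (Q4); it does not model validations or uploads.

```ruby
# Hypothetical in-memory stand-in for SpecFile.find_or_build + update!.
class SpecFileStore
  SpecFileRow = Struct.new(:id, :project_id, :file_name, :size)

  attr_reader :log

  def initialize
    @rows = {}   # [project_id, file_name] => row, mirroring the UNIQUE index
    @next_id = 1 # stands in for the primary key sequence
    @log = []    # [:insert | :update, id] per sync
  end

  def sync(project_id, file_name, size)
    key = [project_id, file_name]
    row = @rows[key] # Q2: lookup by (project_id, file_name)
    if row
      row.size = size # Q4: UPDATE by primary key
      @log << [:update, row.id]
    else
      row = SpecFileRow.new(@next_id, project_id, file_name, size)
      @next_id += 1
      @rows[key] = row # Q5: INSERT, first sync only
      @log << [:insert, row.id]
    end
    row
  end
end
```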

Indexes backing these queries

  • Q1: index_packages_packages_on_project_id_and_package_type (project_id, package_type) for the package scan; packages_rubygems_metadata_pkey (PK on package_id) for the join
  • Q2, Q3: index_packages_rubygems_spec_files_on_project_id_and_file_name, UNIQUE (project_id, file_name)
  • Q4: packages_rubygems_spec_files_pkey (PK on id)
  • Q5: INSERT; maintains both spec-file indexes above

Volume / data distribution

Q1 runs once per each_batch slice per CreateSpecFilesWorker execution, bounded by the number of installable rubygems packages in a single project (typically tens, at most a few thousand).

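Q2, the Q3 uniqueness check, and either Q4 (existing row) or Q5 (first sync) run once per spec file, so 3 times per worker execution (specs, latest_specs, prerelease_specs). The worker is enqueued on every package mutation in a rubygems-publishing project, with bursts for the same project collapsed by deduplication.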
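As a rough check on these volume figures, the per-execution query count can be computed under stated assumptions. Q1 is assumed to fetch 1000 rows per slice (the LIMIT shown in Q1). Each of the three spec files is assumed to issue Q2, one uniqueness check, and exactly one of Q4/Q5. each_batch's own boundary lookups are ignored. queries_per_execution is a hypothetical helper.

```ruby
BATCH_SIZE = 1000         # the LIMIT shown in Q1
SPEC_FILE_COUNT = 3       # specs, latest_specs, prerelease_specs
QUERIES_PER_SPEC_FILE = 3 # Q2 + uniqueness check (Q3) + one of Q4/Q5

# Hypothetical estimate of Q1..Q5 executions per CreateSpecFilesWorker run.
def queries_per_execution(installable_count)
  q1_slices = (installable_count + BATCH_SIZE - 1) / BATCH_SIZE # integer ceil
  q1_slices + SPEC_FILE_COUNT * QUERIES_PER_SPEC_FILE
end
```

Under these assumptions, a project with 3000 installable packages costs 3 Q1 slices plus 9 spec-file queries, 12 in total per regeneration.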

Test Plan

bundle exec rspec \
  spec/services/packages/rubygems/create_spec_files_service_spec.rb \
  spec/workers/packages/rubygems/create_spec_files_worker_spec.rb \
  spec/services/packages/rubygems/process_gem_service_spec.rb \
  spec/services/packages/mark_package_for_destruction_service_spec.rb \
  spec/services/packages/mark_packages_for_destruction_service_spec.rb

bundle exec rubocop <touched files>

Functional Testing

  1. Start GDK and enable the RubyGems package registry for a test project.
  2. Build and push one or more .gem files to the project's RubyGems registry.
  3. After upload processing completes, verify three Packages::Rubygems::SpecFile records exist for the project: specs.4.8.gz, latest_specs.4.8.gz, prerelease_specs.4.8.gz.
  4. Upload multiple versions of the same gem, including a prerelease version such as 1.0.0.pre, and verify:
    • specs.4.8.gz contains only released versions
    • latest_specs.4.8.gz contains only the latest released version
    • prerelease_specs.4.8.gz contains only prerelease versions
  5. Delete or mark a RubyGems package for destruction and verify Packages::Rubygems::CreateSpecFilesWorker is enqueued for the project.
  6. Run the worker and verify the spec files are regenerated without the deleted package.
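For steps 3 and 4, a spec file's contents can be decoded and checked from a Rails console or a script. This is a sketch with assumptions. It expects the raw .gz bytes of each file; how to read them from the uploader is not shown. Entries are assumed to be [name, Gem::Version, platform] triples. decode_spec_index and spec_index_problems are hypothetical helpers. Marshal.load should only be used on files produced by this service, never on untrusted input.

```ruby
require "zlib"
require "rubygems"

# Hypothetical helper: gunzip, then Marshal.load (trusted input only).
def decode_spec_index(bytes)
  Marshal.load(Zlib.gunzip(bytes))
end

# Hypothetical helper: returns a list of problems; empty means the three
# decoded indexes match the expectations listed in step 4.
def spec_index_problems(specs, latest, prerelease)
  problems = []
  problems << "prerelease in specs" if specs.any? { |_, v, _| v.prerelease? }
  problems << "release in prerelease_specs" unless prerelease.all? { |_, v, _| v.prerelease? }
  specs.group_by(&:first).each do |name, entries|
    newest = entries.map { |entry| entry[1] }.max
    got = latest.select { |n, _, _| n == name }.map { |entry| entry[1] }.uniq
    problems << "latest for #{name} is #{got.map(&:to_s)}, expected #{newest}" unless got == [newest]
  end
  problems
end
```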

Serving these files through the API is planned for MR 3 (not started yet), so this MR's functional check is mostly DB/worker verification rather than a gem install against the endpoint.

Edited by Tim Rizzi