Add RubyGems spec file generation worker
Summary
Implements RubyGems spec index file generation for the package registry.
This MR builds on the merged packages_rubygems_spec_files table/model/uploader by adding:
- Packages::Rubygems::CreateSpecFilesService to generate specs.4.8.gz, latest_specs.4.8.gz, and prerelease_specs.4.8.gz
- Packages::Rubygems::CreateSpecFilesWorker to regenerate spec files asynchronously per project
- hooks from RubyGems package upload processing and package destruction flows
- specs for service behavior, worker behavior, and enqueue hooks
Implementation Details
The service builds RubyGems-compatible spec index contents as gzip-compressed Marshal dumps of [name, version, platform] tuples.
- specs.4.8.gz includes released versions only
- latest_specs.4.8.gz includes the latest released version per gem name using Gem::Version comparison
- prerelease_specs.4.8.gz includes prerelease versions only
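The partitioning described above can be sketched in plain Ruby. This is a minimal illustration, not the service's actual code: the helper name build_spec_indexes is hypothetical, and wrapping versions in Gem::Version and defaulting a missing platform to "ruby" are assumptions made for the example.

```ruby
require "rubygems"
require "zlib"

# Hypothetical helper (not the service itself): takes rows of
# [name, version_string, platform] and returns a hash of
# spec index file name => gzip-compressed Marshal dump.
def build_spec_indexes(rows)
  # Assumption: versions are stored as Gem::Version objects and a missing
  # platform (possible with the LEFT JOIN) falls back to "ruby".
  tuples = rows.map do |name, version, platform|
    [name, Gem::Version.new(version), platform || "ruby"]
  end

  prerelease, released = tuples.partition { |_, version, _| version.prerelease? }

  # Latest released version per gem name, compared as Gem::Version
  # so that 1.10.0 sorts above 1.9.0 (string comparison would not).
  latest = released
    .group_by { |name, _, _| name }
    .map { |_, group| group.max_by { |_, version, _| version } }

  {
    "specs.4.8.gz" => released,
    "latest_specs.4.8.gz" => latest,
    "prerelease_specs.4.8.gz" => prerelease
  }.transform_values { |list| Zlib.gzip(Marshal.dump(list)) }
end
```

Comparing with Gem::Version rather than raw strings is what keeps latest_specs.4.8.gz correct for multi-digit version segments.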
The worker is idempotent and uses Sidekiq deduplication with deduplicate :until_executed, if_deduplicated: :reschedule_once so concurrent upload/delete events for the same project collapse into one active regeneration and one follow-up regeneration if needed.
Database
No migrations or schema changes — only new query patterns against existing tables (packages_packages, packages_rubygems_metadata, packages_rubygems_spec_files).
This MR adds:
- a new scope installable_for_project on Packages::Rubygems::Package (for_projects(project).installable.has_version)
- a find_or_build class method on Packages::Rubygems::SpecFile (wraps find_or_initialize_by), followed by update!
Affected code paths
- app/models/packages/rubygems/package.rb: new installable_for_project scope and sync_rubygems_spec_files instance method
- app/models/packages/rubygems/spec_file.rb: new find_or_build(project_id:, file_name:) class method
- app/services/packages/rubygems/create_spec_files_service.rb#update_spec_file: SpecFile.find_or_build + update!
Queries
installable_for_project resolves installable to status IN (0, 1, 5) (default / hidden / deprecated) and has_version to version IS NOT NULL. The service reads name/version/platform in one left_joins(:rubygems_metadatum) + pluck, batched through each_batch — there is no separate metadata preload query.
Q1 — installable packages + metadata (batched LEFT JOIN)
The service reads name/version/platform via installable_for_project + left_joins(:rubygems_metadatum) + pluck, batched through each_batch. installable resolves to status IN (0, 1, 5) (default / hidden / deprecated) and has_version to version IS NOT NULL. There is no separate metadata preload query.
SELECT "packages_packages"."name", "packages_packages"."version", "packages_rubygems_metadata"."platform"
FROM "packages_packages"
LEFT OUTER JOIN "packages_rubygems_metadata"
ON "packages_rubygems_metadata"."package_id" = "packages_packages"."id"
WHERE "packages_packages"."package_type" = 10
AND "packages_packages"."project_id" = 278964
AND "packages_packages"."status" IN (0, 1, 5)
AND "packages_packages"."version" IS NOT NULL
ORDER BY "packages_packages"."id" ASC
LIMIT 1000;
Plan (seeded 1000 installable rubygems packages + matching metadata for the project, on DLE clone):
Limit (cost=3.96..3.97 rows=1 width=59) (actual time=1.371..1.465 rows=1000 loops=1)
Buffers: shared hit=54
-> Sort (cost=3.96..3.97 rows=1 width=59) (actual time=1.370..1.408 rows=1000 loops=1)
Sort Key: packages_packages.id
Sort Method: quicksort Memory: 95kB
Buffers: shared hit=54
-> Merge Right Join (cost=3.87..3.95 rows=1 width=59) (actual time=0.660..1.142 rows=1000 loops=1)
Merge Cond: (packages_rubygems_metadata.package_id = packages_packages.id)
Buffers: shared hit=54
-> Index Scan using packages_rubygems_metadata_pkey on public.packages_rubygems_metadata
(cost=0.28..60.29 rows=1001 width=13) (actual time=0.014..0.163 rows=1001 loops=1)
Buffers: shared hit=14
-> Sort (cost=3.60..3.60 rows=1 width=54) (actual time=0.641..0.686 rows=1000 loops=1)
Sort Key: packages_packages.id
Sort Method: quicksort Memory: 72kB
Buffers: shared hit=40
-> Index Scan using index_packages_packages_on_project_id_and_package_type on public.packages_packages
(cost=0.56..3.59 rows=1 width=54) (actual time=0.062..0.456 rows=1000 loops=1)
Index Cond: ((packages_packages.project_id = 278964) AND (packages_packages.package_type = 10))
Filter: ((packages_packages.version IS NOT NULL) AND (packages_packages.status = ANY ('{0,1,5}'::integer[])))
Rows Removed by Filter: 0
Buffers: shared hit=40
Planning 3.620 ms, execution 1.598 ms, 54 shared buffer hits (0 reads). index_packages_packages_on_project_id_and_package_type narrows to one project + package type (status/version applied as a cheap filter, 0 rows removed); the rubygems metadatum is reached by PK (packages_rubygems_metadata_pkey) on the join. each_batch bounds each slice by id. The query runs once per each_batch slice per worker execution, bounded by the number of installable rubygems packages in a single project.
Q2 — find_or_build SELECT
SELECT "packages_rubygems_spec_files".*
FROM "packages_rubygems_spec_files"
WHERE "packages_rubygems_spec_files"."project_id" = 900500
AND "packages_rubygems_spec_files"."file_name" = 'specs.4.8.gz'
LIMIT 1;
Plan (seeded ~3000 rows, 1000 projects × 3 spec files, on DLE clone):
Limit (cost=0.28..3.30 rows=1 width=127) (actual time=0.032..0.032 rows=1 loops=1)
Buffers: shared hit=6
-> Index Scan using index_packages_rubygems_spec_files_on_project_id_and_file_name on public.packages_rubygems_spec_files
(cost=0.28..3.30 rows=1 width=127) (actual time=0.030..0.031 rows=1 loops=1)
Index Cond: ((project_id = 900500) AND (file_name = 'specs.4.8.gz'::text))
Buffers: shared hit=6
Planning 0.787 ms, execution 0.079 ms.
Q3 — uniqueness validation on UPDATE
SELECT 1 AS one
FROM "packages_rubygems_spec_files"
WHERE "packages_rubygems_spec_files"."file_name" = 'specs.4.8.gz'
AND "packages_rubygems_spec_files"."project_id" = 900500
AND "packages_rubygems_spec_files"."id" != 999999999
LIMIT 1;
Plan:
Limit (cost=0.28..3.30 rows=1 width=4) (actual time=0.033..0.033 rows=1 loops=1)
Buffers: shared hit=6
-> Index Scan using index_packages_rubygems_spec_files_on_project_id_and_file_name on public.packages_rubygems_spec_files
(cost=0.28..3.30 rows=1 width=4) (actual time=0.032..0.032 rows=1 loops=1)
Index Cond: ((project_id = 900500) AND (file_name = 'specs.4.8.gz'::text))
Planning 0.696 ms, execution 0.068 ms. id <> N is a cheap residual filter on the single row returned by the index.
Q4 — update! (UPDATE path)
UPDATE "packages_rubygems_spec_files"
SET "file" = 'fake.gz', "size" = 100, "updated_at" = NOW()
WHERE "packages_rubygems_spec_files"."id" = 1;
Uses packages_rubygems_spec_files_pkey (PK on id) for a single-row lookup. The UPDATE doesn't modify columns covered by any index, so no index maintenance is required beyond the heap.
Q5 — update! (INSERT path, first sync for a project)
INSERT INTO "packages_rubygems_spec_files"
(project_id, file_name, file, object_storage_key, size, file_store, status, created_at, updated_at)
VALUES (278964, 'specs.4.8.gz', 'fake.gz', 'fake-key', 100, 1, 0, NOW(), NOW());
Standard single-row INSERT: heap page + PK index + unique (project_id, file_name) index maintenance. Runs at most once per project, per spec file (the first time a rubygems package is published in that project).
Indexes backing these queries
| Query | Index |
|---|---|
| Q1 | index_packages_packages_on_project_id_and_package_type (project_id, package_type) for the package scan; packages_rubygems_metadata_pkey (PK on package_id) for the join |
| Q2, Q3 | index_packages_rubygems_spec_files_on_project_id_and_file_name UNIQUE (project_id, file_name) |
| Q4 | packages_rubygems_spec_files_pkey (PK on id) |
| Q5 | INSERT — maintains both spec-file indexes above |
Volume / data distribution
Q1 runs once per each_batch slice per CreateSpecFilesWorker execution, bounded by the number of installable rubygems packages in a single project (typically tens, max a few thousand).
Q2–Q5 run 3 times per worker execution (one per spec file: specs, latest_specs, prerelease_specs). The worker is enqueued once per package mutation in a rubygems-publishing project.
Test Plan
bundle exec rspec \
spec/services/packages/rubygems/create_spec_files_service_spec.rb \
spec/workers/packages/rubygems/create_spec_files_worker_spec.rb \
spec/services/packages/rubygems/process_gem_service_spec.rb \
spec/services/packages/mark_package_for_destruction_service_spec.rb \
spec/services/packages/mark_packages_for_destruction_service_spec.rb
bundle exec rubocop <touched files>
Functional Testing
- Start GDK and enable the RubyGems package registry for a test project.
- Build and push one or more .gem files to the project's RubyGems registry.
- After upload processing completes, verify three Packages::Rubygems::SpecFile records exist for the project: specs.4.8.gz, latest_specs.4.8.gz, prerelease_specs.4.8.gz.
- Upload multiple versions of the same gem, including a prerelease version such as 1.0.0.pre, and verify:
  - specs.4.8.gz contains only released versions
  - latest_specs.4.8.gz contains only the latest released version
  - prerelease_specs.4.8.gz contains only prerelease versions
- Delete or mark a RubyGems package for destruction and verify Packages::Rubygems::CreateSpecFilesWorker is enqueued for the project.
- Run the worker and verify the spec files are regenerated without the deleted package.
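For the content checks in the steps above, a small decoding helper can make the generated files readable in a console. This is a sketch under assumptions: decode_spec_index is a hypothetical name, it expects the raw gzip bytes of a spec file (how those bytes are read from the SpecFile record is not shown here), and it assumes versions were dumped as Gem::Version objects or strings.

```ruby
require "rubygems"
require "zlib"

# Hypothetical helper: decodes a gzip-compressed Marshal dump into plain
# [name, version_string, platform] rows so they are easy to compare by eye.
# Marshal.load is only safe on trusted input, such as files generated locally.
def decode_spec_index(gzipped_bytes)
  Marshal.load(Zlib.gunzip(gzipped_bytes)).map do |name, version, platform|
    [name, version.to_s, platform]
  end
end
```

For example, decode_spec_index(File.binread("specs.4.8.gz")) on a downloaded copy should list only released versions.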
Because MR 3 (not started yet) will serve these files through the API, the functional check for this MR is mostly DB/worker verification rather than running gem install against the endpoint.