Use metadata when handling duplicate conan package files
What does this MR do and why?
In Extract Conan (C/C++ Package Manager) metadata ... (&14896 - closed), we implemented package revisions for Conan packages (see https://docs.gitlab.com/user/packages/conan_1_repository/#conan-revisions).
Conan package revisions are unique identifiers that track different versions or builds of a compiled binary package, allowing you to distinguish between packages built with different compiler settings, dependencies, or source code changes even when they share the same version number.
We need to take this information into account when we clean up duplicate package files using cleanup policies (see https://docs.gitlab.com/user/packages/package_registry/reduce_package_registry_storage/#cleanup-policy).
This MR changes Packages::Cleanup::ExecutePolicyService to handle duplicate Conan package files differently: it takes into account the file metadata containing revision and reference information. To avoid any unexpected behavior, the change is gated behind the `packages_conan_duplicates_cleanup_policy` feature flag.
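To illustrate why the metadata matters, here is a minimal plain-Ruby sketch. It is not the service code: the `ConanFile` struct and the sample ids are hypothetical stand-ins for a package file joined with its Conan file metadata, and the file type values `1` (recipe file) and `2` (package file) follow the setup data used in the database analysis.

```ruby
# Hypothetical stand-in for a package file joined with its Conan file metadata.
ConanFile = Struct.new(:id, :file_name, :conan_file_type, :recipe_revision_id,
                       :package_revision_id, :package_reference_id, keyword_init: true)

files = [
  # Recipe manifest (conan_file_type 1): no package revision or reference.
  ConanFile.new(id: 1, file_name: 'conanmanifest.txt', conan_file_type: 1,
                recipe_revision_id: 10),
  # Package manifest (conan_file_type 2) with the same file name.
  ConanFile.new(id: 2, file_name: 'conanmanifest.txt', conan_file_type: 2,
                recipe_revision_id: 10, package_revision_id: 20, package_reference_id: 30),
  # A second upload of the recipe manifest: a true duplicate of file 1.
  ConanFile.new(id: 3, file_name: 'conanmanifest.txt', conan_file_type: 1,
                recipe_revision_id: 10)
]

# Grouping by file name alone treats all three files as duplicates of each other.
by_name = files.group_by(&:file_name)

# Grouping by file name plus metadata keeps recipe and package manifests apart.
by_metadata = files.group_by do |f|
  [f.file_name, f.conan_file_type, f.recipe_revision_id,
   f.package_revision_id, f.package_reference_id]
end

name_group_sizes = by_name.values.map(&:size)
metadata_group_sizes = by_metadata.values.map(&:size).sort
```

With a name-only grouping, keeping one duplicate would leave a single `conanmanifest.txt` and drop a manifest that belongs to a different revision. With the metadata-aware grouping, only file 1 and file 3 compete with each other.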
References
`keep_n_duplicated_package_files` cleanup polic... (#536184)
Screenshots or screen recordings
No.
Database analysis
Packages::PackageFile#conan_keep_n_duplicate_ids
postgres.ai setup
We need to prepare duplicate package files linked to the same revisions. I've taken the existing conan package https://gitlab.com/dmeshcharakou/packages/-/packages/43689164 and added more duplicate files to it.
--- Create 2000 package files
exec INSERT INTO packages_package_files (created_at, updated_at, project_id, package_id, file_name, file)
SELECT
NOW(),
NOW(),
44848776,
43689164,
'conanmanifest.txt',
'conanmanifest.txt'
FROM generate_series(1, 2000) AS i;
--- Create 1000 conan file metadata for recipe files
exec INSERT INTO packages_conan_file_metadata (created_at, updated_at, project_id, package_file_id, conan_file_type, recipe_revision_id)
SELECT
NOW(),
NOW(),
44848776,
packages_package_files.id,
1,
1000975
FROM packages_package_files ORDER BY id DESC LIMIT 1000;
---- Create 1000 conan file metadata for package files
exec INSERT INTO packages_conan_file_metadata (created_at, updated_at, project_id, package_file_id, conan_file_type, recipe_revision_id, package_revision_id, package_reference_id)
SELECT
NOW(),
NOW(),
44848776,
packages_package_files.id,
2,
1000975,
1000598,
3533517
FROM packages_package_files ORDER BY id DESC LIMIT 1000 OFFSET 1000;
Query
WITH "ranked_files" AS MATERIALIZED
(SELECT "packages_package_files"."id", ROW_NUMBER() OVER (PARTITION BY "packages_conan_file_metadata"."conan_file_type", "packages_conan_file_metadata"."recipe_revision_id", "packages_conan_file_metadata"."package_revision_id", "packages_conan_file_metadata"."package_reference_id"
ORDER BY "packages_package_files"."created_at" DESC) AS rn
FROM "packages_package_files"
INNER JOIN "packages_conan_file_metadata" ON "packages_conan_file_metadata"."package_file_id" = "packages_package_files"."id"
WHERE "packages_package_files"."status" = 0
AND "packages_package_files"."package_id" = XXX
AND "packages_package_files"."file_name" = 'XXX')
SELECT id
FROM "ranked_files"
WHERE "ranked_files"."rn" <= 'XXX';
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/43687/commands/133437
I also tried adding an index ON packages_conan_file_metadata (conan_file_type, recipe_revision_id, package_revision_id, package_reference_id), but the query planner does not use it.
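The ranking done by the query above can be emulated in plain Ruby. This is only a sketch under stated assumptions: `RankedFile`, `keep_n_duplicate_ids`, and the sample data are hypothetical, and `created_at` is an integer here to keep the example self-contained. The partition key mirrors the `PARTITION BY` columns, and `nil` values fall into the same partition, as NULLs do in a window partition.

```ruby
# Hypothetical row: a package file id, its creation time, and its Conan metadata.
RankedFile = Struct.new(:id, :created_at, :conan_file_type, :recipe_revision_id,
                        :package_revision_id, :package_reference_id, keyword_init: true)

# Returns the ids of the newest `keep_n` files in each metadata partition,
# mirroring ROW_NUMBER() OVER (PARTITION BY ... ORDER BY created_at DESC) <= keep_n.
def keep_n_duplicate_ids(files, keep_n)
  files
    .group_by do |f|
      [f.conan_file_type, f.recipe_revision_id, f.package_revision_id, f.package_reference_id]
    end
    .flat_map { |_partition, group| group.max_by(keep_n, &:created_at).map(&:id) }
end

# Four recipe manifests (odd ids) and four package manifests (even ids),
# all sharing the same file name and revisions.
files = (1..8).map do |i|
  recipe = i.odd?
  RankedFile.new(
    id: i,
    created_at: i, # a higher value means a newer file
    conan_file_type: recipe ? 1 : 2,
    recipe_revision_id: 100,
    package_revision_id: recipe ? nil : 200,
    package_reference_id: recipe ? nil : 300
  )
end

keep_ids = keep_n_duplicate_ids(files, 1)
destroy_ids = files.map(&:id) - keep_ids
```

With `keep_n` set to 1, the newest file of each partition is kept (two ids) and the remaining six are candidates for destruction, which matches the counts expected in the local validation steps.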
Packages::Cleanup::ExecutePolicyService#unique_package_id_and_file_name_from
postgres.ai setup
Same setup as for Packages::PackageFile#conan_keep_n_duplicate_ids above.
Query
SELECT "packages_package_files"."package_id",
"packages_package_files"."file_name",
"packages_packages"."package_type"
FROM "packages_package_files"
INNER JOIN "packages_packages" ON "packages_packages"."id" = "packages_package_files"."package_id"
WHERE "packages_package_files"."status" = 0
AND "packages_package_files"."project_id" = XXX
AND "packages_package_files"."id" >= XXX
GROUP BY "packages_package_files"."package_id",
"packages_package_files"."file_name",
"packages_packages"."package_type"
HAVING (COUNT(*) > 1);
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/43590/commands/133189
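As a rough illustration of what this query returns, the `GROUP BY ... HAVING COUNT(*) > 1` step can be sketched in plain Ruby. The `FileRow` struct and the sample rows are hypothetical, and `package_type` is shown as a string for readability; it is an assumption that the service uses this column to decide which packages get the Conan-specific handling.

```ruby
# Hypothetical row: the three columns selected by the query.
FileRow = Struct.new(:package_id, :file_name, :package_type, keyword_init: true)

rows = [
  FileRow.new(package_id: 1, file_name: 'conanmanifest.txt', package_type: 'conan'),
  FileRow.new(package_id: 1, file_name: 'conanmanifest.txt', package_type: 'conan'),
  FileRow.new(package_id: 1, file_name: 'conanfile.py', package_type: 'conan'),
  FileRow.new(package_id: 2, file_name: 'app.jar', package_type: 'maven'),
  FileRow.new(package_id: 2, file_name: 'app.jar', package_type: 'maven')
]

# Group by (package_id, file_name, package_type) and keep only the groups
# with more than one file, like HAVING (COUNT(*) > 1).
duplicated = rows
  .group_by { |r| [r.package_id, r.file_name, r.package_type] }
  .select { |_key, group| group.size > 1 }
  .keys
```

Each resulting tuple identifies a package and file name that has duplicates, so a later step can choose between the generic and the metadata-aware lookup per tuple.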
How to set up and validate locally
- Enable the feature flag:

      Feature.enable(:packages_conan_duplicates_cleanup_policy)

- Set the cleanup policy for the project to keep only `1` duplicate package file, or modify the existing policy:

      project = Project.first
      FactoryBot.create(:packages_cleanup_policy, project: project, keep_n_duplicated_package_files: '1')

- Create a new Conan package with several duplicate files:

      def fixture_file_upload(*args, **kwargs)
        Rack::Test::UploadedFile.new(*args, **kwargs)
      end

      # Creates two `conanmanifest.txt` files (one for the recipe revision and one for the package revision)
      package = FactoryBot.create(:conan_package, project: project)

      # Create a few more duplicate files
      3.times do
        FactoryBot.create(:conan_package_file, :conan_recipe_manifest, package: package)
        FactoryBot.create(:conan_package_file, :conan_package_manifest, package: package)
      end

      # Verify how many files were created
      package.reload.package_files.installable.where(file_name: 'conanmanifest.txt') # should return 8 files

- Call the service and verify the package files:

      Packages::Cleanup::ExecutePolicyService.new(project.packages_cleanup_policy).execute

      # Verify how many installable files the package has
      package.reload.package_files.installable.where(file_name: 'conanmanifest.txt') # should return 2 files

      # Verify how many files were marked for destruction
      package.reload.package_files.pending_destruction # should return 6 files
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #536184