Use metadata when handling duplicate conan package files

What does this MR do and why?

In Extract Conan (C/C++ Package Manager) metadata ... (&14896 - closed), we implemented package revisions for Conan packages: https://docs.gitlab.com/user/packages/conan_1_repository/#conan-revisions

Conan package revisions are unique identifiers that track different versions or builds of a compiled binary package, allowing you to distinguish between packages built with different compiler settings, dependencies, or source code changes even when they share the same version number.

We need to take this information into account when we clean up duplicate package files using cleanup policies: https://docs.gitlab.com/user/packages/package_registry/reduce_package_registry_storage/#cleanup-policy

This MR changes Packages::Cleanup::ExecutePolicyService to handle duplicate Conan package files differently: it takes into account the file metadata, which contains the revision and reference information. To avoid any unexpected 💥, the changes are behind the `packages_conan_duplicates_cleanup_policy` feature flag.
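
To illustrate why the file name alone is not enough, here is a minimal plain-Ruby sketch. It is not the service code: `ConanFile`, `groups_by_name`, and `groups_by_metadata` are hypothetical names used only for this example, and the IDs are made up.

```ruby
# Hypothetical stand-in for a package file joined with its Conan file metadata.
ConanFile = Struct.new(:id, :file_name, :conan_file_type, :recipe_revision_id,
                       :package_revision_id, :package_reference_id, keyword_init: true)

# Groups by file name only: every `conanmanifest.txt` looks like a duplicate.
def groups_by_name(files)
  files.group_by(&:file_name)
end

# Groups by file name plus the revision and reference metadata.
def groups_by_metadata(files)
  files.group_by do |f|
    [f.file_name, f.conan_file_type, f.recipe_revision_id,
     f.package_revision_id, f.package_reference_id]
  end
end

files = [
  # Recipe manifest: no package revision or package reference.
  ConanFile.new(id: 1, file_name: 'conanmanifest.txt', conan_file_type: :recipe_file,
                recipe_revision_id: 10),
  # Package manifest for the same recipe revision.
  ConanFile.new(id: 2, file_name: 'conanmanifest.txt', conan_file_type: :package_file,
                recipe_revision_id: 10, package_revision_id: 20, package_reference_id: 30)
]

puts groups_by_name(files).size     # 1 group of two "duplicates"
puts groups_by_metadata(files).size # 2 groups, so neither file is a duplicate
```

With a keep-1 policy, name-only grouping would treat one of these two manifests as a duplicate, while metadata-aware grouping keeps both.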

References

`keep_n_duplicated_package_files` cleanup polic... (#536184)

Screenshots or screen recordings

No.

Database analysis

Packages::PackageFile#conan_keep_n_duplicate_ids

postgres.ai setup

We need to prepare duplicate package files linked to the same revisions. I've taken the existing conan package https://gitlab.com/dmeshcharakou/packages/-/packages/43689164 and added more duplicate files to it.

--- Create 2000 package files
exec INSERT INTO packages_package_files (created_at, updated_at, project_id, package_id, file_name, file)
SELECT 
    NOW(),
    NOW(),
    44848776,
    43689164,
    'conanmanifest.txt',
    'conanmanifest.txt'
FROM generate_series(1, 2000) AS i;

--- Create 1000 conan file metadata for recipe files
exec INSERT INTO packages_conan_file_metadata (created_at, updated_at, project_id, package_file_id, conan_file_type, recipe_revision_id)
SELECT
    NOW(),
    NOW(),
    44848776,
    packages_package_files.id,
    1,
    1000975
FROM packages_package_files ORDER BY id DESC LIMIT 1000;

--- Create 1000 conan file metadata for package files
exec INSERT INTO packages_conan_file_metadata (created_at, updated_at, project_id, package_file_id, conan_file_type, recipe_revision_id, package_revision_id, package_reference_id)
SELECT
    NOW(),
    NOW(),
    44848776,
    packages_package_files.id,
    2,
    1000975,
    1000598,
    3533517
FROM packages_package_files ORDER BY id DESC LIMIT 1000 OFFSET 1000;

Query

WITH "ranked_files" AS MATERIALIZED
  (SELECT "packages_package_files"."id", ROW_NUMBER() OVER (PARTITION BY "packages_conan_file_metadata"."conan_file_type", "packages_conan_file_metadata"."recipe_revision_id", "packages_conan_file_metadata"."package_revision_id", "packages_conan_file_metadata"."package_reference_id"
                                                            ORDER BY "packages_package_files"."created_at" DESC) AS rn
   FROM "packages_package_files"
   INNER JOIN "packages_conan_file_metadata" ON "packages_conan_file_metadata"."package_file_id" = "packages_package_files"."id"
   WHERE "packages_package_files"."status" = 0
     AND "packages_package_files"."package_id" = XXX
     AND "packages_package_files"."file_name" = 'XXX')
SELECT id
FROM "ranked_files"
WHERE "ranked_files"."rn" <= 'XXX';

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/43687/commands/133437

I tried adding an index ON packages_conan_file_metadata (conan_file_type, recipe_revision_id, package_revision_id, package_reference_id), but the query planner does not use it.
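
For readers less familiar with window functions, the ranking query in this section can be mirrored in plain Ruby. This is a sketch under assumptions, not the implementation of `conan_keep_n_duplicate_ids`: `FileRow` and `keep_n_ids` are hypothetical names, and the rows are assumed to be already filtered by status, package, and file name.

```ruby
# Hypothetical row: a package file joined with its Conan file metadata.
FileRow = Struct.new(:id, :created_at, :conan_file_type, :recipe_revision_id,
                     :package_revision_id, :package_reference_id, keyword_init: true)

# Returns the IDs of the `keep_n` newest files in each metadata partition.
def keep_n_ids(rows, keep_n)
  rows
    # PARTITION BY conan_file_type, recipe_revision_id, package_revision_id, package_reference_id
    .group_by { |r| [r.conan_file_type, r.recipe_revision_id, r.package_revision_id, r.package_reference_id] }
    .flat_map do |_partition, members|
      members
        .sort_by(&:created_at).reverse # ORDER BY created_at DESC
        .first(keep_n)                 # rn <= keep_n
        .map(&:id)
    end
end

rows = [
  FileRow.new(id: 1, created_at: Time.at(1), conan_file_type: 1, recipe_revision_id: 10),
  FileRow.new(id: 2, created_at: Time.at(2), conan_file_type: 1, recipe_revision_id: 10),
  FileRow.new(id: 3, created_at: Time.at(3), conan_file_type: 1, recipe_revision_id: 10),
  FileRow.new(id: 4, created_at: Time.at(1), conan_file_type: 2, recipe_revision_id: 10,
              package_revision_id: 20, package_reference_id: 30),
  FileRow.new(id: 5, created_at: Time.at(2), conan_file_type: 2, recipe_revision_id: 10,
              package_revision_id: 20, package_reference_id: 30)
]

p keep_n_ids(rows, 1).sort # [3, 5]: the newest file of each partition
```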

Packages::Cleanup::ExecutePolicyService#unique_package_id_and_file_name_from

postgres.ai setup

The setup is the same as for Packages::PackageFile#conan_keep_n_duplicate_ids above.

Query

SELECT "packages_package_files"."package_id",
       "packages_package_files"."file_name",
       "packages_packages"."package_type"
FROM "packages_package_files"
INNER JOIN "packages_packages" ON "packages_packages"."id" = "packages_package_files"."package_id"
WHERE "packages_package_files"."status" = 0
  AND "packages_package_files"."project_id" = XXX
  AND "packages_package_files"."id" >= XXX
GROUP BY "packages_package_files"."package_id",
         "packages_package_files"."file_name",
         "packages_packages"."package_type"
HAVING (COUNT(*) > 1);

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/43590/commands/133189
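
The GROUP BY ... HAVING COUNT(*) > 1 step of this query can be sketched in plain Ruby as follows. `Row` and `duplicated_keys` are hypothetical names for illustration, and the project and batch (`id >= XXX`) filters are assumed to have been applied already.

```ruby
# Hypothetical row: a package file joined with its package's type.
Row = Struct.new(:package_id, :file_name, :package_type, :status, keyword_init: true)

# Returns the [package_id, file_name, package_type] tuples that occur more than once.
def duplicated_keys(rows)
  rows
    .select { |r| r.status == 0 }                                  # WHERE status = 0
    .group_by { |r| [r.package_id, r.file_name, r.package_type] }  # GROUP BY
    .select { |_key, members| members.size > 1 }                   # HAVING COUNT(*) > 1
    .keys
end

rows = [
  Row.new(package_id: 1, file_name: 'conanmanifest.txt', package_type: 'conan', status: 0),
  Row.new(package_id: 1, file_name: 'conanmanifest.txt', package_type: 'conan', status: 0),
  Row.new(package_id: 1, file_name: 'conanfile.py',      package_type: 'conan', status: 0),
  Row.new(package_id: 2, file_name: 'conanmanifest.txt', package_type: 'conan', status: 1)
]

p duplicated_keys(rows) # [[1, "conanmanifest.txt", "conan"]]
```

Returning the package type alongside the package ID and file name lets the caller choose a type-specific strategy for each duplicated tuple.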

How to set up and validate locally

  1. Enable the feature flag

    Feature.enable(:packages_conan_duplicates_cleanup_policy)
  2. Set the project's cleanup policy to keep only 1 duplicate package file, or modify the existing policy.

    project = Project.first
    
    FactoryBot.create(:packages_cleanup_policy, project: project, keep_n_duplicated_package_files: '1')
  3. Create a new conan package with several duplicate files

    def fixture_file_upload(*args, **kwargs)
      Rack::Test::UploadedFile.new(*args, **kwargs)
    end
    
    package = FactoryBot.create(:conan_package, project: project) # it will create two `conanmanifest.txt` files (one for recipe revision and one for package revision)
    
    # Create a few more duplicate files
    3.times do
      FactoryBot.create(:conan_package_file, :conan_recipe_manifest, package: package)
      FactoryBot.create(:conan_package_file, :conan_package_manifest, package: package)
    end
    
    # Verify how many files were created
    package.reload.package_files.installable.where(file_name: 'conanmanifest.txt').count # should return 8
  4. Call the service and verify the package files.

    Packages::Cleanup::ExecutePolicyService.new(project.packages_cleanup_policy).execute
    
    # Verify how many installable files the package has
    package.reload.package_files.installable.where(file_name: 'conanmanifest.txt').count # should return 2
    
    # Verify how many files were marked for destruction
    package.reload.package_files.pending_destruction.count # should return 6
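
The expected numbers in the steps above follow from simple arithmetic: the 8 manifest files fall into 2 metadata groups of 4 (recipe and package), and the policy keeps 1 file per group. A small sketch, where `expected_counts` is a hypothetical helper and not part of GitLab:

```ruby
# Computes how many files survive and how many are marked for destruction,
# given the number of files in each duplicate group and the policy's keep_n.
def expected_counts(files_per_group, keep_n)
  kept = files_per_group.values.sum { |count| [count, keep_n].min }
  total = files_per_group.values.sum
  { kept: kept, pending_destruction: total - kept }
end

# 1 file from the factory plus 3 extra files in each group.
result = expected_counts({ recipe_manifest: 4, package_manifest: 4 }, 1)
puts result[:kept]                # 2
puts result[:pending_destruction] # 6
```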

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #536184

Edited by Dzmitry (Dima) Meshcharakou
