Backfill software license policies without spdx identifier and custom_software_license_id

What does this MR do and why?

We want to remove the software_licenses table as described in #497969 (closed). To remove the software_licenses table, we should first remove the software_license_id column from the software_license_policies table.

Before we can remove the software_license_id column from the software_license_policies table, we need to ensure that all records have either a custom_software_license_id or software_license_spdx_identifier set.

In MR !180958 (merged), we updated our code to find/create and link custom_software_licenses to software_license_policies when the software_license is persisted in the database without the spdx_identifier.

In MR !179744 (merged) we persisted the software_license_spdx_identifier using the information from the ::Gitlab::SPDX::Catalogue when the static_licenses is disabled to prepare our database for the software_licenses table removal.

This MR adds a batched background migration to backfill the software_license_policies without software_license_spdx_identifier and custom_software_license_id.

Database Queries

SELECT 
  *
FROM 
  "software_license_policies"
LEFT OUTER JOIN 
  "software_licenses" ON "software_licenses"."id" = "software_license_policies"."software_license_id"
WHERE 
  "software_license_policies"."id" IN (
    SELECT 
      "software_license_policies"."id"
    FROM 
      "software_license_policies"
    WHERE 
      "software_license_policies"."id" BETWEEN 36876584 AND 36877584
      AND "software_license_policies"."software_license_spdx_identifier" IS NULL
      AND "software_license_policies"."custom_software_license_id" IS NULL
      AND "software_license_policies"."software_license_id" IS NOT NULL
      AND "software_license_policies"."id" >= 36876584
  )
ORDER BY
    software_license_policies.id ASC
LIMIT 1
OFFSET 100;

https://console.postgres.ai/shared/f3a1a772-73ae-4070-9685-21122a81ee26

Upsert custom_software_license

INSERT INTO "custom_software_licenses" ("name","project_id") VALUES ('New-Custom-License', 1108) ON CONFLICT ("project_id","name") DO NOTHING RETURNING "id"

https://postgres.ai/console/gitlab/gitlab-production-main/sessions/37267/commands/114068

Search custom_software_license

SELECT
    custom_software_licenses.*
FROM
    custom_software_licenses
WHERE
    custom_software_licenses.name = 'Custom-License' AND
    custom_software_licenses.project_id = 64365885
LIMIT 1;

https://postgres.ai/console/gitlab/gitlab-production-main/sessions/37012/commands/113678

Update custom_software_license_id

UPDATE software_license_policies
SET
    updated_at = '2025-03-06 21:56:36.528641',
    custom_software_license_id = 1000002
WHERE
    software_license_policies.id = 28186197;

https://postgres.ai/console/gitlab/gitlab-production-main/sessions/36929/commands/113328

Update software_license_spdx_identifier

UPDATE software_license_policies
SET
    updated_at = '2025-03-06 21:56:36.542819',
    software_license_spdx_identifier = 'MIT'
WHERE
    software_license_policies.id = 40616697;

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/37012/commands/113682

References

Please include cross links to any resources that are relevant to this MR. This will give reviewers and future readers helpful context to give an efficient review of the changes introduced.

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

How to set up and validate locally

I believe the easiest way to test the background migration locally is to create some database records for the following scenarios.

  1. SoftwareLicensePolicy linked to SoftwareLicense that is part of the SPDX catalog:
project = Project.create!(name: 'Test Project', namespace: Namespace.first, path: 'test-project', organization_id: 1, creator: User.first)
mit_license = SoftwareLicense.find_by_spdx_identifier('MIT')
policy_with_license_in_the_catalog = SoftwareLicensePolicy.create!(project: project, software_license: mit_license, software_license_spdx_identifier: nil, custom_software_license_id: nil)
  1. SoftwareLicensePolicy linked to SoftwareLicense that is not part of the SPDX catalog.

2.1 CustomSoftwareLicense does not exist for the given name

non_spdx_license = SoftwareLicense.create!(name: 'Non-SPDX License')
policy_without_custom_license = SoftwareLicensePolicy.create!(project: project, software_license: non_spdx_license, software_license_spdx_identifier: nil, custom_software_license_id: nil)

2.2 CustomSoftwareLicense exist for the given name

2.2.1 CustomSoftwareLicense exists for other project

other_project = Project.create!(name: 'Other Project', namespace: Namespace.first, path: 'other-project', organization_id: 1, creator: User.first)
custom_license = Security::CustomSoftwareLicense.create!(name: 'Custom License In Other Project', project: other_project)
custom_named_license = SoftwareLicense.create!(name: 'Custom License In Other Project')
policy_custom_license_other_project = SoftwareLicensePolicy.create!(project: project, software_license: custom_named_license, software_license_spdx_identifier: nil, custom_software_license_id: nil)

2.2.2 CustomSoftwareLicense exists for the same project

same_project_custom_license = Security::CustomSoftwareLicense.create!(name: 'Same Project Custom License', project: project)
same_project_license = SoftwareLicense.create!(name: 'Same Project Custom License')
policy_existing_custom_license = SoftwareLicensePolicy.create!(project: project, software_license: same_project_license, software_license_spdx_identifier: nil, custom_software_license_id: nil)
  1. Run the background migration
min, max = SoftwareLicensePolicy.pick('MIN(id)', 'MAX(id)')
Gitlab::BackgroundMigration::BackfillSoftwareLicensePolicies.new(
  start_id: min,
  end_id: max,
  batch_table: :software_license_policies,
  batch_column: :id,
  sub_batch_size: 100,
  pause_ms: 2.minutes,
  connection: ApplicationRecord.connection
).perform
  1. Verify the updates. The migration should
  • update the policy_with_license_in_the_catalog software_license_spdx_identifier to MIT
policy_with_license_in_the_catalog.reload

 SoftwareLicensePolicy Load (1.5ms)  SELECT "software_license_policies".* FROM "software_license_policies" WHERE "software_license_policies"."id" = 38071 LIMIT 1 
=> #<SoftwareLicensePolicy:0x0000000311d59738
 id: 38071,
 project_id: 1222,
 software_license_id: 221,
 classification: "denied",
 created_at: Fri, 07 Mar 2025 20:23:22.170571000 UTC +00:00,
 updated_at: Fri, 07 Mar 2025 20:27:52.492037000 UTC +00:00,
 scan_result_policy_id: nil,
 custom_software_license_id: nil,
 approval_policy_rule_id: nil,
 software_license_spdx_identifier: "MIT">
  • create a new custom_software_license with the name Non-SPDX License
Security::CustomSoftwareLicense.find_by_name('Non-SPDX License')
Security::CustomSoftwareLicense Load (2.4ms)  SELECT "custom_software_licenses".* FROM "custom_software_licenses" WHERE "custom_software_licenses"."name" = 'Non-SPDX License' LIMIT 1
=> #<Security::CustomSoftwareLicense:0x000000016fbbc478 id: 25114, project_id: 1222, name: "Non-SPDX License">
  • link the new custom_software_license to policy_without_custom_license
policy_without_custom_license.reload
 SoftwareLicensePolicy Load (0.4ms)  SELECT "software_license_policies".* FROM "software_license_policies" WHERE "software_license_policies"."id" = 38072 LIMIT 1 
=> #<SoftwareLicensePolicy:0x000000016e137d00
 id: 38072,
 project_id: 1222,
 software_license_id: 25701,
 classification: "denied",
 created_at: Fri, 07 Mar 2025 20:23:47.573651000 UTC +00:00,
 updated_at: Fri, 07 Mar 2025 20:27:52.588454000 UTC +00:00,
 scan_result_policy_id: nil,
 custom_software_license_id: 25114,
 approval_policy_rule_id: nil,
 software_license_spdx_identifier: nil>
  • create a new custom_software_license with the name Custom License In Other Project
Security::CustomSoftwareLicense.where(name: 'Custom License In Other Project', project: project)
=> [#<Security::CustomSoftwareLicense:0x000000030eeb2a88 id: 25115, project_id: 1222, name: "Custom License In Other Project">]
  • link the new custom_software_license to policy_custom_license_other_project
policy_custom_license_other_project.reload
 SoftwareLicensePolicy Load (0.4ms)  SELECT "software_license_policies".* FROM "software_license_policies" WHERE "software_license_policies"."id" = 38073 LIMIT 1 
=> #<SoftwareLicensePolicy:0x0000000306d92458
 id: 38073,
 project_id: 1222,
 software_license_id: 25702,
 classification: "denied",
 created_at: Fri, 07 Mar 2025 20:25:53.607536000 UTC +00:00,
 updated_at: Fri, 07 Mar 2025 20:27:52.612317000 UTC +00:00,
 scan_result_policy_id: nil,
 custom_software_license_id: 25115,
 approval_policy_rule_id: nil,
 software_license_spdx_identifier: nil>
  • link the existing custom_software_license with name Same Project Custom License to policy_existing_custom_license
Security::CustomSoftwareLicense.find_by_name('Same Project Custom License')
Security::CustomSoftwareLicense Load (1.3ms)  SELECT "custom_software_licenses".* FROM "custom_software_licenses" WHERE "custom_software_licenses"."name" = 'Same Project Custom License' LIMIT 1 
=> #<Security::CustomSoftwareLicense:0x0000000310fb7c60 id: 25113, project_id: 1222, name: "Same Project Custom License">
policy_existing_custom_license.reload
 SoftwareLicensePolicy Load (1.2ms)  SELECT "software_license_policies".* FROM "software_license_policies" WHERE "software_license_policies"."id" = 38074 LIMIT 1 
=> #<SoftwareLicensePolicy:0x000000016e13b680
 id: 38074,
 project_id: 1222,
 software_license_id: 25703,
 classification: "denied",
 created_at: Fri, 07 Mar 2025 20:26:23.389720000 UTC +00:00,
 updated_at: Fri, 07 Mar 2025 20:27:52.623667000 UTC +00:00,
 scan_result_policy_id: nil,
 custom_software_license_id: 25113,
 approval_policy_rule_id: nil,
 software_license_spdx_identifier: nil>
Edited by Marcos Rocha

Merge request reports

Loading