Backfill software license policies without spdx identifier and custom_software_license_id
What does this MR do and why?
We want to remove the software_licenses table as described in #497969 (closed). To remove the software_licenses table, we should first remove the software_license_id column from the software_license_policies table.
Before we can remove the software_license_id column from the software_license_policies table, we need to ensure that all records have either a custom_software_license_id or software_license_spdx_identifier set.
In MR !180958 (merged), we updated our code to find/create and link custom_software_licenses to software_license_policies when the software_license is persisted in the database without the spdx_identifier.
In MR !179744 (merged) we persisted the software_license_spdx_identifier using the information from the ::Gitlab::SPDX::Catalogue when the static_licenses is disabled to prepare our database for the software_licenses table removal.
This MR adds a batched background migration to backfill the software_license_policies without software_license_spdx_identifier and custom_software_license_id.
Database Queries
SELECT
*
FROM
"software_license_policies"
LEFT OUTER JOIN
"software_licenses" ON "software_licenses"."id" = "software_license_policies"."software_license_id"
WHERE
"software_license_policies"."id" IN (
SELECT
"software_license_policies"."id"
FROM
"software_license_policies"
WHERE
"software_license_policies"."id" BETWEEN 36876584 AND 36877584
AND "software_license_policies"."software_license_spdx_identifier" IS NULL
AND "software_license_policies"."custom_software_license_id" IS NULL
AND "software_license_policies"."software_license_id" IS NOT NULL
AND "software_license_policies"."id" >= 36876584
)
ORDER BY
software_license_policies.id ASC
LIMIT 1
OFFSET 100;
https://console.postgres.ai/shared/f3a1a772-73ae-4070-9685-21122a81ee26
Upsert custom_software_license
INSERT INTO "custom_software_licenses" ("name","project_id") VALUES ('New-Custom-License', 1108) ON CONFLICT ("project_id","name") DO NOTHING RETURNING "id"
https://postgres.ai/console/gitlab/gitlab-production-main/sessions/37267/commands/114068
Search custom_software_license
SELECT
custom_software_licenses.*
FROM
custom_software_licenses
WHERE
custom_software_licenses.name = 'Custom-License' AND
custom_software_licenses.project_id = 64365885
LIMIT 1;
https://postgres.ai/console/gitlab/gitlab-production-main/sessions/37012/commands/113678
Update custom_software_license_id
UPDATE software_license_policies
SET
updated_at = '2025-03-06 21:56:36.528641',
custom_software_license_id = 1000002
WHERE
software_license_policies.id = 28186197;
https://postgres.ai/console/gitlab/gitlab-production-main/sessions/36929/commands/113328
Update software_license_spdx_identifier
UPDATE software_license_policies
SET
updated_at = '2025-03-06 21:56:36.542819',
software_license_spdx_identifier = 'MIT'
WHERE
software_license_policies.id = 40616697;
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/37012/commands/113682
References
Please include cross links to any resources that are relevant to this MR. This will give reviewers and future readers helpful context to give an efficient review of the changes introduced.
- Related to Remove software_licenses table
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
How to set up and validate locally
I believe the easiest way to test the background migration locally is to create some database records for the following scenarios.
-
SoftwareLicensePolicylinked toSoftwareLicensethat is part of the SPDX catalog:
project = Project.create!(name: 'Test Project', namespace: Namespace.first, path: 'test-project', organization_id: 1, creator: User.first)
mit_license = SoftwareLicense.find_by_spdx_identifier('MIT')
policy_with_license_in_the_catalog = SoftwareLicensePolicy.create!(project: project, software_license: mit_license, software_license_spdx_identifier: nil, custom_software_license_id: nil)
-
SoftwareLicensePolicylinked toSoftwareLicensethat is not part of the SPDX catalog.
2.1 CustomSoftwareLicense does not exist for the given name
non_spdx_license = SoftwareLicense.create!(name: 'Non-SPDX License')
policy_without_custom_license = SoftwareLicensePolicy.create!(project: project, software_license: non_spdx_license, software_license_spdx_identifier: nil, custom_software_license_id: nil)
2.2 CustomSoftwareLicense exist for the given name
2.2.1 CustomSoftwareLicense exists for other project
other_project = Project.create!(name: 'Other Project', namespace: Namespace.first, path: 'other-project', organization_id: 1, creator: User.first)
custom_license = Security::CustomSoftwareLicense.create!(name: 'Custom License In Other Project', project: other_project)
custom_named_license = SoftwareLicense.create!(name: 'Custom License In Other Project')
policy_custom_license_other_project = SoftwareLicensePolicy.create!(project: project, software_license: custom_named_license, software_license_spdx_identifier: nil, custom_software_license_id: nil)
2.2.2 CustomSoftwareLicense exists for the same project
same_project_custom_license = Security::CustomSoftwareLicense.create!(name: 'Same Project Custom License', project: project)
same_project_license = SoftwareLicense.create!(name: 'Same Project Custom License')
policy_existing_custom_license = SoftwareLicensePolicy.create!(project: project, software_license: same_project_license, software_license_spdx_identifier: nil, custom_software_license_id: nil)
- Run the background migration
min, max = SoftwareLicensePolicy.pick('MIN(id)', 'MAX(id)')
Gitlab::BackgroundMigration::BackfillSoftwareLicensePolicies.new(
start_id: min,
end_id: max,
batch_table: :software_license_policies,
batch_column: :id,
sub_batch_size: 100,
pause_ms: 2.minutes,
connection: ApplicationRecord.connection
).perform
- Verify the updates. The migration should
- update the
policy_with_license_in_the_catalogsoftware_license_spdx_identifiertoMIT
policy_with_license_in_the_catalog.reload
SoftwareLicensePolicy Load (1.5ms) SELECT "software_license_policies".* FROM "software_license_policies" WHERE "software_license_policies"."id" = 38071 LIMIT 1
=> #<SoftwareLicensePolicy:0x0000000311d59738
id: 38071,
project_id: 1222,
software_license_id: 221,
classification: "denied",
created_at: Fri, 07 Mar 2025 20:23:22.170571000 UTC +00:00,
updated_at: Fri, 07 Mar 2025 20:27:52.492037000 UTC +00:00,
scan_result_policy_id: nil,
custom_software_license_id: nil,
approval_policy_rule_id: nil,
software_license_spdx_identifier: "MIT">
- create a new
custom_software_licensewith the nameNon-SPDX License
Security::CustomSoftwareLicense.find_by_name('Non-SPDX License')
Security::CustomSoftwareLicense Load (2.4ms) SELECT "custom_software_licenses".* FROM "custom_software_licenses" WHERE "custom_software_licenses"."name" = 'Non-SPDX License' LIMIT 1
=> #<Security::CustomSoftwareLicense:0x000000016fbbc478 id: 25114, project_id: 1222, name: "Non-SPDX License">
- link the new
custom_software_licensetopolicy_without_custom_license
policy_without_custom_license.reload
SoftwareLicensePolicy Load (0.4ms) SELECT "software_license_policies".* FROM "software_license_policies" WHERE "software_license_policies"."id" = 38072 LIMIT 1
=> #<SoftwareLicensePolicy:0x000000016e137d00
id: 38072,
project_id: 1222,
software_license_id: 25701,
classification: "denied",
created_at: Fri, 07 Mar 2025 20:23:47.573651000 UTC +00:00,
updated_at: Fri, 07 Mar 2025 20:27:52.588454000 UTC +00:00,
scan_result_policy_id: nil,
custom_software_license_id: 25114,
approval_policy_rule_id: nil,
software_license_spdx_identifier: nil>
- create a new
custom_software_licensewith the nameCustom License In Other Project
Security::CustomSoftwareLicense.where(name: 'Custom License In Other Project', project: project)
=> [#<Security::CustomSoftwareLicense:0x000000030eeb2a88 id: 25115, project_id: 1222, name: "Custom License In Other Project">]
- link the new
custom_software_licensetopolicy_custom_license_other_project
policy_custom_license_other_project.reload
SoftwareLicensePolicy Load (0.4ms) SELECT "software_license_policies".* FROM "software_license_policies" WHERE "software_license_policies"."id" = 38073 LIMIT 1
=> #<SoftwareLicensePolicy:0x0000000306d92458
id: 38073,
project_id: 1222,
software_license_id: 25702,
classification: "denied",
created_at: Fri, 07 Mar 2025 20:25:53.607536000 UTC +00:00,
updated_at: Fri, 07 Mar 2025 20:27:52.612317000 UTC +00:00,
scan_result_policy_id: nil,
custom_software_license_id: 25115,
approval_policy_rule_id: nil,
software_license_spdx_identifier: nil>
- link the existing
custom_software_licensewith nameSame Project Custom Licensetopolicy_existing_custom_license
Security::CustomSoftwareLicense.find_by_name('Same Project Custom License')
Security::CustomSoftwareLicense Load (1.3ms) SELECT "custom_software_licenses".* FROM "custom_software_licenses" WHERE "custom_software_licenses"."name" = 'Same Project Custom License' LIMIT 1
=> #<Security::CustomSoftwareLicense:0x0000000310fb7c60 id: 25113, project_id: 1222, name: "Same Project Custom License">
policy_existing_custom_license.reload
SoftwareLicensePolicy Load (1.2ms) SELECT "software_license_policies".* FROM "software_license_policies" WHERE "software_license_policies"."id" = 38074 LIMIT 1
=> #<SoftwareLicensePolicy:0x000000016e13b680
id: 38074,
project_id: 1222,
software_license_id: 25703,
classification: "denied",
created_at: Fri, 07 Mar 2025 20:26:23.389720000 UTC +00:00,
updated_at: Fri, 07 Mar 2025 20:27:52.623667000 UTC +00:00,
scan_result_policy_id: nil,
custom_software_license_id: 25113,
approval_policy_rule_id: nil,
software_license_spdx_identifier: nil>