
Migration to fix duplicate software licenses in license policies table

Bala Kumar requested to merge 395776-migrate-duplicate-software-licenses into master

What does this MR do and why?

We have duplicated software licenses in the software_licenses table, as identified in the issue: rows that share the same spdx_identifier but have different name values.

The main cause of the duplication was a very old backfill migration whose context is unclear: !17004 (diffs)

software_license_policies referencing duplicated software_licenses have to be fixed before the duplicates can be deleted.

The migration in this MR:

  1. Finds duplicated licenses.
  2. Identifies the original license by matching the license name against the official name in https://spdx.org/licenses/licenses.json.
  3. Updates all software_license_policies referencing duplicated licenses to use the original license.
  4. Deletes duplicated licenses.
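
The four steps above can be sketched in plain Ruby. This is a minimal in-memory illustration, not the migration's actual code: the `License` and `Policy` structs, the `dedupe_licenses` method, and the `official_names` hash (assumed to be built beforehand from licenses.json) are all hypothetical names introduced here.

```ruby
# Hypothetical in-memory stand-ins for rows in software_licenses and
# software_license_policies. The real migration works on database tables.
License = Struct.new(:id, :name, :spdx_identifier)
Policy  = Struct.new(:id, :software_license_id)

# official_names: Hash of spdx_identifier => official SPDX name
# (assumed to have been extracted from licenses.json ahead of time).
# Returns [remaining_licenses, deleted_license_ids] and repoints policies in place.
def dedupe_licenses(licenses, policies, official_names)
  deleted_ids = []

  licenses
    .reject { |license| license.spdx_identifier.nil? }
    .group_by(&:spdx_identifier)
    .each do |spdx_id, group|
      # Step 1: only identifiers that occur more than once are duplicated.
      next if group.size < 2

      # Step 2: the original is the row whose name matches the official name.
      original = group.find { |license| license.name == official_names[spdx_id] }
      # No original found: leave this group untouched.
      next if original.nil?

      duplicate_ids = group.map(&:id) - [original.id]

      # Step 3: repoint policies from the duplicates to the original.
      policies.each do |policy|
        if duplicate_ids.include?(policy.software_license_id)
          policy.software_license_id = original.id
        end
      end

      deleted_ids.concat(duplicate_ids)
    end

  # Step 4: drop the duplicated licenses.
  remaining = licenses.reject { |license| deleted_ids.include?(license.id) }
  [remaining, deleted_ids]
end
```

Groups with no name matching the official SPDX name are skipped entirely, so neither their licenses nor the policies pointing at them change in this sketch.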

This MR corrects data in the software_license_policies table for records whose software_license_id column references a duplicated license.

This is the first of two migrations planned to address the issue.

  1. backend database DB migration that updates the software_license_policies table, replacing each duplicate software_license_id with the original license, and deletes the duplicated licenses. Duplicated licenses for which no original license can be found are ignored.
  2. backend database Create a unique index on spdx_identifier in the software_licenses table, covering only rows where spdx_identifier is not null, provided no duplicates are left.
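
The precondition for the second migration, that no duplicates are left among non-null spdx_identifier values, can be illustrated with a short check. This is only a sketch over a plain array of identifiers; the method names `remaining_duplicate_identifiers` and `unique_index_safe?` are hypothetical and are not part of this MR.

```ruby
# Returns the spdx_identifier values that still occur more than once.
# nil identifiers are ignored, mirroring an index restricted to
# non-null spdx_identifier values.
def remaining_duplicate_identifiers(spdx_identifiers)
  spdx_identifiers.compact.tally.select { |_identifier, count| count > 1 }.keys
end

# A unique index over the non-null identifiers could only be created
# when no identifier is repeated.
def unique_index_safe?(spdx_identifiers)
  remaining_duplicate_identifiers(spdx_identifiers).empty?
end
```

Because the first migration skips duplicated licenses for which no original can be found, a check of this kind may still report leftover identifiers afterwards.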

Database

Queries

See database testing result

Migration

> bundle exec rake db:migrate VERSION=20230608133450
main: == [advisory_lock_connection] object_id: 228180, pg_backend_pid: 69105
main: == 20230608133450 UpdateDuplicateLicensesInSoftwareLicensePolicies: migrating =
main: == 20230608133450 UpdateDuplicateLicensesInSoftwareLicensePolicies: migrated (47.4008s)

main: == [advisory_lock_connection] object_id: 228180, pg_backend_pid: 69105
ci: == [advisory_lock_connection] object_id: 232400, pg_backend_pid: 69225
ci: == 20230608133450 UpdateDuplicateLicensesInSoftwareLicensePolicies: migrating =
ci: -- The migration is skipped since it modifies the schemas: [:gitlab_main].
ci: -- This database can only apply migrations in one of the following schemas: [:gitlab_ci, :gitlab_internal, :gitlab_shared].
ci: == 20230608133450 UpdateDuplicateLicensesInSoftwareLicensePolicies: migrated (0.0056s)

ci: == [advisory_lock_connection] object_id: 232400, pg_backend_pid: 69225

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #395776 (closed)

Edited by Andy Schoenen
