Skip to content

Clean up orphan Software licenses

This is a follow up from https://gitlab.com/gitlab-org/gitlab/-/issues/337329

We store Software licenses in the shared table. If the user adds a new unknown in SPDX license, it stores there forever.

When we address issues with cleaning up newly added licenses, we need to clean up what we have in software_licenses table

Proposed implementation plan

  1. Create a background migration to clean up orphan licenses
  2. Get all licenses in software_licenses that have spdx_identifier: nil and does not belong to any projects.
SoftwareLicense
  .where(spdx_identifier: nil)
  .where.not(
    id: SoftwareLicensePolicy.where(
      software_license_id: SoftwareLicense.where(spdx_identifier: nil)
    )
  .pluck(:software_license_id))
  1. Remove those licenses

Verification steps

  1. Access the postgres.ai as described here
  2. Copy the command
EXPLAIN SELECT * FROM "software_licenses"
WHERE "software_licenses"."spdx_identifier" IS NULL
    AND "software_licenses"."id" NOT IN (
        SELECT
            "software_license_policies"."software_license_id"
        FROM
            "software_license_policies")
  1. Check for the number of rows returned in the Plain with execution section as described here
  2. Verify the actual row count is 0.
Edited by Marcos Rocha