Background migration: merge topics with same name
What does this MR do and why?
This MR is the third step of #351871 (closed) and merges all topics with the same case insensitive name using a background database migration:
- The topic with the largest number of assigned projects is kept.
- The other topics with the same case insensitive name are deleted and the project assignments are transferred to the kept topic.
- If the kept topic has no avatar or description, the avatar and description is taken from the deleted topic with the second most assigned projects, then from the third most, ...
This approach was discussed in #351871 (comment 830292330).
Example:
Before | After |
---|---|
Topics: - javascript (2710 projects)- JavaScript (967 projects, avatar: 'js.png')- Javascript (467 projects, description: 'foo...')- JAVASCRIPT (11 projects)- javaScript (6 projects) |
Topics: - javascript (4161 projects, avatar: 'js.png', description: 'foo...')- other topics with same case insensitive name are removed |
/cc @bufferoverflow
Database Migration
data-deletion)
Irreversible migration (The background data migration introduced by this MR is designed to delete some topic
records if they are duplicates. If an avatar file exists, it will also be deleted (application code). The associated project_topic
records are transferred to the kept topic (with the same name) or deleted if an identical project_topic
record already exists for the kept topic.
There is no way to bring back the deleted records as they are duplicates anyway.
data-deletion)
User experience impact (During this migration, duplicated topics are deleted. If the migration fails, topics with the same name may still exist. This would be no difference to the current situation.
data-deletion)
Update/deletion query plans (see !81503 (comment 892088813)
Query Plans (outdated)
ScheduleMergeTopicsWithSameName
Group topics by case insensitive name: Projects::Topic.select('LOWER(name) as name').group('LOWER(name)').having('COUNT(*) > 1')
SQL query
SELECT
LOWER(name) AS name
FROM
"topics"
GROUP BY
LOWER(name)
HAVING (COUNT(*) > 1);
Gitlab::BackgroundMigration::MergeTopicsWithSameName
Check if topics exist: topics.offset(1).any?
SQL query
SELECT
1 AS one
FROM
"topics"
WHERE (LOWER(name) = 'topic1')
LIMIT 1 OFFSET 1;
Get 'topic to keep': topics.first
SQL query
SELECT
"topics".*
FROM
"topics"
WHERE (LOWER(name) = 'topic1')
ORDER BY
"topics"."total_projects_count" DESC,
"topics"."non_private_projects_count" DESC,
"topics"."id" ASC
LIMIT 1;
Get 'topics to remove': topics.offset(1)
SQL query
SELECT
"topics".*
FROM
"topics"
WHERE (LOWER(name) = 'topic1')
ORDER BY
"topics"."total_projects_count" DESC,
"topics"."non_private_projects_count" DESC,
"topics"."id" ASC
OFFSET 1;
Change project topic assignments from 'topic to remove' to 'topic to keep': ProjectTopic.[...].update_all(...)
SQL query
UPDATE
"project_topics"
SET
"topic_id" = 97
WHERE
"project_topics"."topic_id" = 98
AND "project_topics"."project_id" NOT IN (
SELECT
"project_topics"."project_id"
FROM
"project_topics"
WHERE
"project_topics"."topic_id" = 97);
Remove project topic assignments of 'topic to remove' that already exist for 'topic to keep': ProjectTopic.where(topic_id: topic.id).delete_all
SQL query
DELETE FROM "project_topics"
WHERE "project_topics"."topic_id" = 98;
Remove 'topics to delete': topics_to_remove.delete_all
SQL query
DELETE FROM "topics"
WHERE "topics"."id" IN (
SELECT
"topics"."id"
FROM
"topics"
WHERE (LOWER(name) = 'topic1')
ORDER BY
"topics"."total_projects_count" DESC,
"topics"."non_private_projects_count" DESC,
"topics"."id" ASC
OFFSET 1);
Get total projects count: total_projects_count(...)
SQL query
SELECT
COUNT(*)
FROM
"project_topics"
WHERE
"project_topics"."topic_id" = 97;
Get non-internal projects count: non_private_projects_count(...)
SQL query
SELECT
COUNT(*)
FROM
"project_topics"
INNER JOIN projects ON project_topics.project_id = projects.id
WHERE
"project_topics"."topic_id" = 97
AND (projects.visibility_level > 0);
Update 'topic to keep': topic_to_keep.update(...)
SQL query
UPDATE
"topics"
SET
"updated_at" = '2022-03-08 09:14:45.882098',
"total_projects_count" = 3
WHERE
"topics"."id" = 97;
How to set up and validate locally
- Add topics with the same case insensitive name (you need to bypass the model validation).
- Assign these topics to different projects (project settings).
- Run the background migration.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.