Resolve labels with duplicate title and project IDs
What does this MR do?
Step 2 of #30390 (comment 254445189)
This MR has multiple migrations for resolving duplicate labels that were introduced.
Strategy has to consider self-hosted customers, as well as anyone on gitlab.com.
Migration outputs
$ rake db:migrate:up VERSION=20200305020458
== 20200305020458 AddLabelRestoreTable: migrating =============================
-- execute("CREATE TABLE backup_labels (LIKE labels INCLUDING ALL);")
-> 0.0222s
-- execute("ALTER TABLE backup_labels ALTER COLUMN ID DROP DEFAULT;")
-> 0.0004s
-- execute("ALTER TABLE backup_labels ADD COLUMN restore_action INTEGER;")
-> 0.0006s
-- execute("ALTER TABLE backup_labels ADD COLUMN new_title VARCHAR;")
-> 0.0003s
== 20200305020458 AddLabelRestoreTable: migrated (0.0238s) ====================
$ rake db:migrate:down VERSION=20200305020458
== 20200305020458 AddLabelRestoreTable: reverting =============================
-- drop_table(:backup_labels)
-> 0.0171s
== 20200305020458 AddLabelRestoreTable: reverted (0.0171s) ====================
$ rake db:migrate:up VERSION=20200305020459
== 20200305020459 AddLabelRestoreForeignKeys: migrating =======================
-- transaction_open?()
-> 0.0000s
-- foreign_keys(:backup_labels)
-> 0.0015s
-- execute("ALTER TABLE backup_labels\nADD CONSTRAINT fk_7de4989a69\nFOREIGN KEY (project_id)\nREFERENCES projects (id)\nON DELETE CASCADE\nNOT VALID;\n")
-> 0.0053s
-- execute("ALTER TABLE backup_labels VALIDATE CONSTRAINT fk_7de4989a69;")
-> 0.0063s
-- transaction_open?()
-> 0.0000s
-- foreign_keys(:backup_labels)
-> 0.0021s
-- execute("ALTER TABLE backup_labels\nADD CONSTRAINT fk_rails_c1ac5161d8\nFOREIGN KEY (group_id)\nREFERENCES namespaces (id)\nON DELETE CASCADE\nNOT VALID;\n")
-> 0.0021s
-- execute("ALTER TABLE backup_labels VALIDATE CONSTRAINT fk_rails_c1ac5161d8;")
-> 0.0091s
== 20200305020459 AddLabelRestoreForeignKeys: migrated (0.0413s) ==============
$ rake db:migrate:down VERSION=20200305020459
== 20200305020459 AddLabelRestoreForeignKeys: reverting =======================
-- remove_foreign_key(:backup_labels, {:name=>"fk_7de4989a69"})
-> 0.0041s
-- remove_foreign_key(:backup_labels, {:name=>"fk_rails_c1ac5161d8"})
-> 0.0039s
== 20200305020459 AddLabelRestoreForeignKeys: reverted (0.0221s) ==============
$ rake db:migrate:up VERSION=20200305082754
== 20200305082754 RemoveDuplicateLabelsFromProject: migrating =================
== 20200305082754 RemoveDuplicateLabelsFromProject: migrated (0.0951s) ========
$ rake db:migrate:down VERSION=20200305082754
== 20200305082754 RemoveDuplicateLabelsFromProject: reverting =================
== 20200305082754 RemoveDuplicateLabelsFromProject: reverted (0.0056s) ========
$ be rails db:migrate:up VERSION=20200305082858
== 20200305082858 AddUniquenessIndexToLabelTitleAndProject: migrating =========
-- transaction_open?()
-> 0.0000s
-- index_exists?(:labels, [:project_id, :title], {:where=>"labels.group_id IS NULL", :unique=>true, :name=>"index_labels_on_project_id_and_title_unique", :algorithm=>:concurrently})
-> 0.0057s
-- add_index(:labels, [:project_id, :title], {:where=>"labels.group_id IS NULL", :unique=>true, :name=>"index_labels_on_project_id_and_title_unique", :algorithm=>:concurrently})
-> 0.0090s
-- transaction_open?()
-> 0.0000s
-- index_exists?(:labels, [:project_id, :title], {:name=>"index_labels_on_project_id_and_title", :algorithm=>:concurrently})
-> 0.0043s
-- remove_index(:labels, {:name=>"index_labels_on_project_id_and_title", :algorithm=>:concurrently, :column=>[:project_id, :title]})
-> 0.0099s
== 20200305082858 AddUniquenessIndexToLabelTitleAndProject: migrated (0.0301s)
$ be rails db:migrate:down VERSION=20200305082858
== 20200305082858 AddUniquenessIndexToLabelTitleAndProject: reverting =========
-- transaction_open?()
-> 0.0000s
-- index_exists?(:labels, [:project_id, :title], {:where=>"labels.group_id IS NULL", :unique=>false, :name=>"index_labels_on_project_id_and_title", :algorithm=>:concurrently})
-> 0.0048s
-- transaction_open?()
-> 0.0000s
-- index_exists?(:labels, [:project_id, :title], {:name=>"index_labels_on_project_id_and_title_unique", :algorithm=>:concurrently})
-> 0.0026s
== 20200305082858 AddUniquenessIndexToLabelTitleAndProject: reverted (0.0077s)
As of 7/7/2020 using queries in !21384 (comment 354329489):
Full duplicates: 305 full duplicates
CTE Scan on data (cost=6876482.76..7112371.68 rows=3494651 width=1169) (actual time=593254.903..618840.515 rows=305 loops=1)
Filter: (data.row_number > 1)
Rows Removed by Filter: 9223143
Buffers: shared hit=30963370 read=632557 dirtied=19
I/O Timings: read=485567.114
CTE data
-> WindowAgg (cost=6535754.32..6876482.76 rows=10483952 width=93) (actual time=592912.920..609623.229 rows=9223448 loops=1)
Buffers: shared hit=30963370 read=632557 dirtied=19
I/O Timings: read=485567.114
-> Sort (cost=6535754.32..6561964.20 rows=10483952 width=85) (actual time=592912.848..597381.059 rows=9223448 loops=1)
Sort Key: labels.project_id, labels.title, labels.template, labels.description, labels.type, labels.color, labels.id
Sort Method: external merge Disk: 730536kB
Buffers: shared hit=30963370 read=632557 dirtied=19
I/O Timings: read=485567.114
-> Merge Anti Join (cost=2.57..4345719.49 rows=10483952 width=85) (actual time=171.499..580262.662 rows=9223448 loops=1)
Merge Cond: (labels.id = label_links.label_id)
Buffers: shared hit=30963362 read=632557 dirtied=19
I/O Timings: read=485567.114
-> Merge Anti Join (cost=2.01..2455553.43 rows=10660104 width=85) (actual time=0.420..344855.708 rows=9949673 loops=1)
Merge Cond: (labels.id = board_labels.label_id)
Buffers: shared hit=21898713 read=419284 dirtied=19
I/O Timings: read=284139.828
-> Merge Anti Join (cost=1.72..2428352.58 rows=10670229 width=85) (actual time=0.395..341396.929 rows=9950982 loops=1)
Merge Cond: (labels.id = resource_label_events.label_id)
Buffers: shared hit=21896176 read=419224 dirtied=19
I/O Timings: read=284124.006
-> Merge Anti Join (cost=1.16..864817.36 rows=10862671 width=85) (actual time=0.350..77571.231 rows=10697595 loops=1)
Merge Cond: (labels.id = label_priorities.label_id)
Buffers: shared hit=4190824 read=134114 dirtied=1
I/O Timings: read=54769.635
-> Merge Anti Join (cost=0.87..833679.69 rows=10957388 width=85) (actual time=0.317..73649.101 rows=10763349 loops=1)
Merge Cond: (labels.id = lists.label_id)
Buffers: shared hit=4184162 read=133686 dirtied=1
I/O Timings: read=54673.139
-> Index Scan using labels_pkey on public.labels (cost=0.43..763477.04 rows=11479303 width=85) (actual time=0.062..65234.210 rows=11566279 loops=1)
Filter: (labels.project_id IS NOT NULL)
Rows Removed by Filter: 264215
Buffers: shared hit=4128678 read=128411
I/O Timings: read=52000.066
-> Index Only Scan using index_lists_on_label_id on public.lists (cost=0.43..89979.39 rows=4591197 width=4) (actual time=0.050..3173.257 rows=1139508 loops=1)
Heap Fetches: 19501
Buffers: shared hit=55484 read=5275 dirtied=1
I/O Timings: read=2673.073
-> Index Only Scan using index_label_priorities_on_label_id on public.label_priorities (cost=0.29..2432.13 rows=111437 width=4) (actual time=0.031..187.849 rows=111436 loops=1)
Heap Fetches: 11500
Buffers: shared hit=6662 read=428
I/O Timings: read=96.496
-> Index Only Scan using index_resource_label_events_on_label_id_and_action on public.resource_label_events (cost=0.56..1050365.46 rows=49314016 width=4) (actual time=0.043..254400.561 rows=48269580 loops=1)
Heap Fetches: 812370
Buffers: shared hit=17705352 read=285110 dirtied=18
I/O Timings: read=229354.371
-> Index Only Scan using index_board_labels_on_label_id on public.board_labels (cost=0.29..367.74 rows=13837 width=4) (actual time=0.023..39.657 rows=13837 loops=1)
Heap Fetches: 4220
Buffers: shared hit=2537 read=60
I/O Timings: read=15.822
-> Index Only Scan using index_label_links_on_label_id on public.label_links (cost=0.56..1236166.56 rows=54520900 width=4) (actual time=0.030..225333.163 rows=54459126 loops=1)
Heap Fetches: 5092432
Buffers: shared hit=9064649 read=213273
I/O Timings: read=201427.286
partial duplicates: 518
CTE Scan on data (cost=3238456.18..3496740.50 rows=3826434 width=1169) (actual time=122774.202..151841.224 rows=518 loops=1)
Filter: (data.row_number > 1)
Rows Removed by Filter: 11565761
Buffers: shared hit=4402391 read=132935 dirtied=56
I/O Timings: read=96180.352
CTE data
-> WindowAgg (cost=2980171.87..3238456.18 rows=11479303 width=93) (actual time=122630.119..141305.898 rows=11566279 loops=1)
Buffers: shared hit=4402391 read=132935 dirtied=56
I/O Timings: read=96180.352
-> Sort (cost=2980171.87..3008870.12 rows=11479303 width=85) (actual time=122630.064..127836.463 rows=11566279 loops=1)
Sort Key: labels.project_id, labels.title, labels.id
Sort Method: external merge Disk: 914000kB
Buffers: shared hit=4402391 read=132935 dirtied=56
I/O Timings: read=96180.352
-> Index Scan using index_labels_on_group_id_and_project_id_and_title on public.labels (cost=0.56..574706.28 rows=11479303 width=85) (actual time=44.087..110733.988 rows=11566279 loops=1)
Index Cond: (labels.project_id IS NOT NULL)
Buffers: shared hit=4402385 read=132935 dirtied=56
I/O Timings: read=96180.352
Related to #30390 (closed)
Merge request reports
Activity
changed milestone to %12.6
assigned to @cablett
- Resolved by charlie ablett
Seeing as we have to assume that we have a bunch of duplicated labels that have the same
project_id
andtitle
, here's a first strategy:Stage 1: 'remove accidental duplicates':
- Get all the duplicates (including of colour, description, type, etc). For example, ~title (in yellow) and ~title (in green) are not duplicates.
- Destroy the ones that are safe to remove - they have no relationship to anything else (if none have any relationships, leave at least one)
Stage 2: 'Make all label titles unique across projects':
- Get all duplicates in groups by
project_id
andtitle
- For each group, modify the titles so they're no longer unique (so rename them from
Title
,Title
andTitle
toTitle
,Title-1
andTitle-2
) or whatever as long as that name doesn't already exist (e.g. we'd check forTitle-2
before renaming) - apply a uniqueness index in a different migration straight after
It would require a semaphore/write lock since we want to ensure someone can't make a duplicate label while it's running steps 1-2 above
Pros:
- It is not a behaviourally complex change - it doesn't involve considering every possible permutation
- It resolves without massive changes to other tables
- It's a very visible change that is easy to users to notice and they can easily 'fix' (ie. edit the label names, or delete)
- It doesn't affect any relationships
- It leaves it to the user to ultimately decide what's a duplicate
Cons:
- It feels a bit hamfisted, is going to involve a lot of queries/index scans and is technically kind of complex (only cos SQL)
- Users are going to wonder why a bunch of their labels suddenly have numbers in them
- There might be some use cases for labels that we haven't considered that may be thrown off by this change, like label API hooks? subscriptions? etc.
- It requires work from users to resolve, some of them might have a lot of labels to go through
Edited by charlie ablett
changed milestone to %12.7
added missed:12.6 label
added workflowin dev label and removed workflowready for development label
added 2730 commits
-
15616dbf...728c47d8 - 2730 commits from branch
master
-
15616dbf...728c47d8 - 2730 commits from branch
added 167 commits
-
035fd458...4f642915 - 167 commits from branch
master
-
035fd458...4f642915 - 167 commits from branch
added 142 commits
-
4f642915...ec7f69c4 - 142 commits from branch
master
-
4f642915...ec7f69c4 - 142 commits from branch
added 211 commits
-
ec7f69c4...c820e576 - 211 commits from branch
master
-
ec7f69c4...c820e576 - 211 commits from branch
added 234 commits
-
c820e576...92dbe3b7 - 234 commits from branch
master
-
c820e576...92dbe3b7 - 234 commits from branch
added 237 commits
-
92dbe3b7...6eed143b - 237 commits from branch
master
-
92dbe3b7...6eed143b - 237 commits from branch
removed Deliverable label
changed milestone to %12.8
added missed:12.7 label
added 281 commits
-
6eed143b...dd3380ab - 281 commits from branch
master
-
6eed143b...dd3380ab - 281 commits from branch
added 135 commits
-
17fc902d...67abf2b7 - 135 commits from branch
master
-
17fc902d...67abf2b7 - 135 commits from branch
added 194 commits
-
67abf2b7...2f6cfdd5 - 194 commits from branch
master
-
67abf2b7...2f6cfdd5 - 194 commits from branch
added 138 commits
-
2f6cfdd5...7596cb21 - 138 commits from branch
master
-
2f6cfdd5...7596cb21 - 138 commits from branch
added 126 commits
-
7596cb21...d1af200a - 126 commits from branch
master
-
7596cb21...d1af200a - 126 commits from branch
added 151 commits
-
d1af200a...8240d01a - 151 commits from branch
master
-
d1af200a...8240d01a - 151 commits from branch
added 186 commits
-
8240d01a...a15388de - 186 commits from branch
master
-
8240d01a...a15388de - 186 commits from branch
added 228 commits
-
a15388de...cee39799 - 228 commits from branch
master
-
a15388de...cee39799 - 228 commits from branch
added 311 commits
-
cee39799...2e6667dd - 311 commits from branch
master
-
cee39799...2e6667dd - 311 commits from branch
added 204 commits
-
2e6667dd...fc609e6c - 204 commits from branch
master
-
2e6667dd...fc609e6c - 204 commits from branch
added 196 commits
-
fc609e6c...a11fca06 - 196 commits from branch
master
-
fc609e6c...a11fca06 - 196 commits from branch
added 253 commits
-
a11fca06...9da4a6d5 - 253 commits from branch
master
-
a11fca06...9da4a6d5 - 253 commits from branch
added 211 commits
-
9da4a6d5...8b39d9ed - 211 commits from branch
master
-
9da4a6d5...8b39d9ed - 211 commits from branch
added 210 commits
-
8b39d9ed...f153f3a7 - 210 commits from branch
master
-
8b39d9ed...f153f3a7 - 210 commits from branch
added 121 commits
-
f153f3a7...ec92e269 - 121 commits from branch
master
-
f153f3a7...ec92e269 - 121 commits from branch
added 144 commits
-
ec92e269...b6ae072f - 144 commits from branch
master
-
ec92e269...b6ae072f - 144 commits from branch
added 230 commits
-
b6ae072f...389f8c96 - 230 commits from branch
master
-
b6ae072f...389f8c96 - 230 commits from branch
added 257 commits
-
389f8c96...5cd90761 - 257 commits from branch
master
-
389f8c96...5cd90761 - 257 commits from branch
added 208 commits
-
5cd90761...a343011d - 208 commits from branch
master
-
5cd90761...a343011d - 208 commits from branch
added 176 commits
-
a343011d...c299e39f - 176 commits from branch
master
-
a343011d...c299e39f - 176 commits from branch
added 130 commits
-
447affba...12ac49bf - 130 commits from branch
master
-
447affba...12ac49bf - 130 commits from branch
changed milestone to %12.9
added missed:12.8 label
added 168 commits
-
12ac49bf...234355de - 168 commits from branch
master
-
12ac49bf...234355de - 168 commits from branch