Deduplicate Group labels with identical title and group_id
Related to #30390 (closed), but for group labels
There was a race condition (and a lack of uniqueness constraint on the db level) that meant that users could accidentally create labels with the same title within the same group. Project labels were deduplicated already, and now we need to deduplicate group labels.
As of 18 November 2020:
Query string:
explain WITH data AS ( SELECT labels.*, row_number() OVER (PARTITION BY labels.group_id, labels.title, labels.template, labels.description, labels.type, labels.color ORDER BY labels.id) AS row_number FROM labels WHERE NOT EXISTS (SELECT * FROM board_labels WHERE board_labels.label_id = labels.id) AND NOT EXISTS (SELECT * FROM label_links WHERE label_links.label_id = labels.id) AND NOT EXISTS (SELECT * FROM label_priorities WHERE label_priorities.label_id = labels.id) AND NOT EXISTS (SELECT * FROM lists WHERE lists.label_id = labels.id) AND NOT EXISTS (SELECT * FROM resource_label_events WHERE resource_label_events.label_id = labels.id) AND group_id IS NOT NULL) SELECT * FROM data WHERE row_number > 1;
20 rows (full duplicates)
CTE Scan on data (cost=494855.82..501974.06 rows=105455 width=1169) (actual time=28983.365..29113.192 rows=20 loops=1)
Filter: (data.row_number > 1)
Rows Removed by Filter: 65548
Buffers: shared hit=1846930 read=195646 dirtied=5801
I/O Timings: read=73935.121
CTE data
-> WindowAgg (cost=484573.93..494855.82 rows=316366 width=94) (actual time=28957.936..29064.151 rows=65568 loops=1)
Buffers: shared hit=1846930 read=195646 dirtied=5801
I/O Timings: read=73935.121
-> Sort (cost=484573.93..485364.84 rows=316366 width=86) (actual time=28957.852..28969.263 rows=65568 loops=1)
Sort Key: labels.group_id, labels.title, labels.template, labels.description, labels.type, labels.color, labels.id
Sort Method: quicksort Memory: 14563kB
Buffers: shared hit=1846930 read=195646 dirtied=5801
I/O Timings: read=73935.121
-> Gather (cost=230118.98..455671.94 rows=316366 width=86) (actual time=9948.974..28677.442 rows=65568 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=1846922 read=195646 dirtied=5801
I/O Timings: read=73935.121
-> Nested Loop Anti Join (cost=229118.98..423035.34 rows=131819 width=86) (actual time=9935.678..28675.018 rows=21856 loops=3)
Buffers: shared hit=1846922 read=195646 dirtied=5801
I/O Timings: read=73935.121
-> Nested Loop Anti Join (cost=229118.41..344354.55 rows=133627 width=86) (actual time=9932.965..22251.509 rows=27677 loops=3)
Buffers: shared hit=1531967 read=171372 dirtied=5535
I/O Timings: read=56395.591
-> Merge Anti Join (cost=229117.84..264614.96 rows=135960 width=86) (actual time=9931.305..12237.281 rows=67990 loops=3)
Merge Cond: (labels.id = board_labels.label_id)
Buffers: shared hit=671944 read=109975 dirtied=3575
I/O Timings: read=29958.864
-> Merge Anti Join (cost=229117.56..263772.44 rows=136105 width=86) (actual time=9927.192..12152.813 rows=68772 loops=3)
Merge Cond: (labels.id = label_priorities.label_id)
Buffers: shared hit=660325 read=109871 dirtied=3523
I/O Timings: read=29925.328
-> Merge Anti Join (cost=229117.14..260646.20 rows=137292 width=86) (actual time=9922.265..12038.008 rows=72777 loops=3)
Merge Cond: (labels.id = lists.label_id)
Buffers: shared hit=656080 read=109437 dirtied=3507
I/O Timings: read=29870.174
-> Sort (cost=229116.71..229475.42 rows=143485 width=86) (actual time=9765.087..9807.448 rows=116640 loops=3)
Sort Key: labels.id
Sort Method: quicksort Memory: 24807kB
Buffers: shared hit=212247 read=102077 dirtied=1324
I/O Timings: read=27387.673
-> Parallel Index Scan using index_labels_on_group_id_and_project_id_and_title on public.labels (cost=0.56..216826.83 rows=143485 width=86) (actual time=13.932..9612.716 rows=116640 loops=3)
Index Cond: (labels.group_id IS NOT NULL)
Buffers: shared hit=212235 read=102077 dirtied=1324
I/O Timings: read=27387.673
-> Index Only Scan using index_lists_on_label_id on public.lists (cost=0.43..91734.16 rows=4691082 width=4) (actual time=0.086..1971.498 rows=1405423 loops=3)
Heap Fetches: 344290
Buffers: shared hit=443833 read=7360 dirtied=2183
I/O Timings: read=2482.502
-> Index Only Scan using index_label_priorities_on_label_id on public.label_priorities (cost=0.42..2466.88 rows=121731 width=4) (actual time=0.049..62.893 rows=122321 loops=3)
Heap Fetches: 5709
Buffers: shared hit=4245 read=434 dirtied=16
I/O Timings: read=55.154
-> Index Only Scan using index_board_labels_on_label_id on public.board_labels (cost=0.29..459.22 rows=16816 width=4) (actual time=0.035..46.675 rows=16675 loops=3)
Heap Fetches: 18210
Buffers: shared hit=11619 read=104 dirtied=52
I/O Timings: read=33.536
-> Index Only Scan using index_resource_label_events_on_label_id_and_action on public.resource_label_events (cost=0.56..6.41 rows=281 width=4) (actual time=0.146..0.146 rows=1 loops=203969)
Index Cond: (resource_label_events.label_id = labels.id)
Heap Fetches: 2895
Buffers: shared hit=860023 read=61397 dirtied=1956
I/O Timings: read=26436.728
-> Index Only Scan using index_label_links_on_label_id_and_target_type on public.label_links (cost=0.57..9.78 rows=444 width=4) (actual time=0.231..0.231 rows=0 loops=83030)
Index Cond: (label_links.label_id = labels.id)
Heap Fetches: 892
Buffers: shared hit=314955 read=24274 dirtied=266
I/O Timings: read=17539.530
Query string:
explain WITH data AS ( SELECT *, row_number() OVER (PARTITION BY group_id, title ORDER BY id) AS row_number FROM labels WHERE group_id IS NOT NULL ) SELECT * FROM data WHERE row_number > 1;
642 rows (partial + full duplicates)
CTE Scan on data (cost=258254.11..266002.27 rows=114788 width=1169) (actual time=899.740..1665.839 rows=642 loops=1)
Filter: (data.row_number > 1)
Rows Removed by Filter: 349278
Buffers: shared hit=313491 dirtied=1
CTE data
-> WindowAgg (cost=250505.94..258254.11 rows=344363 width=94) (actual time=890.762..1415.805 rows=349920 loops=1)
Buffers: shared hit=313491 dirtied=1
-> Sort (cost=250505.94..251366.85 rows=344363 width=86) (actual time=890.741..931.343 rows=349920 loops=1)
Sort Key: labels.group_id, labels.title, labels.id
Sort Method: quicksort Memory: 74479kB
Buffers: shared hit=313491 dirtied=1
-> Index Scan using index_labels_on_group_id_and_project_id_and_title on public.labels (cost=0.56..218835.61 rows=344363 width=86) (actual time=0.026..471.602 rows=349920 loops=1)
Index Cond: (labels.group_id IS NOT NULL)
Buffers: shared hit=313491 dirtied=1
As of 10 July 2020:
Query string:
explain WITH data AS ( SELECT labels.*, row_number() OVER (PARTITION BY labels.group_id, labels.title, labels.template, labels.description, labels.type, labels.color ORDER BY labels.id) AS row_number FROM labels WHERE NOT EXISTS (SELECT * FROM board_labels WHERE board_labels.label_id = labels.id) AND NOT EXISTS (SELECT * FROM label_links WHERE label_links.label_id = labels.id) AND NOT EXISTS (SELECT * FROM label_priorities WHERE label_priorities.label_id = labels.id) AND NOT EXISTS (SELECT * FROM lists WHERE lists.label_id = labels.id) AND NOT EXISTS (SELECT * FROM resource_label_events WHERE resource_label_events.label_id = labels.id) AND group_id IS NOT NULL) SELECT * FROM data WHERE row_number > 1;
20 rows (full duplicates)
CTE Scan on data (cost=585778.36..591337.37 rows=82356 width=1169) (actual time=132157.814..132247.676 rows=20 loops=1)
Filter: (data.row_number > 1)
Rows Removed by Filter: 40875
Buffers: shared hit=1283704 read=90773 dirtied=9342
I/O Timings: read=123628.808
CTE data
-> WindowAgg (cost=577748.69..585778.36 rows=247067 width=93) (actual time=132126.445..132214.885 rows=40895 loops=1)
Buffers: shared hit=1283704 read=90773 dirtied=9342
I/O Timings: read=123628.808
-> Sort (cost=577748.69..578366.35 rows=247067 width=85) (actual time=132126.406..132134.930 rows=40895 loops=1)
Sort Key: labels.group_id, labels.title, labels.template, labels.description, labels.type, labels.color, labels.id
Sort Method: quicksort Memory: 9340kB
Buffers: shared hit=1283704 read=90773 dirtied=9342
I/O Timings: read=123628.808
-> Nested Loop Anti Join (cost=222416.95..555618.22 rows=247067 width=85) (actual time=1276.603..131973.813 rows=40895 loops=1)
Buffers: shared hit=1283696 read=90773 dirtied=9342
I/O Timings: read=123628.808
-> Nested Loop Anti Join (cost=222416.38..408160.81 rows=251160 width=85) (actual time=1271.787..103779.445 rows=58026 loops=1)
Buffers: shared hit=1065193 read=70841 dirtied=9318
I/O Timings: read=96106.625
-> Merge Anti Join (cost=222415.82..258046.26 rows=255638 width=85) (actual time=1268.149..17579.538 rows=155219 loops=1)
Merge Cond: (labels.id = board_labels.label_id)
Buffers: shared hit=409992 read=13149 dirtied=3688
I/O Timings: read=14083.139
-> Merge Anti Join (cost=222415.53..256999.24 rows=255879 width=85) (actual time=1263.868..17365.476 rows=157100 loops=1)
Merge Cond: (labels.id = label_priorities.label_id)
Buffers: shared hit=407018 read=13067 dirtied=3662
I/O Timings: read=14025.605
-> Merge Anti Join (cost=222415.24..253309.09 rows=258092 width=85) (actual time=1213.450..16798.125 rows=167972 loops=1)
Merge Cond: (labels.id = lists.label_id)
Buffers: shared hit=394614 read=12538 dirtied=3556
I/O Timings: read=13718.747
-> Sort (cost=222414.81..223086.56 rows=268701 width=85) (actual time=532.547..697.630 rows=272646 loops=1)
Sort Key: labels.id
Sort Method: quicksort Memory: 61602kB
Buffers: shared hit=244174
-> Index Scan using index_labels_on_group_id_and_project_id_and_title on public.labels (cost=0.56..198183.83 rows=268701 width=85) (actual time=0.085..372.800 rows=272646 loops=1)
Index Cond: (labels.group_id IS NOT NULL)
Buffers: shared hit=244174
-> Index Only Scan using index_lists_on_label_id on public.lists (cost=0.43..92690.01 rows=4006755 width=4) (actual time=0.035..15608.198 rows=1143492 loops=1)
Heap Fetches: 158470
Buffers: shared hit=150440 read=12538 dirtied=3556
I/O Timings: read=13718.747
-> Index Only Scan using index_label_priorities_on_label_id on public.label_priorities (cost=0.29..2744.01 rows=111573 width=4) (actual time=0.034..442.709 rows=111614 loops=1)
Heap Fetches: 20163
Buffers: shared hit=12404 read=529 dirtied=106
I/O Timings: read=306.859
-> Index Only Scan using index_board_labels_on_label_id on public.board_labels (cost=0.29..371.39 rows=13886 width=4) (actual time=0.027..89.002 rows=13894 loops=1)
Heap Fetches: 4634
Buffers: shared hit=2974 read=82 dirtied=26
I/O Timings: read=57.533
-> Index Only Scan using index_resource_label_events_on_label_id_and_action on public.resource_label_events (cost=0.56..5.59 rows=234 width=4) (actual time=0.553..0.553 rows=1 loops=155219)
Index Cond: (resource_label_events.label_id = labels.id)
Heap Fetches: 7598
Buffers: shared hit=655201 read=57692 dirtied=5601
I/O Timings: read=82023.486
-> Index Only Scan using index_label_links_on_label_id on public.label_links (cost=0.56..6.61 rows=282 width=4) (actual time=0.484..0.484 rows=0 loops=58026)
Index Cond: (label_links.label_id = labels.id)
Heap Fetches: 367
Buffers: shared hit=218503 read=19932 dirtied=24
I/O Timings: read=27522.183
Query string:
explain WITH data AS ( SELECT *, row_number() OVER (PARTITION BY group_id, title ORDER BY id) AS row_number FROM labels WHERE group_id IS NOT NULL ) SELECT * FROM data WHERE row_number > 1;
519 rows (partial + full duplicates)
CTE Scan on data (cost=228460.58..234506.35 rows=89567 width=1169) (actual time=16726.537..17527.727 rows=519 loops=1)
Filter: (data.row_number > 1)
Rows Removed by Filter: 272127
Buffers: shared hit=239362 read=5810 dirtied=302
I/O Timings: read=7550.635
CTE data
-> WindowAgg (cost=222414.81..228460.58 rows=268701 width=93) (actual time=16714.142..17323.245 rows=272646 loops=1)
Buffers: shared hit=239362 read=5810 dirtied=302
I/O Timings: read=7550.635
-> Sort (cost=222414.81..223086.56 rows=268701 width=85) (actual time=16714.099..16751.519 rows=272646 loops=1)
Sort Key: labels.group_id, labels.title, labels.id
Sort Method: quicksort Memory: 61631kB
Buffers: shared hit=239362 read=5810 dirtied=302
I/O Timings: read=7550.635
-> Index Scan using index_labels_on_group_id_and_project_id_and_title on public.labels (cost=0.56..198183.83 rows=268701 width=85) (actual time=0.022..16339.784 rows=272646 loops=1)
Index Cond: (labels.group_id IS NOT NULL)
Buffers: shared hit=239356 read=5810 dirtied=302
I/O Timings: read=7550.635
October 2020: 643 duplicates
CTE Scan on data (cost=251800.87..259317.31 rows=111355 width=1169) (actual time=102355.342..103135.825 rows=643 loops=1)
Filter: (data.row_number > 1)
Rows Removed by Filter: 336417
Buffers: shared hit=203606 read=99341 dirtied=378
I/O Timings: read=99830.471
CTE data
-> WindowAgg (cost=244284.43..251800.87 rows=334064 width=95) (actual time=102343.634..102871.166 rows=337060 loops=1)
Buffers: shared hit=203606 read=99341 dirtied=378
I/O Timings: read=99830.471
-> Sort (cost=244284.43..245119.59 rows=334064 width=87) (actual time=102343.562..102387.529 rows=337060 loops=1)
Sort Key: labels.group_id, labels.title, labels.id
Sort Method: quicksort Memory: 72330kB
Buffers: shared hit=203606 read=99341 dirtied=378
I/O Timings: read=99830.471
-> Index Scan using index_labels_on_group_id_and_project_id_and_title on public.labels (cost=0.56..213634.45 rows=334064 width=87) (actual time=9.228..101555.239 rows=337060 loops=1)
Index Cond: (labels.group_id IS NOT NULL)
Buffers: shared hit=203600 read=99341 dirtied=378
I/O Timings: read=99830.471
Edited by charlie ablett