
Migrate user mentions from epic and epic notes

Alexandru Croitor requested to merge 21801-remigrate-epic-user-mentions into master

What does this MR do?

  • Cleans up epic_user_mentions records that have no actual mentions, ~300 such records on chat-ops-test (a migration sketch follows the plan below).
explain DELETE FROM "epic_user_mentions" 
WHERE "epic_user_mentions"."mentioned_users_ids" IS NULL 
AND "epic_user_mentions"."mentioned_groups_ids" IS NULL 
AND "epic_user_mentions"."mentioned_projects_ids" IS NULL 
AND "epic_user_mentions"."id" >= 1

Click to see the plan.
ModifyTable on public.epic_user_mentions  (cost=0.00..122.45 rows=221 width=6) (actual time=19.645..19.646 rows=0 loops=1)
  Buffers: shared hit=212 read=23 dirtied=24
  I/O Timings: read=6.444
  ->  Seq Scan on public.epic_user_mentions  (cost=0.00..122.45 rows=221 width=6) (actual time=0.010..18.989 rows=196 loops=1)
        Filter: ((epic_user_mentions.mentioned_users_ids IS NULL) AND (epic_user_mentions.mentioned_groups_ids IS NULL) AND (epic_user_mentions.mentioned_projects_ids IS NULL) AND (epic_user_mentions.id >= 1))
        Rows Removed by Filter: 2268
        Buffers: shared hit=1 read=22 dirtied=4
        I/O Timings: read=6.410

Time: 19.822 ms

  • planning: 0.153 ms
  • execution: 19.669 ms
    • I/O read: 6.444 ms
    • I/O write: 0.000 ms

Shared buffers:

  • hits: 212 (~1.70 MiB) from the buffer pool
  • reads: 23 (~184.00 KiB) from the OS file cache, including disk I/O
  • dirtied: 24 (~192.00 KiB)
  • writes: 0
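
For reference, a minimal sketch of how the cleanup above could be wired into a post-deployment migration. The class name and migration version are assumptions for illustration, not this MR's actual code; it simply runs the DELETE shown above.

    # Illustrative sketch only: class name and migration version are assumptions.
    class CleanEmptyEpicUserMentions < ActiveRecord::Migration[5.2]
      DOWNTIME = false

      def up
        execute <<~SQL
          DELETE FROM epic_user_mentions
          WHERE mentioned_users_ids IS NULL
            AND mentioned_groups_ids IS NULL
            AND mentioned_projects_ids IS NULL
        SQL
      end

      def down
        # No-op: the deleted rows carried no mention data, so there is nothing to restore.
      end
    end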

Database benchmarks

IMPORTANT NOTE: The queries below were run on #chat-ops-test. Running them on #database-lab returned much higher execution times because the DB decided not to use epic_mentions_temp_index 🤔

#database-lab has been updated with the latest production snapshot on 2020-02-20 ~16:30 UTC and now seems to perform the same as #chat-ops-test with the existing index.

Number of rows affected

Epics with possible mentions in description or title (~11)
  • Epics not yet migrated, i.e. no record in epic_user_mentions:
/chatops run explain SELECT epics.* FROM epics 
LEFT JOIN epic_user_mentions on epics.id = epic_user_mentions.epic_id 
WHERE ((description like '%@%' OR title like '%@%') AND epic_user_mentions.epic_id is null)

Click to see the plan. Count: 11
Merge Anti Join  (cost=0.57..2846.15 rows=557 width=752) (actual time=67.539..134.251 rows=11 loops=1)
  Merge Cond: (epics.id = epic_user_mentions.epic_id)
  Buffers: shared hit=16606 read=560
  I/O Timings: read=44.368
  ->  Index Scan using epics_pkey on epics  (cost=0.29..2759.11 rows=602 width=752) (actual time=0.103..131.348 rows=1027 loops=1)
        Filter: ((description ~~ '%@%'::text) OR ((title)::text ~~ '%@%'::text))
        Rows Removed by Filter: 18001
        Buffers: shared hit=16311 read=543
        I/O Timings: read=44.200
  ->  Index Only Scan using epic_user_mentions_on_epic_id_and_note_id_index on epic_user_mentions  (cost=0.28..77.60 rows=2824 width=4) (actual time=0.063..2.190 rows=2826 loops=1)
        Heap Fetches: 996
        Buffers: shared hit=295 read=17
        I/O Timings: read=0.168
Planning time: 9.818 ms
Execution time: 134.338 ms

  • ~208 epics that might need to be re-migrated; alternatively, these may just be epics that have @ in their title or description but no valid user mention, i.e. any @abracadabra where abracadabra is not an actual username, group, or project.
/chatops run explain select epic_user_mentions.id from epic_user_mentions 
inner join epics on epics.id = epic_user_mentions.epic_id 
where mentioned_users_ids is null and mentioned_projects_ids is null and mentioned_groups_ids is null and note_id is null

Click to see the plan. Count: 208
Nested Loop  (cost=0.56..237.82 rows=132 width=8) (actual time=1.566..5.391 rows=208 loops=1)
  Buffers: shared hit=652 read=6
  I/O Timings: read=2.135
  ->  Index Scan using epic_user_mentions_on_epic_id_index on epic_user_mentions  (cost=0.28..67.24 rows=132 width=12) (actual time=1.521..3.070 rows=208 loops=1)
        Filter: ((mentioned_users_ids IS NULL) AND (mentioned_projects_ids IS NULL) AND (mentioned_groups_ids IS NULL))
        Rows Removed by Filter: 899
        Buffers: shared hit=211 read=6
        I/O Timings: read=2.135
  ->  Index Only Scan using epics_pkey on epics  (cost=0.29..1.28 rows=1 width=4) (actual time=0.010..0.011 rows=1 loops=208)
        Index Cond: (id = epic_user_mentions.epic_id)
        Heap Fetches: 20
        Buffers: shared hit=441
Planning time: 2.744 ms
Execution time: 5.452 ms

Epic notes with possible mentions (~10071):
  • Epic notes not yet migrated, i.e. no record in epic_user_mentions:
/chatops run explain SELECT notes.id FROM notes 
INNER JOIN epics ON epics.id = notes.noteable_id 
LEFT JOIN epic_user_mentions ON notes.id = epic_user_mentions.note_id 
WHERE note LIKE '%@%' AND epic_user_mentions.epic_id IS NULL AND notes.noteable_type = 'Epic'

Click to see the plan. Count: 10071
Nested Loop  (cost=0.85..3488.16 rows=1 width=4) (actual time=0.200..405.764 rows=10071 loops=1)
  Buffers: shared hit=30441 read=1388
  I/O Timings: read=276.762
  ->  Merge Left Join  (cost=0.56..3487.81 rows=1 width=8) (actual time=0.129..384.288 rows=10089 loops=1)
        Merge Cond: (notes.id = epic_user_mentions.note_id)
        Filter: (epic_user_mentions.epic_id IS NULL)
        Rows Removed by Filter: 1708
        Buffers: shared hit=10058 read=1388
        I/O Timings: read=276.762
        ->  Index Scan using epic_mentions_temp_index on notes  (cost=0.29..3419.24 rows=5217 width=8) (actual time=0.061..377.240 rows=11797 loops=1)
              Buffers: shared hit=9764 read=1369
              I/O Timings: read=274.216
        ->  Index Scan using index_epic_user_mentions_on_note_id on epic_user_mentions  (cost=0.28..87.24 rows=2824 width=8) (actual time=0.065..3.663 rows=1717 loops=1)
              Buffers: shared hit=294 read=19
              I/O Timings: read=2.546
  ->  Index Only Scan using epics_pkey on epics  (cost=0.29..0.34 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=10089)
        Index Cond: (id = notes.noteable_id)
        Heap Fetches: 128
        Buffers: shared hit=20383
Planning time: 10.419 ms
Execution time: 406.590 ms

  • ~65 epic notes that might need to be re-migrated; alternatively, these may just be epic notes that contain @ but no valid user mention.
/chatops run explain select epic_user_mentions.id from epic_user_mentions 
inner join epics on epics.id = epic_user_mentions.epic_id 
where mentioned_users_ids is null and mentioned_projects_ids is null and mentioned_groups_ids is null and note_id is NOT null

Click to see the plan. Count: 65
Nested Loop  (cost=0.56..314.61 rows=198 width=8) (actual time=0.029..0.719 rows=65 loops=1)
  Buffers: shared hit=444
  ->  Index Scan using index_epic_user_mentions_on_note_id on epic_user_mentions  (cost=0.28..87.24 rows=198 width=12) (actual time=0.019..0.485 rows=65 loops=1)
        Filter: ((mentioned_users_ids IS NULL) AND (mentioned_projects_ids IS NULL) AND (mentioned_groups_ids IS NULL))
        Rows Removed by Filter: 1653
        Buffers: shared hit=313
  ->  Index Only Scan using epics_pkey on epics  (cost=0.29..1.14 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=65)
        Index Cond: (id = epic_user_mentions.epic_id)
        Heap Fetches: 0
        Buffers: shared hit=131
Planning time: 0.278 ms
Execution time: 0.752 ms

Migrate epic mentions

Query plans for the SQL queries used to migrate mentions in epic descriptions and titles, based on database-lab data. A sketch of how these queries fit into the batch-scheduling loop follows the list.

  • Query 1: Get the start ID for the epics to be migrated.

     /chatops run explain SELECT "epics"."id" FROM "epics" 
     LEFT JOIN epic_user_mentions on epics.id = epic_user_mentions.epic_id 
     WHERE ((description like '%@%' OR title like '%@%') AND epic_user_mentions.epic_id is null) 
     ORDER BY "epics"."id" ASC LIMIT 1

Click to see the plan. Time: 36.588 ms

Limit  (cost=0.57..5.68 rows=1 width=4) (actual time=36.539..36.540 rows=1 loops=1)
  Buffers: shared hit=4358
  ->  Merge Anti Join  (cost=0.57..2846.15 rows=557 width=4) (actual time=36.538..36.538 rows=1 loops=1)
        Merge Cond: (epics.id = epic_user_mentions.epic_id)
        Buffers: shared hit=4358
        ->  Index Scan using epics_pkey on epics  (cost=0.29..2759.11 rows=602 width=4) (actual time=0.101..36.072 rows=263 loops=1)
              Filter: ((description ~~ '%@%'::text) OR ((title)::text ~~ '%@%'::text))
              Rows Removed by Filter: 5974
              Buffers: shared hit=4334
        ->  Index Only Scan using epic_user_mentions_on_epic_id_and_note_id_index on epic_user_mentions  (cost=0.28..77.60 rows=2824 width=4) (actual time=0.038..0.318 rows=342 loops=1)
              Heap Fetches: 80
              Buffers: shared hit=24
Planning time: 6.328 ms
Execution time: 36.588 ms

  • Query 2: Get the max ID of a 10K-row batch of epics to be migrated.

    /chatops run explain SELECT  epics.id FROM epics 
    LEFT JOIN epic_user_mentions on epics.id = epic_user_mentions.epic_id 
    WHERE ((description like '%@%' OR title like '%@%') AND epic_user_mentions.epic_id is null) AND epics.id >= 1 
    ORDER BY epics.id ASC LIMIT 1 OFFSET 10000

Click to see the plan. Time: 86.793 ms

Limit  (cost=2894.82..2900.02 rows=1 width=4) (actual time=86.746..86.746 rows=0 loops=1)
  Buffers: shared hit=17166
  ->  Merge Anti Join  (cost=0.57..2894.82 rows=557 width=4) (actual time=37.386..86.744 rows=11 loops=1)
        Merge Cond: (epics.id = epic_user_mentions.epic_id)
        Buffers: shared hit=17166
        ->  Index Scan using epics_pkey on epics  (cost=0.29..2807.78 rows=602 width=4) (actual time=0.100..84.898 rows=1027 loops=1)
              Index Cond: (id >= 1)
              Filter: ((description ~~ '%@%'::text) OR ((title)::text ~~ '%@%'::text))
              Rows Removed by Filter: 18001
              Buffers: shared hit=16854
        ->  Index Only Scan using epic_user_mentions_on_epic_id_and_note_id_index on epic_user_mentions  (cost=0.28..77.60 rows=2824 width=4) (actual time=0.040..1.176 rows=2826 loops=1)
              Heap Fetches: 996
              Buffers: shared hit=312
Planning time: 6.580 ms
Execution time: 86.793 ms

  • Query 3: Get the min and max IDs for epic mentions to be migrated.

    /chatops run explain SELECT MIN(epics.id), MAX(epics.id) FROM epics 
    LEFT JOIN epic_user_mentions on epics.id = epic_user_mentions.epic_id 
    WHERE ((description like '%@%' OR title like '%@%') AND epic_user_mentions.epic_id is null) 
    AND epics.id >= 1

Click to see the plan. Time: 80.018 ms

Aggregate  (cost=2897.61..2897.62 rows=1 width=8) (actual time=79.946..79.946 rows=1 loops=1)
  Buffers: shared hit=17171
  ->  Merge Anti Join  (cost=0.57..2894.82 rows=557 width=4) (actual time=34.231..79.934 rows=11 loops=1)
        Merge Cond: (epics.id = epic_user_mentions.epic_id)
        Buffers: shared hit=17171
        ->  Index Scan using epics_pkey on epics  (cost=0.29..2807.78 rows=602 width=4) (actual time=0.104..78.317 rows=1027 loops=1)
              Index Cond: (id >= 1)
              Filter: ((description ~~ '%@%'::text) OR ((title)::text ~~ '%@%'::text))
              Rows Removed by Filter: 18001
              Buffers: shared hit=16859
        ->  Index Only Scan using epic_user_mentions_on_epic_id_and_note_id_index on epic_user_mentions  (cost=0.28..77.60 rows=2824 width=4) (actual time=0.040..1.140 rows=2826 loops=1)
              Heap Fetches: 996
              Buffers: shared hit=312
Planning time: 6.510 ms
Execution time: 80.018 ms

  • Query 4: Get the actual epics data for the given range of IDs.

    /chatops run explain SELECT epics.* FROM epics 
    LEFT JOIN epic_user_mentions on epics.id = epic_user_mentions.epic_id 
    WHERE ((description like '%@%' OR title like '%@%') AND epic_user_mentions.epic_id is null) 
    AND epics.id >= 1 AND epics.id < 25000

Click to see the plan. Time: 79.220 ms

Merge Anti Join  (cost=0.57..2943.50 rows=557 width=752) (actual time=33.020..79.093 rows=11 loops=1)
  Merge Cond: (epics.id = epic_user_mentions.epic_id)
  Buffers: shared hit=17171
  ->  Index Scan using epics_pkey on epics  (cost=0.29..2856.46 rows=602 width=752) (actual time=0.090..77.526 rows=1027 loops=1)
        Index Cond: ((id >= 1) AND (id < 25000))
        Filter: ((description ~~ '%@%'::text) OR ((title)::text ~~ '%@%'::text))
        Rows Removed by Filter: 17990
        Buffers: shared hit=16859
  ->  Index Only Scan using epic_user_mentions_on_epic_id_and_note_id_index on epic_user_mentions  (cost=0.28..77.60 rows=2824 width=4) (actual time=0.037..1.100 rows=2825 loops=1)
        Heap Fetches: 996
        Buffers: shared hit=312
Planning time: 8.629 ms
Execution time: 79.220 ms
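
Taken together, queries 1-3 above are the building blocks of the usual range-based scheduling loop: find the first epic to migrate, walk forward in 10K-row steps, and enqueue one background job per [start_id, end_id] range (query 4 is then what each job runs). A rough sketch of that loop follows; the Epic relation, job class name, batch size and delay are illustrative assumptions, not necessarily what this MR ships.

    # Rough sketch of the scheduling loop, for illustration only.
    # JOB_CLASS is a hypothetical name; batch size and delay are assumptions.
    BATCH_SIZE = 10_000
    DELAY      = 2.minutes
    JOB_CLASS  = 'UserMentionsMigrationJob'

    # Epics with a possible mention and no epic_user_mentions row yet
    # (the same WHERE clause as queries 1-4 above).
    relation = Epic
      .joins('LEFT JOIN epic_user_mentions ON epics.id = epic_user_mentions.epic_id')
      .where("(description LIKE '%@%' OR title LIKE '%@%') AND epic_user_mentions.epic_id IS NULL")

    relation.each_batch(of: BATCH_SIZE) do |batch, index|
      # Query 3 above: min/max epic id of the current batch.
      start_id, end_id = batch.pluck(Arel.sql('MIN(epics.id), MAX(epics.id)')).first

      BackgroundMigrationWorker.perform_in(index * DELAY, JOB_CLASS, [start_id, end_id])
    end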

Migrate epic notes mentions

Query plans for the SQL queries used to migrate mentions in epic notes, based on database-lab data. A sketch of the epic_mentions_temp_index these plans rely on follows the list.

  • Query 1: Get the start ID for the epic notes to be migrated.

    /chatops run explain SELECT notes.id FROM notes 
    INNER JOIN epics ON epics.id = notes.noteable_id 
    LEFT JOIN epic_user_mentions ON notes.id = epic_user_mentions.note_id 
    WHERE note LIKE '%@%' AND epic_user_mentions.epic_id IS NULL AND notes.noteable_type = 'Epic' 
    ORDER BY notes.id ASC limit 1

Click to see the plan. Time: ~10 ms

Limit  (cost=0.85..3488.16 rows=1 width=4) (actual time=0.179..0.180 rows=1 loops=1)
  Buffers: shared hit=9
  ->  Nested Loop  (cost=0.85..3488.16 rows=1 width=4) (actual time=0.179..0.179 rows=1 loops=1)
        Buffers: shared hit=9
        ->  Merge Left Join  (cost=0.56..3487.81 rows=1 width=8) (actual time=0.111..0.111 rows=1 loops=1)
              Merge Cond: (notes.id = epic_user_mentions.note_id)
              Filter: (epic_user_mentions.epic_id IS NULL)
              Buffers: shared hit=6
              ->  Index Scan using epic_mentions_temp_index on notes  (cost=0.29..3419.24 rows=5217 width=8) (actual time=0.054..0.054 rows=1 loops=1)
                    Buffers: shared hit=3
              ->  Index Scan using index_epic_user_mentions_on_note_id on epic_user_mentions  (cost=0.28..87.24 rows=2824 width=8) (actual time=0.056..0.056 rows=1 loops=1)
                    Buffers: shared hit=3
        ->  Index Only Scan using epics_pkey on epics  (cost=0.29..0.34 rows=1 width=4) (actual time=0.065..0.066 rows=1 loops=1)
              Index Cond: (id = notes.noteable_id)
              Heap Fetches: 0
              Buffers: shared hit=3
Planning time: 9.872 ms
Execution time: 0.250 ms

  • Query 2: Get the max ID of a 10K-row batch of epic notes to be migrated.

    /chatops run explain SELECT notes.id FROM notes 
    INNER JOIN epics ON epics.id = notes.noteable_id 
    LEFT JOIN epic_user_mentions ON notes.id = epic_user_mentions.note_id 
    WHERE note LIKE '%@%' AND epic_user_mentions.epic_id IS NULL AND notes.noteable_type = 'Epic' 
    AND notes.id >= 67972855 ORDER BY notes.id ASC LIMIT 1 OFFSET 10000

Click to see the plan. Time: 116.860 ms

Limit  (cost=2788.76..5576.67 rows=1 width=4) (actual time=116.787..116.789 rows=1 loops=1)
  Buffers: shared hit=31557 read=23
  I/O Timings: read=1.892
  ->  Nested Loop  (cost=0.85..2788.76 rows=1 width=4) (actual time=0.150..116.289 rows=10001 loops=1)
        Buffers: shared hit=31557 read=23
        I/O Timings: read=1.892
        ->  Merge Left Join  (cost=0.56..2788.40 rows=1 width=8) (actual time=0.089..96.255 rows=10019 loops=1)
              Merge Cond: (notes.id = epic_user_mentions.note_id)
              Filter: (epic_user_mentions.epic_id IS NULL)
              Rows Removed by Filter: 1160
              Buffers: shared hit=10714 read=23
              I/O Timings: read=1.892
              ->  Index Scan using epic_mentions_temp_index on notes  (cost=0.29..2722.53 rows=4139 width=8) (actual time=0.045..93.249 rows=11179 loops=1)
                    Index Cond: (id >= 67972855)
                    Buffers: shared hit=10527 read=23
                    I/O Timings: read=1.892
              ->  Index Scan using index_epic_user_mentions_on_note_id on epic_user_mentions  (cost=0.28..87.24 rows=2824 width=8) (actual time=0.042..0.561 rows=1166 loops=1)
                    Buffers: shared hit=187
        ->  Index Only Scan using epics_pkey on epics  (cost=0.29..0.35 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=10019)
              Index Cond: (id = notes.noteable_id)
              Heap Fetches: 728
              Buffers: shared hit=20843
Planning time: 9.479 ms
Execution time: 116.860 ms

  • Query 3: Get the min and max IDs for epic notes mentions to be migrated.

    /chatops run explain SELECT min(notes.id), max(notes.id) FROM notes 
    INNER JOIN epics ON epics.id = notes.noteable_id 
    LEFT JOIN epic_user_mentions ON notes.id = epic_user_mentions.note_id 
    WHERE note LIKE '%@%' AND epic_user_mentions.epic_id IS NULL AND notes.noteable_type = 'Epic' 
    AND notes.id >= 67972855

Click to see the plan. Time: 144.160 ms

Aggregate  (cost=2788.77..2788.78 rows=1 width=8) (actual time=144.071..144.071 rows=1 loops=1)
  Buffers: shared hit=32069 read=385
  I/O Timings: read=25.537
  ->  Nested Loop  (cost=0.85..2788.76 rows=1 width=4) (actual time=0.151..142.694 rows=10071 loops=1)
        Buffers: shared hit=32069 read=385
        I/O Timings: read=25.537
        ->  Merge Left Join  (cost=0.56..2788.40 rows=1 width=8) (actual time=0.092..123.096 rows=10089 loops=1)
              Merge Cond: (notes.id = epic_user_mentions.note_id)
              Filter: (epic_user_mentions.epic_id IS NULL)
              Rows Removed by Filter: 1709
              Buffers: shared hit=11063 read=385
              I/O Timings: read=25.537
              ->  Index Scan using epic_mentions_temp_index on notes  (cost=0.29..2722.53 rows=4139 width=8) (actual time=0.046..119.188 rows=11798 loops=1)
                    Index Cond: (id >= 67972855)
                    Buffers: shared hit=10751 read=384
                    I/O Timings: read=25.520
              ->  Index Scan using index_epic_user_mentions_on_note_id on epic_user_mentions  (cost=0.28..87.24 rows=2824 width=8) (actual time=0.044..1.211 rows=1718 loops=1)
                    Buffers: shared hit=312 read=1
                    I/O Timings: read=0.017
        ->  Index Only Scan using epics_pkey on epics  (cost=0.29..0.35 rows=1 width=4) (actual time=0.001..0.002 rows=1 loops=10089)
              Index Cond: (id = notes.noteable_id)
              Heap Fetches: 751
              Buffers: shared hit=21006
Planning time: 8.679 ms
Execution time: 144.160 ms

  • Query 4: Get the actual epic notes data for the given range of IDs.

    /chatops run explain SELECT notes.* FROM notes 
    INNER JOIN epics ON epics.id = notes.noteable_id 
    LEFT JOIN epic_user_mentions ON notes.id = epic_user_mentions.note_id 
    WHERE note LIKE '%@%' AND epic_user_mentions.epic_id IS NULL AND notes.noteable_type = 'Epic' 
    AND notes.id >= 67972855 AND notes.id < 268738059

Click to see the plan. Time: 104.306 ms

Nested Loop  (cost=0.85..2506.33 rows=1 width=2463) (actual time=0.152..103.623 rows=9981 loops=1)
  Buffers: shared hit=30388
  ->  Merge Left Join  (cost=0.56..2505.97 rows=1 width=2463) (actual time=0.089..83.359 rows=9999 loops=1)
        Merge Cond: (notes.id = epic_user_mentions.note_id)
        Filter: (epic_user_mentions.epic_id IS NULL)
        Rows Removed by Filter: 1
        Buffers: shared hit=9538
        ->  Index Scan using epic_mentions_temp_index on notes  (cost=0.29..2441.20 rows=3700 width=2463) (actual time=0.035..80.759 rows=10000 loops=1)
              Index Cond: ((id >= 67972855) AND (id < 268738059))
              Buffers: shared hit=9534
        ->  Index Scan using index_epic_user_mentions_on_note_id on epic_user_mentions  (cost=0.28..87.24 rows=2824 width=8) (actual time=0.053..0.059 rows=2 loops=1)
              Buffers: shared hit=4
  ->  Index Only Scan using epics_pkey on epics  (cost=0.29..0.36 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=9999)
        Index Cond: (id = notes.noteable_id)
        Heap Fetches: 775
        Buffers: shared hit=20850
Planning time: 3.101 ms
Execution time: 104.306 ms
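
All of the notes plans above go through epic_mentions_temp_index. Its definition is not part of this MR (the index was created earlier), but for context it is roughly a temporary partial index of the following shape; this is an assumption, the real definition may differ.

    # Assumed shape of the temporary partial index used by the plans above;
    # the exact definition lives in an earlier migration and may differ.
    add_concurrent_index :notes, :id,
      where: "note LIKE '%@%' AND notes.noteable_type = 'Epic'",
      name: 'epic_mentions_temp_index'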

Migration Runtime estimates:

Epic mentions migration
  • There are currently 11 + 208 records to be migrated, which fits in a single background job, so it should take ~2-3 minutes given the delay of the scheduled job.
Epic notes mentions migration
  • There are currently ~10071 + 65 records and the index has already been created, so this needs 2 background jobs with 10K batches, which should take ~5-6 minutes to schedule and run (see the arithmetic check below).
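
These figures come straight from the batch math: the number of jobs is the record count divided by the 10K batch size, rounded up, and each scheduled job adds roughly a 2-3 minute delay (both values are assumptions spelled out below).

    # Back-of-the-envelope check of the estimates above; the 10K batch size and
    # the ~2-3 minute delay per scheduled job are assumptions.
    def jobs_needed(records, batch_size = 10_000)
      (records / batch_size.to_f).ceil
    end

    jobs_needed(11 + 208)     # => 1 job  -> ~2-3 minutes
    jobs_needed(10_071 + 65)  # => 2 jobs -> ~5-6 minutes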

Considerations

While pulling the query plans from #database-lab I ran into an interesting situation: the plans showed the database using a different index than epic_mentions_temp_index, which made the queries run much slower. This particular migration is not very big, but it is something to watch out for, even more so for the upcoming migrations of issue and merge request notes.

Screenshots

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

re #21801 #198321
