Skip to content

Migrate merge requests mentions to DB table

Alexandru Croitor requested to merge 198338-migrate-mr-mentions-to-db-table into master

What does this MR do?

This MR contains migrations for merge request mentions found within merge request title or description

I've pulled some estimate counts of the records we might be looking at from slack database-lab channel.

Database benchmarks

  • Cleanup records with no actual mentions:

     explain delete from merge_request_user_mentions 
     where mentioned_users_ids IS NULL AND mentioned_groups_ids IS NULL AND mentioned_projects_ids IS NULL

    Click to see the plan. Time: 1.061s. https://explain.depesz.com/s/D1jw
    
    ModifyTable on public.merge_request_user_mentions  (cost=0.00..17105.61 rows=104795 width=6) (actual time=572.434..572.434 rows=0 loops=1)
       Buffers: shared hit=104326 read=3422 dirtied=2668
       I/O Timings: read=292.197
       ->  Seq Scan on public.merge_request_user_mentions  (cost=0.00..17105.61 rows=104795 width=6) (actual time=3.280..454.209 rows=102525 loops=1)
             Filter: ((merge_request_user_mentions.mentioned_users_ids IS NULL) AND (merge_request_user_mentions.mentioned_groups_ids IS NULL) AND (merge_request_user_mentions.mentioned_projects_ids IS NULL))
             Rows Removed by Filter: 240203
             Buffers: shared read=3421 dirtied=730
             I/O Timings: read=291.067
    Time: 1.061 s
    
    planning: 0.097 ms
    execution: 1.061 s
    
    I/O read: 428.308 ms
    I/O write: 0.000 ms
    
    
    
    Shared buffers:
    
    hits: 104278 (~814.70 MiB) from the buffer pool
    reads: 4893 (~38.20 MiB) from the OS file cache, including disk I/O
    dirtied: 2724 (~21.30 MiB)
    writes: 0

  • Create index

    exec CREATE INDEX merge_request_mentions_temp_index ON merge_requests (id) WHERE title ~~ '%@%'::text OR description ~~ '%@%'::text
    
    The query has been executed. Duration: 4.650 min

Number of rows affected

  • Merge requests with mentions in description or title ~991373. Important this query is not run during migration

    explain SELECT "merge_requests"."id" FROM "merge_requests"
    LEFT JOIN merge_request_user_mentions on merge_requests.id = merge_request_user_mentions.merge_request_id
    WHERE ((description like '%@%' OR title like '%@%') AND merge_request_user_mentions.merge_request_id is null)

    Click to see the plan. Time: 11.598 s. https://explain.depesz.com/s/jDkY
    
     Merge Join  (cost=0.85..50211.96 rows=860453 width=4) (actual time=1.240..11499.663 rows=991373 loops=1)
       Buffers: shared hit=505240 read=27974 dirtied=14026
       I/O Timings: read=10062.394
       ->  Index Only Scan using merge_request_mentions_temp_index on public.merge_requests  (cost=0.42..35931.87 rows=864937 width=4) (actual time=0.237..8211.461 rows=1022929 loops=1)
             Heap Fetches: 28901
             Buffers: shared hit=445976 read=25406 dirtied=13212
             I/O Timings: read=7510.676
       ->  Index Only Scan using merge_request_user_mentions_on_mr_id_and_note_id_index on public.merge_request_user_mentions  (cost=0.42..11193.45 rows=342161 width=4) (actual time=1.000..2927.670 rows=342728 loops=1)
             Heap Fetches: 64342
             Buffers: shared hit=59264 read=2568 dirtied=814
             I/O Timings: read=2551.718
    Time: 11.598 s
    - planning: 10.174 ms
    - execution: 11.588 s
    - I/O read: 10.062 s
    - I/O write: 0.000 ms
    Shared buffers:
    - hits: 505240 (~3.90 GiB) from the buffer pool
    - reads: 27974 (~218.50 MiB) from the OS file cache, including disk I/O
    - dirtied: 14026 (~109.60 MiB)
    - writes: 0

Migrate merge request mentions

Query plans for sql queries to migrate mentions for merge requests based on database-lab data.

  • Query 1: Get the start ID for the merge requests to be migrated.

     explain SELECT "merge_requests"."id" FROM "merge_requests"
     LEFT JOIN merge_request_user_mentions on merge_requests.id = merge_request_user_mentions.merge_request_id
     WHERE ((description like '%@%' OR title like '%@%') AND merge_request_user_mentions.merge_request_id is null)
     ORDER BY "merge_requests"."id" ASC LIMIT 1

    Click to see the plan. Time: 5.322 ms. https://explain.depesz.com/s/9u67
    
    Limit  (cost=0.85..0.91 rows=1 width=4) (actual time=0.029..0.030 rows=1 loops=1)
         Buffers: shared hit=8
         ->  Merge Join  (cost=0.85..50211.96 rows=860453 width=4) (actual time=0.028..0.028 rows=1 loops=1)
               Buffers: shared hit=8
               ->  Index Only Scan using merge_request_mentions_temp_index on public.merge_requests  (cost=0.42..35931.87 rows=864937 width=4) (actual time=0.017..0.017 rows=1 loops=1)
                     Heap Fetches: 0
                     Buffers: shared hit=4
               ->  Index Only Scan using merge_request_user_mentions_on_mr_id_and_note_id_index on public.merge_request_user_mentions  (cost=0.42..11193.45 rows=342161 width=4) (actual time=0.009..0.009 rows=1 loops=1)
                     Heap Fetches: 0
                     Buffers: shared hit=4
    Time: 5.322 ms
    - planning: 5.267 ms
    - execution: 0.055 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms
    Shared buffers:
    - hits: 8 (~64.00 KiB) from the buffer pool
    - reads: 0 from the OS file cache, including disk I/O
    - dirtied: 0
    - writes: 0

  • Query 2: Get max id for the 10K batch rows for merge requests to be migrated.

     explain SELECT "merge_requests"."id" FROM "merge_requests"
     LEFT JOIN merge_request_user_mentions on merge_requests.id = merge_request_user_mentions.merge_request_id
     WHERE ((description like '%@%' OR title like '%@%') AND merge_request_user_mentions.merge_request_id is null)
     ORDER BY "merge_requests"."id" ASC LIMIT 1 OFFSET 10000

    Click to see the plan. Time: 12.081 ms. https://explain.depesz.com/s/CQto
    
      Limit  (cost=584.39..584.45 rows=1 width=4) (actual time=6.726..6.726 rows=1 loops=1)
         Buffers: shared hit=3115
         ->  Merge Join  (cost=0.85..50211.96 rows=860453 width=4) (actual time=0.041..6.064 rows=10001 loops=1)
               Buffers: shared hit=3115
               ->  Index Only Scan using merge_request_mentions_temp_index on public.merge_requests  (cost=0.42..35931.87 rows=864937 width=4) (actual time=0.025..4.099 rows=10002 loops=1)
                     Heap Fetches: 601
                     Buffers: shared hit=3111
               ->  Index Only Scan using merge_request_user_mentions_on_mr_id_and_note_id_index on public.merge_request_user_mentions  (cost=0.42..11193.45 rows=342161 width=4) (actual time=0.013..0.014 rows=2 loops=1)
                     Heap Fetches: 0
                     Buffers: shared hit=4
    Time: 12.081 ms
    - planning: 5.317 ms
    - execution: 6.764 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms
    Shared buffers:
    - hits: 3115 (~24.30 MiB) from the buffer pool
    - reads: 0 from the OS file cache, including disk I/O
    - dirtied: 0
    - writes: 0

     explain SELECT "merge_requests"."id" FROM "merge_requests"
     LEFT JOIN merge_request_user_mentions on merge_requests.id = merge_request_user_mentions.merge_request_id
     WHERE ((description like '%@%' OR title like '%@%') AND merge_request_user_mentions.merge_request_id is null) AND 
     merge_requests.id > 40100000 ORDER BY "merge_requests"."id" ASC LIMIT 1 OFFSET 10000

    Click to see the plan. Time: 12.321 ms. https://explain.depesz.com/s/MoIa
    
      Limit  (cost=1106.33..1106.44 rows=1 width=4) (actual time=7.312..7.312 rows=1 loops=1)
         Buffers: shared hit=4689
         ->  Merge Join  (cost=0.85..21622.69 rows=195587 width=4) (actual time=2.263..6.764 rows=10001 loops=1)
               Buffers: shared hit=4689
               ->  Index Only Scan using merge_request_mentions_temp_index on public.merge_requests  (cost=0.42..9069.28 rows=196606 width=4) (actual time=0.016..2.762 rows=10022 loops=1)
                     Index Cond: (merge_requests.id > 40100000)
                     Heap Fetches: 173
                     Buffers: shared hit=4283
               ->  Index Only Scan using merge_request_user_mentions_on_mr_id_and_note_id_index on public.merge_request_user_mentions  (cost=0.42..11193.45 rows=342161 width=4) (actual time=0.010..2.183 rows=2534 loops=1)
                     Heap Fetches: 395
                     Buffers: shared hit=406
    Time: 12.321 ms
    - planning: 4.979 ms
    - execution: 7.342 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms
    Shared buffers:
    - hits: 4689 (~36.60 MiB) from the buffer pool
    - reads: 0 from the OS file cache, including disk I/O
    - dirtied: 0
    - writes: 0

  • Query 3: Get min and max ID for range of merge requests mentions to be migrated.

     explain SELECT min("merge_requests"."id"), max("merge_requests"."id") FROM "merge_requests"
     LEFT JOIN merge_request_user_mentions on merge_requests.id = merge_request_user_mentions.merge_request_id
     WHERE ((description like '%@%' OR title like '%@%') AND merge_request_user_mentions.merge_request_id is null)
     AND merge_requests.id > 40100000 AND merge_requests.id < 40939508

    Click to see the plan. Time: 64.459 ms. https://explain.depesz.com/s/e6rZ
    
      Nested Loop  (cost=0.85..9659.41 rows=14391 width=4) (actual time=0.026..56.184 rows=26416 loops=1)
         Buffers: shared hit=90756
         ->  Index Only Scan using merge_request_mentions_temp_index on public.merge_requests  (cost=0.42..715.44 rows=14466 width=4) (actual time=0.017..8.561 rows=26454 loops=1)
               Index Cond: ((merge_requests.id > 40100000) AND (merge_requests.id < 40939508))
               Heap Fetches: 406
               Buffers: shared hit=11371
         ->  Index Only Scan using merge_request_user_mentions_on_mr_id_and_note_id_index on public.merge_request_user_mentions  (cost=0.42..0.81 rows=2 width=4) (actual time=0.001..0.001 rows=0 loops=26454)
               Index Cond: (merge_request_user_mentions.merge_request_id = merge_requests.id)
               Heap Fetches: 8
               Buffers: shared hit=79385
    Time: 64.459 ms
    - planning: 6.674 ms
    - execution: 57.785 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms
    Shared buffers:
    - hits: 90756 (~709.00 MiB) from the buffer pool
    - reads: 0 from the OS file cache, including disk I/O
    - dirtied: 0
    - writes: 0

  • Query 4: Get actual merge requests data for given range of ids

     explain SELECT "merge_requests"."*" FROM "merge_requests"
     LEFT JOIN merge_request_user_mentions on merge_requests.id = merge_request_user_mentions.merge_request_id
     WHERE ((description like '%@%' OR title like '%@%') AND merge_request_user_mentions.merge_request_id is null)
     AND merge_requests.id > 40100000 AND merge_requests.id < 40939508

    Click to see the plan. Time: 5.517 s. https://explain.depesz.com/s/ce1R
    
      Nested Loop  (cost=0.85..20256.82 rows=14391 width=719) (actual time=0.029..101.731 rows=26416 loops=1)
         Buffers: shared hit=105735
         ->  Index Scan using merge_request_mentions_temp_index on public.merge_requests  (cost=0.42..11312.85 rows=14466 width=719) (actual time=0.019..48.470 rows=26454 loops=1)
               Index Cond: ((merge_requests.id > 40100000) AND (merge_requests.id < 40939508))
               Buffers: shared hit=26350
         ->  Index Only Scan using merge_request_user_mentions_on_mr_id_and_note_id_index on public.merge_request_user_mentions  (cost=0.42..0.81 rows=2 width=4) (actual time=0.002..0.002 rows=0 loops=26454)
               Index Cond: (merge_request_user_mentions.merge_request_id = merge_requests.id)
               Heap Fetches: 8
               Buffers: shared hit=79385
    Time: 5.517 s
    - planning: 6.349 ms
    - execution: 5.510 s
    - I/O read: 5.222 s
    - I/O write: 0.000 ms
    Shared buffers:
    - hits: 80777 (~631.10 MiB) from the buffer pool
    - reads: 20805 (~162.50 MiB) from the OS file cache, including disk I/O
    - dirtied: 237 (~1.90 MiB)
    - writes: 0

explain SELECT merge_requests.* FROM "merge_requests"
 LEFT JOIN merge_request_user_mentions on merge_requests.id = merge_request_user_mentions.merge_request_id
 WHERE ((description like '%@%' OR title like '%@%') AND merge_request_user_mentions.merge_request_id is null)
 AND merge_requests.id > 18100000 AND merge_requests.id < 18200000

Click to see the plan. Time: 2.334 s. https://explain.depesz.com/s/rxsf

Nested Loop Anti Join  (cost=0.85..2702.96 rows=1042 width=731) (actual time=8.782..2326.244 rows=1584 loops=1)
   Buffers: shared hit=4800 read=1465 dirtied=8
   I/O Timings: read=2281.984
   ->  Index Scan using merge_request_mentions_temp_index on public.merge_requests  (cost=0.42..1431.72 rows=1046 width=731) (actual time=8.748..2302.339 rows=1584 loops=1)
         Index Cond: ((merge_requests.id > 18100000) AND (merge_requests.id < 18200000))
         Buffers: shared hit=48 read=1465 dirtied=8
         I/O Timings: read=2281.984
   ->  Index Only Scan using merge_request_user_mentions_on_mr_id_and_note_id_index on public.merge_request_user_mentions  (cost=0.42..2.01 rows=2 width=4) (actual time=0.010..0.010 rows=0 loops=1584)
         Index Cond: (merge_request_user_mentions.merge_request_id = merge_requests.id)
         Heap Fetches: 0
         Buffers: shared hit=4752
Time: 2.334 s

planning: 6.568 ms
execution: 2.327 s

I/O read: 2.282 s
I/O write: 0.000 ms



Shared buffers:

hits: 4800 (~37.50 MiB) from the buffer pool
reads: 1465 (~11.40 MiB) from the OS file cache, including disk I/O
dirtied: 8 (~64.00 KiB)
writes: 0

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Closes #198338

Edited by Coung Ngo

Merge request reports