Skip to content

Migrate mentions to respective DB tables

Alexandru Croitor requested to merge 21801-migrate-mentions-to-db-tables into master

What does this MR do?

This MR contains a collection of migrations for mentions on various item: issues, epics, merge requests, notes.

This MR is one(most probably first) of a series of mentions migrations.

These migrations are to be run after !19088 (merged) is merged and confirmed to be working fine on production

This MR is mainly the implementation of the common code lib under CreateResourceUserMention that would be used for further migration of various chunks of mentions.

We plan on migrating mentions in chunks for the simple reason that the number of mentions that need to be migrated is very large, roughly 22M records in various tables.

CreateResourceUserMention makes use of bulk insert feature to speed up inserts of new mentions.

I've pulled some count from slack database-lab channel.

  1. All issues vs open issues that may have a mention in description: ~1,155,214 vs ~341,481
-- issues that possibly have a mention in description
explain select * from issues where description like '%\@%'
Seq Scan on issues  (cost=0.00..7324876.67 rows=1155214 width=775)
  Filter: (description ~~ '%\@%'::text)

-- Open issues that possibly have a mention in description
explain select * from issues where state_id = 1 and description like '%\@%'
Index Scan using idx_issues_on_state_id on issues  (cost=0.44..3330471.90 rows=341481 width=775)
  Index Cond: (state_id = 1)
  Filter: (description ~~ '%\@%'::text)
  1. All issue notes vs open issues notes that may have a mention: ~7,592,623 vs ~2,150,714
-- issue notes that have at least one mention.
explain select * from notes where noteable_type = 'Issue' and note like '%\@%'
Index Scan using index_notes_on_noteable_id_and_noteable_type on notes  (cost=0.57..38323797.08 rows=7592623 width=2455)
  Index Cond: ((noteable_type)::text = 'Issue'::text)
  Filter: (note ~~ '%\@%'::text)

explain select * from notes, issues where issues.id = notes.noteable_id AND issues.state_id = 1 AND noteable_type = 'Issue' and note like '%\@%'
Merge Join  (cost=86.98..33798499.89 rows=2150714 width=3230)
  Merge Cond: (notes.noteable_id = issues.id)
  ->  Index Scan using index_notes_on_noteable_id_and_noteable_type on notes  (cost=0.57..38323797.08 rows=7592623 width=2455)
        Index Cond: ((noteable_type)::text = 'Issue'::text)
        Filter: (note ~~ '%\@%'::text)
  1. All merge requests vs open merge requests that may have a mention in description: ~816,573 vs ~40,099
explain select * from merge_requests where description like '%\@%'
Seq Scan on merge_requests  (cost=0.00..8260111.89 rows=816573 width=713)
  Filter: (description ~~ '%\@%'::text)

explain select * from merge_requests where state_id = 1 and description like '%\@%'
Index Scan using idx_merge_requests_on_target_project_id_and_iid_opened on merge_requests  (cost=0.43..1920706.01 rows=40099 width=713)
  Filter: (description ~~ '%\@%'::text)
  1. All merge request notes vs open merge request notes that may have a mention: ~9,663,665 vs ~474,550
explain select * from notes, merge_requests where merge_requests.id = notes.noteable_id AND noteable_type = 'MergeRequest' and note like '%\@%'
Merge Join  (cost=34.51..48430159.16 rows=9663665 width=3168)
  Merge Cond: (notes.noteable_id = merge_requests.id)
  ->  Index Scan using index_notes_on_noteable_id_and_noteable_type on notes  (cost=0.57..42041324.54 rows=10084526 width=2455)
        Index Cond: ((noteable_type)::text = 'MergeRequest'::text)
        Filter: (note ~~ '%\@%'::text)

explain select * from notes, merge_requests where merge_requests.id = notes.noteable_id AND state_id =1 AND noteable_type = 'MergeRequest' and note like '%\@%'
Nested Loop  (cost=1.00..30561364.85 rows=474550 width=3168)
  ->  Index Scan using idx_merge_requests_on_target_project_id_and_iid_opened on merge_requests  (cost=0.43..1916559.80 rows=1658485 width=713)
  ->  Index Scan using index_notes_on_noteable_id_and_noteable_type on notes  (cost=0.57..17.25 rows=2 width=2455)
        Index Cond: ((noteable_id = merge_requests.id) AND ...
  1. Not much that we can do on commit notes though: ~770,166
explain select * from notes where noteable_type = 'Commit' and note like '%\@%'
Index Scan using index_notes_on_noteable_id_and_noteable_type on notes  (cost=0.57..14299350.09 rows=770166 width=2455)
  Index Cond: ((noteable_type)::text = 'Commit'::text)
  Filter: (note ~~ '%\@%'::text)
  1. Not too worried about epic(~696), epic notes(~8,532), snippet(~4,366), snippet notes(~2,257), design notes(~406) mentions:
explain select * from epics where description like '%\@%'
Seq Scan on public.epics  (cost=0.00..5731.54 rows=383 width=780) (actual time=3.214..724.875 rows=696 loops=1)
   Filter: (epics.description ~~ '%\@%'::text)
   Rows Removed by Filter: 12072
   Buffers: shared dirtied=52 hit=1063 read=1881

explain select * from notes where noteable_type = 'Epic' and note like '%\@%'
 Index Scan using index_notes_on_noteable_id_and_noteable_type on public.notes  (cost=0.57..4105657.57 rows=4266 width=2455) (actual time=4.954..685897.784 rows=8532 loops=1)
   Index Cond: ((notes.noteable_type)::text = 'Epic'::text)
   Filter: (notes.note ~~ '%\@%'::text)
   Rows Removed by Filter: 116200
   Buffers: shared hit=774523 read=568275


explain select * from snippets where description like '%\@%'
 Seq Scan on public.snippets  (cost=0.00..124116.62 rows=2044 width=1080) (actual time=23.523..18530.015 rows=4366 loops=1)
   Filter: (snippets.description ~~ '%\@%'::text)
   Rows Removed by Filter: 216908
   Buffers: shared dirtied=1176 hit=34223 read=45467

explain select * from notes where noteable_type = 'Snippet' and note like '%\@%'
Index Scan using index_notes_on_noteable_id_and_noteable_type on public.notes  (cost=0.57..4105657.57 rows=4266 width=2455) (actual time=2.047..617768.467 rows=2257 loops=1)
   Index Cond: ((notes.noteable_type)::text = 'Snippet'::text)
   Filter: (notes.note ~~ '%\@%'::text)
   Rows Removed by Filter: 20493
   Buffers: shared hit=766129 read=481730

explain select * from notes where noteable_type = 'DesignManagement::Design' and note like '%\@%'
 Index Scan using index_notes_on_noteable_id_and_noteable_type on public.notes  (cost=0.57..4105657.57 rows=4266 width=2455) (actual time=2.020..544961.579 rows=406 loops=1)
   Index Cond: ((notes.noteable_type)::text = 'DesignManagement::Design'::text)
   Filter: (notes.note ~~ '%\@%'::text)
   Rows Removed by Filter: 1125
   Buffers: shared hit=526138 read=698230 written=3

Screenshots

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Closes #21801

Edited by Coung Ngo

Merge request reports