Migrate epic mentions to epic_user_mentions
What does this MR do?
This MR contains migrations for epic and epic notes mentions.
These migrations are to be run after !19088 (merged) is merged and confirmed to be working fine on production
This MR has the migrations, the common code under CreateResourceUserMention
that would be used for further migration of other models mentions as well as isolated Mentionable
concern and Epic
and EpicUserMention
models.
CreateResourceUserMention
makes use of bulk insert feature to speed up inserts of new mentions.
I've pulled some estimate counts of the records we might be looking at from slack database-lab
channel.
Database benchmarks
Number of rows affected
Epic with mentions in description or title epic(~696), and mentions in epic notes(~8,532):
explain select * from epics where description like '%\@%'
Seq Scan on public.epics (cost=0.00..5731.54 rows=383 width=780) (actual time=3.214..724.875 rows=696 loops=1)
Filter: (epics.description ~~ '%\@%'::text)
Rows Removed by Filter: 12072
Buffers: shared dirtied=52 hit=1063 read=1881
explain select * from notes where noteable_type = 'Epic' and note like '%\@%'
Index Scan using index_notes_on_noteable_id_and_noteable_type on public.notes (cost=0.57..4105657.57 rows=4266 width=2455) (actual time=4.954..685897.784 rows=8532 loops=1)
Index Cond: ((notes.noteable_type)::text = 'Epic'::text)
Filter: (notes.note ~~ '%\@%'::text)
Rows Removed by Filter: 116200
Buffers: shared hit=774523 read=568275
Migrate epic mentions
Query plans for sql queries to migrate mentions for epic description and title based on database-lab
data.
-
Query 1: Get the start ID for the epics to be migrated. Plan: https://explain.depesz.com/s/K6KF
EXPLAIN SELECT "epics"."id" FROM "epics" LEFT JOIN epic_user_mentions on epics.id = epic_user_mentions.epic_id WHERE ((description like '%@%' OR title like '%@%') AND epic_user_mentions.epic_id is null) ORDER BY "epics"."id" ASC LIMIT 1
-
Query 2: Get max id for the 10K batch rows for epics to be migrated. Plan: https://explain.depesz.com/s/8IBp
EXPLAIN SELECT “epics”.“id” FROM “epics” LEFT JOIN epic_user_mentions on epics.id = epic_user_mentions.epic_id WHERE ((description like ‘%@%’ OR title like ‘%@%’) AND epic_user_mentions.epic_id is null) AND “epics”.“id” >= 1 ORDER BY “epics”.“id” ASC LIMIT 1 OFFSET 10000
-
Query 3: Get epics to be migrated between min id and max id. Plan: https://explain.depesz.com/s/VSrf
EXPLAIN SELECT MIN(epics.id), MAX(epics.id) FROM “epics” LEFT JOIN epic_user_mentions on epics.id = epic_user_mentions.epic_id WHERE ((description like ‘%@%’ OR title like ‘%@%’) AND epic_user_mentions.epic_id is null) AND “epics”.“id” >= 1
Migrate epic notes mentions
Query plans for sql queries to migrate mentions for epic notes based on database-lab
data.
-
On notes queries are very slow even when notes.id is between range of values. For instance bellow query only gets 66 matching records in a range of ~10M notes, and it takes ~15mins to execute: https://explain.depesz.com/s/zY31
explain SELECT "notes"."id" FROM "notes" LEFT JOIN epic_user_mentions ON notes.id = epic_user_mentions.note_id WHERE (note LIKE '%@%' AND epic_user_mentions.epic_id IS NULL AND notes.noteable_type = 'Epic' AND notes.system = false) AND notes.id >= 81310000 AND notes.id < 91320000 ORDER BY notes.id
-
Creating a temporary index helps quite a bit:
exec CREATE INDEX epic_mentions_temp_index ON notes (id) WHERE note ~~ ‘%@%’::text AND notes.noteable_type = ‘Epic’ AND notes.system = false The query has been executed. Duration: 37.239 min
\di+ epic_mentions_temp_index
List of relations Schema | Name | Type | Owner | Table | Size | Description --------+-----------------+-------+--------+-------+--------+------------- public | epic_mentions_temp_index | index | gitlab | notes | 216 kB | (1 row)
-
Afterwards the above query took only ~0.4ms to execute: https://explain.depesz.com/s/qel3
explain SELECT "notes"."id" FROM "notes" LEFT JOIN epic_user_mentions ON notes.id = epic_user_mentions.note_id WHERE (note LIKE '%@%' AND epic_user_mentions.epic_id IS NULL AND notes.noteable_type = 'Epic' AND notes.system = false) AND notes.id >= 81310000 AND notes.id < 91320000 ORDER BY notes.id
-
We will need to drop the
epic_mentions_temp_index
index in a different release after background migrations are finished.
Migration Runtime estimates
~2mins
Epic title and description mentions:- Epics with mentions in title or description count: ~800 https://explain.depesz.com/s/spYu
EXPLAIN SELECT “epics”.“id” FROM “epics”
LEFT JOIN epic_user_mentions on epics.id = epic_user_mentions.epic_id
WHERE ((description like ‘%@%’ OR title like ‘%@%’) AND epic_user_mentions.epic_id is null) ORDER BY “epics”.“id”
(actual time=0.057..1994.760 rows=792 loops=1)
- Given count of epics in 1, all the queries from !22333 (comment 265837848) execute in ~2 secs
- Given count of epics in 1 a single background job will be ran, which locally for 10K records runs in ~2.5 minutes, so for this case it would be under 2 mins.
~60 mins
Epic notes mentions:- Concurrent Index creation: ~60 mins
- Epic notes with mentions count: ~9000 https://explain.depesz.com/s/B9m8
explain SELECT “notes”.“id” FROM “notes”
LEFT JOIN epic_user_mentions ON notes.id = epic_user_mentions.note_id
WHERE (note LIKE ‘%@%’ AND epic_user_mentions.epic_id IS NULL
AND notes.noteable_type = ‘Epic’ AND notes.system = false) AND notes.id >= 1 ORDER BY notes.id
(actual time=15.051..16.518 rows=9276 loops=1)
- Given count of epic notes in 2, all the queries from !22333 (comment 265838096) would execute in 1-2 seconds.
- Given count of epic notes in 2 a single background job should be need that locally takes ~2.5 mins
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team
Closes #21801