Filter out notes with noteable_id pointing to inexistent records
What does this MR do?
This MR accounts for the fact that notes
table may contain orphaned rows that point to inexistent noteable_id
records for various models.
Notes table uses polymorphism to relate notes to different models: issues, merge requests, epics, designs, snippets, etc. through noteable_id
and noteable_type
. Because of polymorphism we cannot have a DB level FK, which inevitably leads to data inconsistency at DB level, which should be accounted for in code and migrations.
Here are the SQLs to prove for the data inconsistency:
The problem is due to polymorphism usage on notes
table, thus no FK on noteable_id
, thus inconsistency between notes.noteable_id
with noteable_type = 'Epic'
and epics.id
So we have this first query running in ~13ms and returning 10K record
/chatops explain SELECT notes.id FROM notes
LEFT JOIN epic_user_mentions ON notes.id = epic_user_mentions.note_id
WHERE (note LIKE '%@%' AND epic_user_mentions.epic_id IS NULL AND notes.noteable_type = 'Epic')
AND notes.id >= 67972855 AND notes.id < 268738059 ORDER BY notes.id
Click to see the plan.
Merge Left Join (cost=0.56..186.17 rows=1 width=4) (actual time=0.161..19.886 rows=9999 loops=1)
Merge Cond: (notes.id = epic_user_mentions.note_id)
Filter: (epic_user_mentions.epic_id IS NULL)
Rows Removed by Filter: 1
Buffers: shared hit=6485 read=1
I/O Timings: read=0.029
-> Index Only Scan using epic_mentions_temp_index on notes (cost=0.29..139.85 rows=3158 width=4) (actual time=0.100..18.062 rows=10000 loops=1)
Index Cond: ((id >= 67972855) AND (id < 268738059))
Heap Fetches: 165
Buffers: shared hit=6481 read=1
I/O Timings: read=0.029
-> Index Scan using index_epic_user_mentions_on_note_id on epic_user_mentions (cost=0.28..63.77 rows=2227 width=8) (actual time=0.060..0.076 rows=2 loops=1)
Buffers: shared hit=4
Planning time: 7.781 ms
Execution time: 20.482 ms
And bellow query that also checks that noteable_id
is in epics
returns 9981 records
/chatops run explain select notes.id from notes
inner join epics on epics.id=notes.noteable_id where notes.id in
(SELECT “notes”.“id” FROM “notes”
LEFT JOIN epic_user_mentions ON notes.id = epic_user_mentions.note_id
WHERE (note LIKE ‘%@%’ AND epic_user_mentions.epic_id IS NULL AND notes.noteable_type = ‘Epic’)
AND notes.id >= 67972855 AND notes.id < 268738059 ORDER BY notes.id)
Click to see the execution plan.
Nested Loop (cost=362.74..365.80 rows=1 width=4) (actual time=10.530..132.265 rows=9981 loops=1)
Buffers: shared hit=76895
-> Nested Loop (cost=362.45..365.49 rows=1 width=8) (actual time=10.518..109.201 rows=9999 loops=1)
Buffers: shared hit=56429
-> HashAggregate (cost=361.88..361.89 rows=1 width=4) (actual time=10.500..15.131 rows=9999 loops=1)
Group Key: notes_1.id
Buffers: shared hit=6373
-> Merge Left Join (cost=0.56..361.87 rows=1 width=4) (actual time=0.038..7.285 rows=9999 loops=1)
Merge Cond: (notes_1.id = epic_user_mentions.note_id)
Filter: (epic_user_mentions.epic_id IS NULL)
Rows Removed by Filter: 1
Buffers: shared hit=6373
-> Index Only Scan using epic_mentions_temp_index on notes notes_1 (cost=0.29..302.19 rows=9914 width=4) (actual time=0.027..5.307 rows=10000 loops=1)
Index Cond: ((id >= 67972855) AND (id < 268738059))
Heap Fetches: 49
Buffers: shared hit=6369
-> Index Scan using index_epic_user_mentions_on_note_id on epic_user_mentions (cost=0.28..58.20 rows=2115 width=8) (actual time=0.010..0.013 rows=2 loops=1)
Buffers: shared hit=4
-> Index Scan using notes_pkey on notes (cost=0.57..3.59 rows=1 width=8) (actual time=0.009..0.009 rows=1 loops=9999)
Index Cond: (id = notes_1.id)
Buffers: shared hit=50056
-> Index Only Scan using epics_pkey on epics (cost=0.29..0.30 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=9999)
Index Cond: (id = notes.noteable_id)
Heap Fetches: 394
Buffers: shared hit=20466
Planning time: 3.561 ms
Execution time: 133.129 ms
OR using join query same 9981 records
/chatops run explain SELECT notes.id FROM notes
INNER JOIN epics ON epics.id = notes.noteable_id
LEFT JOIN epic_user_mentions ON notes.id = epic_user_mentions.note_id
WHERE note LIKE ‘%@%’ AND epic_user_mentions.epic_id IS NULL AND notes.noteable_type = ‘Epic’
AND notes.id >= 67972855 AND notes.id < 268738059 ORDER BY notes.id;
Click to see the execution plan.
Nested Loop (cost=0.85..2131.57 rows=1 width=4) (actual time=0.214..136.419 rows=9981 loops=1)
Buffers: shared hit=30658
-> Merge Left Join (cost=0.56..2131.14 rows=1 width=8) (actual time=0.146..112.357 rows=9999 loops=1)
Merge Cond: (notes.id = epic_user_mentions.note_id)
Filter: (epic_user_mentions.epic_id IS NULL)
Rows Removed by Filter: 1
Buffers: shared hit=9540
-> Index Scan using epic_mentions_temp_index on notes (cost=0.29..2084.82 rows=3156 width=8) (actual time=0.088..110.017 rows=10000 loops=1)
Index Cond: ((id >= 67972855) AND (id < 268738059))
Buffers: shared hit=9536
-> Index Scan using index_epic_user_mentions_on_note_id on epic_user_mentions (cost=0.28..63.77 rows=2227 width=8) (actual time=0.056..0.058 rows=2 loops=1)
Buffers: shared hit=4
-> Index Only Scan using epics_pkey on epics (cost=0.29..0.41 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=9999)
Index Cond: (id = notes.noteable_id)
Heap Fetches: 1043
Buffers: shared hit=21118
Planning time: 10.359 ms
Execution time: 137.206 ms
Given the above 2 queries, I've changed the code to be filtering notes that might have noteable_id
not pointing to existing resources anymore.
Screenshots
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team