Skip to content

Filter out notes with noteable_id pointing to inexistent records

Alexandru Croitor requested to merge 201852-epic-mentions-migration-error into master

What does this MR do?

This MR accounts for the fact that notes table may contain orphaned rows that point to inexistent noteable_id records for various models.

Notes table uses polymorphism to relate notes to different models: issues, merge requests, epics, designs, snippets, etc. through noteable_id and noteable_type. Because of polymorphism we cannot have a DB level FK, which inevitably leads to data inconsistency at DB level, which should be accounted for in code and migrations.

Here are the SQLs to prove for the data inconsistency: The problem is due to polymorphism usage on notes table, thus no FK on noteable_id, thus inconsistency between notes.noteable_id with noteable_type = 'Epic' and epics.id

So we have this first query running in ~13ms and returning 10K record

/chatops explain SELECT notes.id FROM notes 
LEFT JOIN epic_user_mentions ON notes.id = epic_user_mentions.note_id 
WHERE (note LIKE '%@%' AND epic_user_mentions.epic_id IS NULL AND notes.noteable_type = 'Epic') 
AND notes.id >= 67972855 AND notes.id < 268738059 ORDER BY notes.id

Click to see the plan.

Merge Left Join  (cost=0.56..186.17 rows=1 width=4) (actual time=0.161..19.886 rows=9999 loops=1)
  Merge Cond: (notes.id = epic_user_mentions.note_id)
  Filter: (epic_user_mentions.epic_id IS NULL)
  Rows Removed by Filter: 1
  Buffers: shared hit=6485 read=1
  I/O Timings: read=0.029
  ->  Index Only Scan using epic_mentions_temp_index on notes  (cost=0.29..139.85 rows=3158 width=4) (actual time=0.100..18.062 rows=10000 loops=1)
        Index Cond: ((id >= 67972855) AND (id < 268738059))
        Heap Fetches: 165
        Buffers: shared hit=6481 read=1
        I/O Timings: read=0.029
  ->  Index Scan using index_epic_user_mentions_on_note_id on epic_user_mentions  (cost=0.28..63.77 rows=2227 width=8) (actual time=0.060..0.076 rows=2 loops=1)
        Buffers: shared hit=4
Planning time: 7.781 ms
Execution time: 20.482 ms

And bellow query that also checks that noteable_id is in epics returns 9981 records

/chatops run explain select notes.id from notes 
inner join epics on epics.id=notes.noteable_id where notes.id in 
(SELECT notes.id FROM notes 
LEFT JOIN epic_user_mentions ON notes.id = epic_user_mentions.note_id 
WHERE (note LIKE %@% AND epic_user_mentions.epic_id IS NULL AND notes.noteable_type = Epic) 
AND notes.id >= 67972855 AND notes.id < 268738059 ORDER BY notes.id)

Click to see the execution plan.

Nested Loop  (cost=362.74..365.80 rows=1 width=4) (actual time=10.530..132.265 rows=9981 loops=1)
  Buffers: shared hit=76895
  ->  Nested Loop  (cost=362.45..365.49 rows=1 width=8) (actual time=10.518..109.201 rows=9999 loops=1)
        Buffers: shared hit=56429
        ->  HashAggregate  (cost=361.88..361.89 rows=1 width=4) (actual time=10.500..15.131 rows=9999 loops=1)
              Group Key: notes_1.id
              Buffers: shared hit=6373
              ->  Merge Left Join  (cost=0.56..361.87 rows=1 width=4) (actual time=0.038..7.285 rows=9999 loops=1)
                    Merge Cond: (notes_1.id = epic_user_mentions.note_id)
                    Filter: (epic_user_mentions.epic_id IS NULL)
                    Rows Removed by Filter: 1
                    Buffers: shared hit=6373
                    ->  Index Only Scan using epic_mentions_temp_index on notes notes_1  (cost=0.29..302.19 rows=9914 width=4) (actual time=0.027..5.307 rows=10000 loops=1)
                          Index Cond: ((id >= 67972855) AND (id < 268738059))
                          Heap Fetches: 49
                          Buffers: shared hit=6369
                    ->  Index Scan using index_epic_user_mentions_on_note_id on epic_user_mentions  (cost=0.28..58.20 rows=2115 width=8) (actual time=0.010..0.013 rows=2 loops=1)
                          Buffers: shared hit=4
        ->  Index Scan using notes_pkey on notes  (cost=0.57..3.59 rows=1 width=8) (actual time=0.009..0.009 rows=1 loops=9999)
              Index Cond: (id = notes_1.id)
              Buffers: shared hit=50056
  ->  Index Only Scan using epics_pkey on epics  (cost=0.29..0.30 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=9999)
        Index Cond: (id = notes.noteable_id)
        Heap Fetches: 394
        Buffers: shared hit=20466
Planning time: 3.561 ms
Execution time: 133.129 ms

OR using join query same 9981 records

/chatops run explain SELECT notes.id FROM notes 
INNER JOIN epics ON epics.id = notes.noteable_id 
LEFT JOIN epic_user_mentions ON notes.id = epic_user_mentions.note_id 
WHERE note LIKE %@% AND epic_user_mentions.epic_id IS NULL AND notes.noteable_type = Epic 
AND notes.id >= 67972855 AND notes.id < 268738059 ORDER BY notes.id;

Click to see the execution plan.

Nested Loop  (cost=0.85..2131.57 rows=1 width=4) (actual time=0.214..136.419 rows=9981 loops=1)
  Buffers: shared hit=30658
  ->  Merge Left Join  (cost=0.56..2131.14 rows=1 width=8) (actual time=0.146..112.357 rows=9999 loops=1)
        Merge Cond: (notes.id = epic_user_mentions.note_id)
        Filter: (epic_user_mentions.epic_id IS NULL)
        Rows Removed by Filter: 1
        Buffers: shared hit=9540
        ->  Index Scan using epic_mentions_temp_index on notes  (cost=0.29..2084.82 rows=3156 width=8) (actual time=0.088..110.017 rows=10000 loops=1)
              Index Cond: ((id >= 67972855) AND (id < 268738059))
              Buffers: shared hit=9536
        ->  Index Scan using index_epic_user_mentions_on_note_id on epic_user_mentions  (cost=0.28..63.77 rows=2227 width=8) (actual time=0.056..0.058 rows=2 loops=1)
              Buffers: shared hit=4
  ->  Index Only Scan using epics_pkey on epics  (cost=0.29..0.41 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=9999)
        Index Cond: (id = notes.noteable_id)
        Heap Fetches: 1043
        Buffers: shared hit=21118
Planning time: 10.359 ms
Execution time: 137.206 ms

Given the above 2 queries, I've changed the code to be filtering notes that might have noteable_id not pointing to existing resources anymore.

Screenshots

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

re #201852 (closed)

Edited by Alexandru Croitor

Merge request reports