Skip to content

Sanitize existing TODOs for confidential notes

Jan Provaznik requested to merge jp-clean-confnotes into master

Adds a migration for sanitizing TODOs for confidential notes: there might be small amount of existing TODOs which refer to a confidential note which is actually not accessible by TODO's user. Although we filter these out when displaying TODOs, we should still delete these.

This migration is difficult to isolate (as we check user's permission), in thread !87908 (comment 952459215) it was agreed this could be a post-migration.

Sample output:

# info about deleted todos is logged in application_json.log:
{"severity":"INFO","time":"2022-06-03T07:00:57.163Z","correlation_id":null,"message":"SanitizeConfidentialNoteTodos deleting invalid todo","attributes":{"id":15,"user_id":14,"project_id":8,"target_id":495,"target_type":"Issue","author_id":14,"action":4,"state":"pending","created_at":"2022-06-03T07:00:37.891Z","updated_at":"2022-06-03T07:00:37.891Z","note_id":1492,"commit_id":null,"group_id":null,"resolved_by_action":null}}

# no standard output:
$ bin/rails db:migrate
main: == 20220601145406 SanitizeConfidentialNoteTodos: migrating ====================
main: == 20220601145406 SanitizeConfidentialNoteTodos: migrated (2.5112s) ===========

$ bin/rails db:rollback:main
main: == 20220601145406 SanitizeConfidentialNoteTodos: reverting ====================
main: == 20220601145406 SanitizeConfidentialNoteTodos: reverted (0.0004s) ===========

migration DB query:

explain SELECT "notes"."id" FROM "notes" WHERE "notes"."confidential" = true AND "notes"."id" > 100 ORDER BY "notes"."id" ASC LIMIT 100;

 Limit  (cost=0.28..5.41 rows=100 width=4) (actual time=4.225..124.491 rows=100 loops=1)
   Buffers: shared hit=54 read=27
   I/O Timings: read=123.783 write=0.000
   ->  Index Only Scan using index_notes_on_id_where_confidential on public.notes  (cost=0.28..300.62 rows=5852 width=4) (actual time=4.221..124.442 rows=100 loops=1)
         Index Cond: (notes.id > 100)
         Heap Fetches: 3
         Buffers: shared hit=54 read=27
         I/O Timings: read=123.783 write=0.000


Getting Todos for a batch of notes would look like this:

explain SELECT "todos".* FROM "todos" WHERE "todos"."note_id" IN (SELECT "notes"."id" FROM "notes" WHERE "notes"."confidential" = true AND "notes"."id" > 100 ORDER BY "notes"."id" ASC LIMIT 100);

 Nested Loop  (cost=7.23..440.37 rows=311 width=107) (actual time=0.501..0.502 rows=0 loops=1)
   Buffers: shared hit=481
   I/O Timings: read=0.000 write=0.000
   ->  HashAggregate  (cost=6.66..7.66 rows=100 width=4) (actual time=0.243..0.262 rows=100 loops=1)
         Group Key: notes.id
         Buffers: shared hit=81
         I/O Timings: read=0.000 write=0.000
         ->  Limit  (cost=0.28..5.41 rows=100 width=4) (actual time=0.056..0.208 rows=100 loops=1)
               Buffers: shared hit=81
               I/O Timings: read=0.000 write=0.000
               ->  Index Only Scan using index_notes_on_id_where_confidential on public.notes  (cost=0.28..300.62 rows=5852 width=4) (actual time=0.054..0.191 rows=100 loops=1)
                     Index Cond: (notes.id > 100)
                     Heap Fetches: 3
                     Buffers: shared hit=81
                     I/O Timings: read=0.000 write=0.000
   ->  Index Scan using index_todos_on_note_id on public.todos  (cost=0.57..4.30 rows=3 width=107) (actual time=0.002..0.002 rows=0 loops=100)
         Index Cond: (todos.note_id = notes.id)
         Buffers: shared hit=400
         I/O Timings: read=0.000 write=0.000

Related to https://gitlab.com/gitlab-org/gitlab/-/issues/359064

Edited by Jan Provaznik

Merge request reports