Skip to content

Optimize DB indexes for ES indexing of notes

Markus Koller requested to merge optimize-note-indexes into master

What does this MR do?

Ref: https://gitlab.com/gitlab-org/gitlab-ee/issues/12402

To index notes, we exclude system notes and use find_in_batches to load them in batches for submission to the ES bulk import API. These queries often result in DB timeouts because the usage of ORDER BY id results in the notes_pkey index being used.

This adds an optimized partial index, and removes the unused index index_notes_on_noteable_type which is already covered for our usage by the existing index_notes_on_noteable_id_and_noteable_type.

Newer versions of PostgreSQL (at least 11) are smarter about this and use index_notes_on_project_id_and_noteable_type instead, so we might be able to remove the partial index again in the future.

Database checklist

When adding migrations:

  • Updated db/schema.rb
  • Added a down method so the migration can be reverted
  • Added the output of the migration(s) to the MR body
  • [-] Added tests for the migration in spec/migrations if necessary (e.g. when migrating data)

When removing columns, tables, indexes or other structures:

  • [-] Removed these in a post-deployment migration
  • Made sure the application no longer uses (or ignores) these structures

The new index was suggested by @ahegyi and confirmed by @NikolayS at https://gitlab.com/gitlab-org/gitlab-ee/issues/12402#note_203392157:

Confirming that the specialized index will help to speed up the original query (Slack link, internal):

-- CREATE INDEX test_index on notes (project_id, id) WHERE NOT system;

-- SELECT  "notes".* FROM "notes" WHERE "notes"."project_id" = 13083 AND "notes"."system" = 'f' AND ("notes"."id" > 326281) ORDER BY "notes"."id" ASC LIMIT 1000

 Limit  (cost=0.56..1652.96 rows=1000 width=2436) (actual time=0.394..8.070 rows=1000 loops=1)
   Buffers: shared hit=990 read=7
   ->  Index Scan using test_index on public.notes  (cost=0.56..701237.46 rows=424377 width=2436) (actual time=0.392..7.963 rows=1000 loops=1)
         Index Cond: ((notes.project_id = 13083) AND (notes.id > 326281))
         Buffers: shared hit=990 read=7
 Planning time: 6.761 ms
 Execution time: 8.235 ms
 Total Cost: 701237.46
 Buffers Hit: 990
 Buffers Written: 0
 Buffers Read: 7
Edited by Markus Koller

Merge request reports