Optimize DB indexes for ES indexing of notes
What does this MR do?
Ref: https://gitlab.com/gitlab-org/gitlab-ee/issues/12402
To index notes, we exclude system notes and use find_in_batches
to
load them in batches for submission to the ES bulk import API.
These queries often result in DB timeouts because the usage of
ORDER BY id
results in the notes_pkey
index being used.
This adds an optimized partial index, and removes the unused index
index_notes_on_noteable_type
which is already covered for our
usage by the existing index_notes_on_noteable_id_and_noteable_type
.
Newer versions of PostgreSQL (at least 11) are smarter about this and
use index_notes_on_project_id_and_noteable_type
instead, so we might
be able to remove the partial index again in the future.
Database checklist
-
Conforms to the database guides
When adding migrations:
-
Updated db/schema.rb
-
Added a down
method so the migration can be reverted -
Added the output of the migration(s) to the MR body - [-] Added tests for the migration in
spec/migrations
if necessary (e.g. when migrating data)
When removing columns, tables, indexes or other structures:
- [-] Removed these in a post-deployment migration
-
Made sure the application no longer uses (or ignores) these structures
The new index was suggested by @ahegyi and confirmed by @NikolayS at https://gitlab.com/gitlab-org/gitlab-ee/issues/12402#note_203392157:
Confirming that the specialized index will help to speed up the original query (Slack link, internal):
-- CREATE INDEX test_index on notes (project_id, id) WHERE NOT system; -- SELECT "notes".* FROM "notes" WHERE "notes"."project_id" = 13083 AND "notes"."system" = 'f' AND ("notes"."id" > 326281) ORDER BY "notes"."id" ASC LIMIT 1000 Limit (cost=0.56..1652.96 rows=1000 width=2436) (actual time=0.394..8.070 rows=1000 loops=1) Buffers: shared hit=990 read=7 -> Index Scan using test_index on public.notes (cost=0.56..701237.46 rows=424377 width=2436) (actual time=0.392..7.963 rows=1000 loops=1) Index Cond: ((notes.project_id = 13083) AND (notes.id > 326281)) Buffers: shared hit=990 read=7 Planning time: 6.761 ms Execution time: 8.235 ms Total Cost: 701237.46 Buffers Hit: 990 Buffers Written: 0 Buffers Read: 7