Optimize DB indexes for ES indexing of notes
What does this MR do?
To index notes, we exclude system notes and use
load them in batches for submission to the ES bulk import API.
These queries often result in DB timeouts because the usage of
ORDER BY id results in the
notes_pkey index being used.
This adds an optimized partial index, and removes the unused index
index_notes_on_noteable_type which is already covered for our
usage by the existing
Newer versions of PostgreSQL (at least 11) are smarter about this and
index_notes_on_project_id_and_noteable_type instead, so we might
be able to remove the partial index again in the future.
- Conforms to the database guides
When adding migrations:
downmethod so the migration can be reverted
- Added the output of the migration(s) to the MR body
- [-] Added tests for the migration in
spec/migrationsif necessary (e.g. when migrating data)
When removing columns, tables, indexes or other structures:
- [-] Removed these in a post-deployment migration
- Made sure the application no longer uses (or ignores) these structures
The new index was suggested by @ahegyi and confirmed by @NikolayS at https://gitlab.com/gitlab-org/gitlab-ee/issues/12402#note_203392157:
Confirming that the specialized index will help to speed up the original query (Slack link, internal):
-- CREATE INDEX test_index on notes (project_id, id) WHERE NOT system; -- SELECT "notes".* FROM "notes" WHERE "notes"."project_id" = 13083 AND "notes"."system" = 'f' AND ("notes"."id" > 326281) ORDER BY "notes"."id" ASC LIMIT 1000 Limit (cost=0.56..1652.96 rows=1000 width=2436) (actual time=0.394..8.070 rows=1000 loops=1) Buffers: shared hit=990 read=7 -> Index Scan using test_index on public.notes (cost=0.56..701237.46 rows=424377 width=2436) (actual time=0.392..7.963 rows=1000 loops=1) Index Cond: ((notes.project_id = 13083) AND (notes.id > 326281)) Buffers: shared hit=990 read=7 Planning time: 6.761 ms Execution time: 8.235 ms Total Cost: 701237.46 Buffers Hit: 990 Buffers Written: 0 Buffers Read: 7