elastic: note bulk importing often fails due to statement timeout
Summary
Noted while working on gitlab-com/gl-infra/production#800 (closed)
During initial index backfill for elasticsearch, we run code like project.notes.searchable.es_import
in the Elastic::IndexRecordService
. This loads the records in batches and inserts them into the ES index.
However, this is backed by quite an inefficient database query, and it fairly-often fails with the 5-second statement timeout enforced on GitLab.com
This makes it very difficult to backfill notes for projects with lots of notes, such as gitaly
, gitlab-ce
, etc.
Steps to reproduce
Project.find_by_full_path("gitlab-org/gitlab-ce").notes.searchable.es_import
What is the current bug behavior?
: SELECT "notes".* FROM "notes" WHERE "notes"."project_id" = 13083 AND "notes"."system" = 'f' AND ("notes"."id" > 326281) ORDER BY "notes"."id" ASC LIMIT 1000
irb(main):792:0> projects[207].notes.searchable.es_import
Traceback (most recent call last):
11: from (irb):792
10: from (irb):792:in `rescue in irb_binding'
9: from ee/app/models/concerns/elastic/application_search.rb:331:in `es_import'
8: from ee/lib/gitlab/database/load_balancing/connection_proxy.rb:41:in `select_all'
7: from ee/lib/gitlab/database/load_balancing/connection_proxy.rb:62:in `read_using_load_balancer'
6: from ee/lib/gitlab/database/load_balancing/load_balancer.rb:67:in `read_write'
5: from ee/lib/gitlab/database/load_balancing/load_balancer.rb:118:in `retry_with_backoff'
4: from ee/lib/gitlab/database/load_balancing/load_balancer.rb:68:in `block in read_write'
3: from ee/lib/gitlab/database/load_balancing/connection_proxy.rb:63:in `block in read_using_load_balancer'
2: from config/initializers/peek.rb:18:in `exec_params'
1: from config/initializers/peek.rb:18:in `async_exec_params'
ActiveRecord::StatementInvalid (PG::QueryCanceled: ERROR: canceling statement due to statement timeout)
: SELECT "notes".* FROM "notes" WHERE "notes"."project_id" = 13083 AND "notes"."system" = 'f' AND ("notes"."id" > 1053393) ORDER BY "notes"."id" ASC LIMIT 1000
What is the expected correct behavior?
0
Output of checks
This bug happens on GitLab.com
Possible fixes
I think the issue is in the AND system = 'f'
part of the query. Filtering on a boolean field is generally hard to optimize.
We might want to consider adding a partial index to optimise exactly this query, although it's annoying since the index will only ever be used if elasticsearch is enabled.
Alternatively, we might want to consider moving system notes to a separate table altogether. They don't have all that much in common with real notes.