elastic: note bulk importing often fails due to statement timeout
Noted while working on gitlab-com/gl-infra/production#800 (closed)
During initial index backfill for elasticsearch, we run code like
project.notes.searchable.es_import in the
Elastic::IndexRecordService. This loads the records in batches and inserts them into the ES index.
However, this is backed by quite an inefficient database query, and it fairly-often fails with the 5-second statement timeout enforced on GitLab.com
This makes it very difficult to backfill notes for projects with lots of notes, such as
Steps to reproduce
What is the current bug behavior?
: SELECT "notes".* FROM "notes" WHERE "notes"."project_id" = 13083 AND "notes"."system" = 'f' AND ("notes"."id" > 326281) ORDER BY "notes"."id" ASC LIMIT 1000 irb(main):792:0> projects.notes.searchable.es_import Traceback (most recent call last): 11: from (irb):792 10: from (irb):792:in `rescue in irb_binding' 9: from ee/app/models/concerns/elastic/application_search.rb:331:in `es_import' 8: from ee/lib/gitlab/database/load_balancing/connection_proxy.rb:41:in `select_all' 7: from ee/lib/gitlab/database/load_balancing/connection_proxy.rb:62:in `read_using_load_balancer' 6: from ee/lib/gitlab/database/load_balancing/load_balancer.rb:67:in `read_write' 5: from ee/lib/gitlab/database/load_balancing/load_balancer.rb:118:in `retry_with_backoff' 4: from ee/lib/gitlab/database/load_balancing/load_balancer.rb:68:in `block in read_write' 3: from ee/lib/gitlab/database/load_balancing/connection_proxy.rb:63:in `block in read_using_load_balancer' 2: from config/initializers/peek.rb:18:in `exec_params' 1: from config/initializers/peek.rb:18:in `async_exec_params' ActiveRecord::StatementInvalid (PG::QueryCanceled: ERROR: canceling statement due to statement timeout) : SELECT "notes".* FROM "notes" WHERE "notes"."project_id" = 13083 AND "notes"."system" = 'f' AND ("notes"."id" > 1053393) ORDER BY "notes"."id" ASC LIMIT 1000
What is the expected correct behavior?
Output of checks
This bug happens on GitLab.com
I think the issue is in the
AND system = 'f' part of the query. Filtering on a boolean field is generally hard to optimize.
We might want to consider adding a partial index to optimise exactly this query, although it's annoying since the index will only ever be used if elasticsearch is enabled.
Alternatively, we might want to consider moving system notes to a separate table altogether. They don't have all that much in common with real notes.