Elasticsearch indexing jobs should not silently ignore database connection errors
First noticed in https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/9198#note_287821229
## Problem
Whenever Sidekiq loses its database connection, these jobs immediately return success without doing anything and move on, which means we lose all indexing updates for the duration of a database outage. The jobs should instead remain in the queue until the database connection is restored.
## Technical details
The problematic workers are `ElasticIndexerWorker` and `ElasticCommitIndexerWorker`.
Right now the first line in these workers is `return true unless Gitlab::CurrentSettings.elasticsearch_indexing?`. This is problematic because `Gitlab::CurrentSettings.elasticsearch_indexing?` will just return `nil` when there is no connection:
```ruby
[1] pry(main)> ::Gitlab::Database.cached_column_exists?(:application_settings, :elasticsearch_indexing)
FATAL: terminating connection due to administrator command
ActiveRecord::StatementInvalid: PG::UnableToSend: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
[4] pry(main)> ::Gitlab::CurrentSettings.elasticsearch_indexing?
=> nil
[6] pry(main)> ::Gitlab::CurrentSettings.foo_bar
=> nil
```
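To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern. The class names (`FakeCurrentSettings`, `FakeIndexerWorker`, `DbDownError`) are stand-ins, not GitLab code: the point is that a settings reader which swallows database errors and returns `nil` makes the worker's guard clause indistinguishable from "indexing is disabled", so the job reports success:

```ruby
# Stand-in for the database being unreachable (cf. ActiveRecord::StatementInvalid).
class DbDownError < StandardError; end

# Stand-in for Gitlab::CurrentSettings: the rescue turns any database
# failure into nil, so the predicate is falsy during an outage.
class FakeCurrentSettings
  def self.elasticsearch_indexing?
    raise DbDownError, "server closed the connection unexpectedly"
  rescue DbDownError
    nil # swallowed: caller cannot tell "disabled" from "DB down"
  end
end

# Stand-in for the indexing workers' guard clause.
class FakeIndexerWorker
  def perform
    return true unless FakeCurrentSettings.elasticsearch_indexing?

    :indexed # never reached while the database is down
  end
end

FakeIndexerWorker.new.perform # returns true; Sidekiq marks the job done
```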
## Solution
We need to ensure these workers fail just like any other worker when there is no database connection. One option is to change `Gitlab::CurrentSettings` to raise exceptions when the database is unreachable. If that is too wide-reaching or undesirable for other reasons, we can implement a separate database connectivity check in the workers themselves. Ideally, though, we make `Gitlab::CurrentSettings` behave sensibly and raise.
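As a hedged sketch of the preferred direction, again with stand-in classes rather than GitLab code: if the settings reader lets the database error propagate, the worker's `perform` raises, and Sidekiq's normal failure handling keeps the job queued for retry instead of discarding the update:

```ruby
# Stand-in for the database being unreachable (cf. ActiveRecord::StatementInvalid).
class DbDownError < StandardError; end

# Stand-in for a fixed Gitlab::CurrentSettings: no rescue, so a
# database failure propagates to the caller instead of becoming nil.
class StrictSettings
  def self.elasticsearch_indexing?
    raise DbDownError, "server closed the connection unexpectedly"
  end
end

# Same guard clause as before, but now it raises during an outage.
class StrictIndexerWorker
  def perform
    return true unless StrictSettings.elasticsearch_indexing?

    :indexed
  end
end

begin
  StrictIndexerWorker.new.perform
rescue DbDownError
  # In real Sidekiq, an uncaught error marks the job failed and
  # schedules a retry, so the indexing update is not lost.
end
```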