Elasticsearch indexing jobs should not silently ignore database connection errors

First noticed in https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/9198#note_287821229

Problem

Whenever Sidekiq loses its database connection, these jobs immediately return success without doing anything and move on, which means we will lose all of our indexing updates during a database outage. We would prefer that they remain in the queue until the database connection is restored.
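
For context, this follows from Sidekiq's normal retry contract (general Sidekiq behaviour, not anything specific to these workers): a job is retried only if perform raises; a normal return acknowledges the job permanently. A minimal sketch:

class ExampleWorker
  include Sidekiq::Worker

  def perform
    # Returning early acknowledges the job; Sidekiq will never run it again,
    # even if the early return was caused by a swallowed database error.
    return true unless some_setting_enabled? # hypothetical check

    # Raising instead sends the job to the retry set, so it runs again
    # later, which is the behaviour we want during a database outage.
    do_the_indexing! # hypothetical indexing call
  end
end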

Technical details

The problematic workers are ElasticIndexerWorker and ElasticCommitIndexerWorker.

Right now, the first line in these workers is return true unless Gitlab::CurrentSettings.elasticsearch_indexing?. This is problematic because Gitlab::CurrentSettings.elasticsearch_indexing? simply returns nil when there is no database connection:

[1] pry(main)> ::Gitlab::Database.cached_column_exists?(:application_settings, :elasticsearch_indexing)
FATAL:  terminating connection due to administrator command
ActiveRecord::StatementInvalid: PG::UnableToSend: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
[4] pry(main)> ::Gitlab::CurrentSettings.elasticsearch_indexing?
=> nil
[6] pry(main)> ::Gitlab::CurrentSettings.foo_bar
=> nil
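
The foo_bar call above hints at why: when the database is unreachable, Gitlab::CurrentSettings appears to rescue the error and fall back to an in-memory settings object that answers nil for any attribute, real or not. A minimal sketch of that failure mode (the class and method names here are illustrative, not the actual implementation):

require 'ostruct'

module CurrentSettingsSketch
  def self.current_settings
    ApplicationSetting.current # raises when the database is down
  rescue ActiveRecord::StatementInvalid
    # Fall back to an empty OpenStruct: every attribute, including
    # elasticsearch_indexing? (and the nonexistent foo_bar), comes back nil.
    OpenStruct.new
  end

  def self.method_missing(name, *args)
    current_settings.public_send(name, *args)
  end

  def self.respond_to_missing?(name, include_private = false)
    true
  end
end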

Solution

We need to ensure these workers error out just like any other worker when there is no database connection. One option is to change the behaviour of Gitlab::CurrentSettings to raise exceptions. If that is too wide-reaching or undesirable for other reasons, we can instead implement a separate database check in these workers, as sketched below. Ideally, though, Gitlab::CurrentSettings would behave sensibly and raise exceptions.
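
A minimal sketch of the worker-side guard (the SELECT 1 probe is my assumption about how to force a connection check, not an existing GitLab helper):

class ElasticIndexerWorker
  include Sidekiq::Worker

  def perform(*args)
    # Probe the connection first: if the database is down this raises
    # (ActiveRecord::StatementInvalid), so Sidekiq keeps the job in the
    # retry set instead of acknowledging it as a success.
    ActiveRecord::Base.connection.execute('SELECT 1')

    return true unless Gitlab::CurrentSettings.elasticsearch_indexing?

    # ... existing indexing logic ...
  end
end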