Skip to content

Resolve "Flaky test `elastic_indexer_worker_spec.rb`"

What does this MR do?

Fixes the flaky spec found in #213740 (closed)

TL;DR Class level caching, classes being created in tests and then stored in a global registry of classes leads to bad results. The deleted test is messing with a class level cache and creating classes, the previous search method we switched out was using was looking through a global registry of classes.

The gritty details

This was the original failure we were trying to track down

6) ElasticIndexerWorker Indexing, updating, and deleting records type: :merge_request, name: "MergeRequest", attribute: :title deletes from index when an object is deleted
     Failure/Error:
       expect do
         subject.perform("delete", name, object.id, object.es_id, { 'es_parent' => object.es_parent })
         ensure_elasticsearch_index!
       end.to change { Elasticsearch::Model.search('*').total_count }.by(-1)
     NoMethodError:
       undefined method `model_name' for [PROXY] #<Class:0x000055800161e960>:Elasticsearch::Model::Proxy::ClassMethodsProxy

Tracing the code and looking at other test order combinations led us to notice this integration spec which does something a little funky to test the monkey patch of Elasticsearch::Model::Client.

Basically this test is creating some random subclasses of ::Elasticsearch::Model in order to test that all subclasses of ::Elasticsearch::Model shared the same globally cached client. This was the first red flag since a globally class level cached client could surely persist between tests. This allowed us to notice that a certain combination of Rspec tests doing:

$ bin/rspec ee/spec/elastic_integration/elasticsearch_model_client_spec.rb:11 ee/spec/workers/elastic_indexer_worker_spec.rb:63

Could lead to the error

NoMethodError:
            undefined method `info' for :fake_client:Symbol

which shows that this client was indeed being persisted between tests as expected by the monkey patch we added.

Now that wasn't actually the same problem happening in CI so we wondered why and we tried to recreate the CI problem by next removing the caching in the monkey patch and sure enough running the tests in this order did produce the error we saw in CI.

Next we then looked into the method Elasticsearch::Model#search which states By default, all models which include the Elasticsearch::Model module are searched so clearly it is searching through all classes that have subclassed Elasticsearch::Model using a registry of all such classes. Now this is bad because as you can tell in elasticsearch_model_client_spec we were subclassing this with just a raw Class.new which wasn't actually an ActiveRecord model and thus did not have the #model_name method and explains the original error.

So rolling back we decided to stop using Elasticsearch::Model#search in tests since this is never used in application code annyway and searching through class space in tests is just risky clearly.

Next we revealed another ordering problem with tests caching the client which is presumably only not happening in CI due to certain ordering causing something to bust the cache in between tests and is a ticking time bomb waiting to break future tests.

Given that this test was only testing a performance increase of a very tricky thing to test we decided to just remove it altogether.

Lastly we made these tests more robust by asserting the actual object being saved is found in searches to avoid false positives and make the intention clearer.

While fixing this I noticed that half of the behaviour being tested here is dead code so we should remove it all and I created #213947 (closed) as a follow up. I didn't want to do it now because that rabbit hole may be deeper than I thought.

Screenshots

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Closes #213740 (closed)

Edited by 🤖 GitLab Bot 🤖

Merge request reports