Sometimes search doesn't return results immediately after indexing should be complete

Summary

Intermittently, a search via Elasticsearch won't return a result for a query that should succeed (and does succeed several minutes later)

For example, after a git push, sidekiq logs show the relevant ElasticCommitIndexerWorker job completes, but a search for code included in the push doesn't return a result in the first minute despite a few attempts, but the same search made 10 minutes later succeeds.

Steps to reproduce

It might be difficult to reproduce reliably. But I've encountered the issue several times while trying to fix an Elasticsearch end-to-end test (!21984 (merged)).

I've reproduced it by doing this:

  1. Pull branch 43732-fix-elasticsearch-qa-test.
  2. Build a QA docker image: docker build -t gitlab/gitlab-ee-qa:nightly -f qa/Dockerfile ./
  3. Clone gitlab-qa: git clone https://gitlab.com/gitlab-org/gitlab-qa.git
  4. Comment out teardown in the gitlab and elasticsearch components so that the containers aren't removed after the tests complete.
  5. Run the elasticsearch tests via gitlab-qa using a seed that failed in CI: exe/gitlab-qa Test::Integration::Elasticsearch EE -- qa/specs/features/ee/browser_ui/enablement/elasticsearch/ --tag quarantine --seed 40173 (export an EE license first, e.g., export EE_LICENSE=$(cat /path/to/license))
  6. If tests pass, remove the containers that gitlab-qa started docker stop gitlab-elastic elastic68 && docker rm gitlab-elastic elastic68
  7. Repeat steps 5 and 6 until a test fails.

At that point you should have running GitLab and Elasticsearch containers in a state where some code was pushed but a search for the code failed.

What is the current bug behavior?

A code search doesn't return expected results until several minutes after the code was pushed.

What is the expected correct behavior?

The code search should return results immediately after indexing is complete.

Relevant logs and/or screenshots

The sidekiq logs do have entries for ElasticCommitIndexerWorker

{
  "severity": "INFO",
  "time": "2020-01-08T02:32:22.980Z",
  "class": "ElasticCommitIndexerWorker",
  "retry": 2,
  "queue": "elastic_commit_indexer",
  "jid": "adc7d3dd1d07fedf84557692",
  "created_at": "2020-01-08T02:32:22.979Z",
  "correlation_id": "6623a77a-7872-4e9f-ba72-da7d16c07642",
  "enqueued_at": "2020-01-08T02:32:22.979Z",
  "pid": 810,
  "message": "ElasticCommitIndexerWorker JID-adc7d3dd1d07fedf84557692: start",
  "job_status": "start",
  "scheduling_latency_s": 0.001849
}

{
  "severity": "INFO",
  "time": "2020-01-08T02:32:23.203Z",
  "class": "ElasticCommitIndexerWorker",
  "retry": 2,
  "queue": "elastic_commit_indexer",
  "jid": "adc7d3dd1d07fedf84557692",
  "created_at": "2020-01-08T02:32:22.979Z",
  "correlation_id": "6623a77a-7872-4e9f-ba72-da7d16c07642",
  "enqueued_at": "2020-01-08T02:32:22.979Z",
  "pid": 810,
  "message": "ElasticCommitIndexerWorker JID-adc7d3dd1d07fedf84557692: done: 0.222612 sec",
  "job_status": "done",
  "scheduling_latency_s": 0.001849,
  "rugged_calls": 2,
  "rugged_duration_ms": 7.69,
  "duration": 0.222612,
  "cpu_s": 0.028352,
  "completed_at": "2020-01-08T02:32:23.203Z"
}

And that job completed before the search that failed during the test:

{
  "correlation_id": "igSXRpnXfS3",
  "duration_ms": 123,
  "host": "gitlab-elastic.test",
  "level": "info",
  "method": "GET",
  "msg": "access",
  "proto": "HTTP/1.1",
  "referrer": "",
  "remote_addr": "127.0.0.1:0",
  "remote_ip": "127.0.0.1",
  "status": 200,
  "system": "http",
  "time": "2020-01-08T02:32:25Z",
  "uri": "/api/v4/search?scope=blobs\u0026search=elasticsearch:%20true\u0026private_token=[FILTERED]",
  "user_agent": "rest-client/2.0.2 (linux-gnu x86_64) ruby/2.6.3p62",
  "written_bytes": 2
}

The last failed search attempt made during the test was made under a minute later:

  "time": "2020-01-08T02:33:06Z",

But the search succeeded several minutes later when I tried it manually:

Started GET "/search?utf8=%E2%9C%93&search=elasticsearch%3A+true&group_id=&project_id=2&search_code=true&repository_ref=master&nav_source=navbar" for 172.22.0.1 at 2020-01-08 02:41:14 +0000
Processing by SearchController#show as HTML
  Parameters: {"utf8"=>"✓", "search"=>"elasticsearch: true", "group_id"=>"", "project_id"=>"2", "search_code"=>"true", "repository_ref"=>"master", "nav_source"=>"navbar"}
Completed 200 OK in 180ms (Views: 67.5ms | ActiveRecord: 55.8ms | Elasticsearch: 16.4ms)

(that was a UI search, but it worked via the API too)

Results of GitLab environment info

This was encountered using an omnibus-gitlab docker image (gitlab/gitlab-ee:nightly) and an elasticsearch docker image (docker.elastic.co/elasticsearch/elasticsearch:6.8.1)