elasticsearch pipeline does not start with elasticsearch on
What is the productivity problem to solve?
I believe that the elasticsearch CI pipeline starts without elasticsearch turned ON. I believe this because I've recently written two API-only test suites that have a hardcoded sleep of 60 seconds in a before block, meant as waiting time for the elasticsearch ON settings to propagate through the system, and the execution times of these tests are always longer than 60 seconds (see final paragraph for link to test run). There is already an issue to be smarter about waiting, that is, to check the logs via the API to query when the sidekiq workers have started, so that we can be sure that new entries are indexed properly by elasticsearch (https://gitlab.com/gitlab-org/quality/team-tasks/issues/395).
Edit: I know that the tests do not register elasticsearch as being ON because api_json.log shows the PUT API call that turns it on, and that call is not made if the GET API call to settings determines that elasticsearch is already ON:
{"time":"2020-03-02T14:34:04.425Z","severity":"INFO","duration":291.1,"db":10.66,"view":280.44,"status":200,"method":"PUT","path":"/api/v4/application/settings","params":[{"key":"private_token","value":"[FILTERED]"},{"key":"elasticsearch_search","value":"true"},{"key":"elasticsearch_indexing","value":"true"},{"key":"elasticsearch_url","value":"http://elastic68:9200"}],"host":"gitlab-elastic.test","remote_ip":"172.19.0.4, 127.0.0.1","ua":"rest-client/2.0.2 (linux-gnu x86_64) ruby/2.6.5p114","route":"/api/:version/application/settings","user_id":1,"username":"root","queue_duration":36.26,"correlation_id":"dQBvhwmPkN9"}
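The same check can be scripted rather than done by eye. The following is a minimal sketch, assuming api_json.log holds one JSON object per line in the shape shown above; the helper names are hypothetical and not part of any GitLab tooling.

```ruby
require 'json'

SETTINGS_PATH = '/api/v4/application/settings'

# Parse one log line; return nil for blank or non-JSON lines.
def parse_log_line(line)
  JSON.parse(line)
rescue JSON::ParserError
  nil
end

# True when the entry is a PUT to the settings endpoint that carries at
# least one elasticsearch_* parameter (field names taken from the log
# line quoted in this issue).
def elasticsearch_settings_put?(entry)
  return false unless entry.is_a?(Hash)
  return false unless entry['method'] == 'PUT' && entry['path'] == SETTINGS_PATH

  params = entry['params']
  params.is_a?(Array) && params.any? { |param|
    param.is_a?(Hash) && param['key'].to_s.start_with?('elasticsearch_')
  }
end

# Return every log entry in which a test had to turn elasticsearch ON.
def elasticsearch_enable_calls(log_lines)
  log_lines.map { |line| parse_log_line(line) }
           .select { |entry| elasticsearch_settings_put?(entry) }
end

# Usage sketch (the log path is an assumption):
# calls = elasticsearch_enable_calls(File.foreach('api_json.log'))
# puts "settings PUT seen #{calls.size} time(s)"
```

If the pipeline already started with elasticsearch ON, this would be expected to find no such entries.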
Even so, it is generally considered good automated-test practice not to leave side effects behind: the state of the system under test should be as similar as possible before and after the tests. I have therefore built logic into my API-only tests that checks whether elasticsearch is ON and, if it is not, turns it ON and waits. After the test, if the original state of elasticsearch was OFF, it is turned OFF again. In a system where the default state of elasticsearch is OFF, this leads to a lot of turning elasticsearch ON, waiting, turning it OFF again, then moving to a new test suite and repeating the ON-OFF cycle. This is especially wasteful when we have no smart way of knowing when the sidekiq jobs are running and simply hard-wait one minute in each test suite's before block.
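The check, enable, wait, restore cycle described above can be sketched as follows. This is an illustration only, not the actual suite code: `with_elasticsearch_on`, the `client` interface (`get_settings`, `put_settings`) and `InMemorySettings` are hypothetical names, and only the two flags mentioned in this issue are handled.

```ruby
ES_FLAGS = %w[elasticsearch_search elasticsearch_indexing].freeze

# Runs the given block with elasticsearch ON, then restores the original
# flag values. `client` is any object that responds to get_settings
# (returns a Hash) and put_settings(hash); both are assumptions.
def with_elasticsearch_on(client, wait_seconds: 60, sleeper: ->(seconds) { sleep(seconds) })
  original = client.get_settings.slice(*ES_FLAGS)
  was_on = ES_FLAGS.all? { |flag| original[flag] == true }

  unless was_on
    client.put_settings(ES_FLAGS.to_h { |flag| [flag, true] })
    sleeper.call(wait_seconds) # the hard wait this issue wants to avoid
  end

  yield
ensure
  # Restore only if we changed something and the original state is known.
  client.put_settings(original) if original && !was_on
end

# In-memory stand-in for the settings API, for local experimentation.
class InMemorySettings
  attr_reader :puts_made

  def initialize(state)
    @state = state
    @puts_made = []
  end

  def get_settings
    @state.dup
  end

  def put_settings(changes)
    @puts_made << changes
    @state = @state.merge(changes)
  end
end
```

With a client whose flags start OFF, each call costs two PUT requests plus the full wait; with flags already ON, it costs neither, which is the saving this issue is after.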
However, if the pipeline started with elasticsearch ON (that is, a GET settings API call returns "elasticsearch_search":true and "elasticsearch_indexing":true) before starting the tests, then we could avoid waiting in the before block of each test suite.
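A pipeline-side precondition check along these lines might look like the sketch below. It assumes the settings endpoint seen in the log (/api/v4/application/settings) and token authentication via the PRIVATE-TOKEN header; the base URL scheme, the token source and all function names are assumptions.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build the GET request for the application settings endpoint.
def settings_request(base_url, token)
  uri = URI.join(base_url, '/api/v4/application/settings')
  request = Net::HTTP::Get.new(uri)
  request['PRIVATE-TOKEN'] = token
  [uri, request]
end

# True only when both flags named in this issue are JSON true.
def elasticsearch_on?(settings_json)
  settings = JSON.parse(settings_json)
  settings['elasticsearch_search'] == true && settings['elasticsearch_indexing'] == true
end

# Perform the call and report whether the test suites may skip their wait.
def pipeline_ready?(base_url, token)
  uri, request = settings_request(base_url, token)
  response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  response.is_a?(Net::HTTPSuccess) && elasticsearch_on?(response.body)
end

# Usage sketch (host taken from the log line, scheme and env var assumed):
# exit(1) unless pipeline_ready?('http://gitlab-elastic.test', ENV.fetch('GITLAB_TOKEN'))
```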
The risk in this strategy is that we turn elasticsearch ON in the pipeline but do not wait long enough for the sidekiq workers to start. If the tests then begin, the new entries they add are not indexed and the tests fail. For this reason, making this change may be blocked by the issue to be able to query the sidekiq logs: https://gitlab.com/gitlab-org/quality/team-tasks/issues/395.
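One way to reduce that risk without a fixed sleep is to poll a readiness condition with a timeout. The helper below is a generic sketch; how to actually detect that the sidekiq workers have started is the subject of the linked issue, so `sidekiq_workers_started?` in the usage note is a purely hypothetical probe.

```ruby
MONOTONIC_CLOCK = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }

# Calls the block repeatedly until it returns a truthy value or `timeout`
# seconds have elapsed. Returns true on success, false on timeout.
# `clock` and `sleeper` are injectable so the helper can be tested quickly.
def wait_until(timeout:, interval: 1, clock: MONOTONIC_CLOCK,
               sleeper: ->(seconds) { sleep(seconds) })
  deadline = clock.call + timeout
  loop do
    return true if yield
    return false if clock.call >= deadline

    sleeper.call(interval)
  end
end

# Usage sketch (the probe is hypothetical):
# ready = wait_until(timeout: 60, interval: 2) { sidekiq_workers_started? }
# raise 'sidekiq workers did not start in time' unless ready
```

Compared with a hard 60-second sleep, this returns as soon as the condition holds and still fails loudly if it never does.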
The motivation for making this change is that there are significant speed gains to be had. For example, the API-only tests run locally on my machine in about 15 seconds (when the initial state of elasticsearch is ON), but in CI the advanced_global_advanced_syntax_search_spec and elasticsearch_api_spec take about 67 and 97 seconds, respectively, to execute (https://gitlab.com/gitlab-org/gitlab-qa/-/jobs/456074103). Not having to wait during the before block of these tests would cut their execution time to about 1/3 of the current time.
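The estimate can be reproduced with simple arithmetic. The sketch below assumes that exactly one 60-second wait is removed per suite and that nothing else about the run changes; the durations are the CI figures quoted above.

```ruby
HARD_WAIT_SECONDS = 60

CI_DURATIONS = {
  'advanced_global_advanced_syntax_search_spec' => 67,
  'elasticsearch_api_spec' => 97
}.freeze

# Duration of each suite once the hard wait is removed.
def projected_durations(durations, removed_wait)
  durations.transform_values { |seconds| seconds - removed_wait }
end

# Fraction of the combined current run time that would remain.
def remaining_fraction(durations, removed_wait)
  projected_durations(durations, removed_wait).values.sum.to_f / durations.values.sum
end

projected = projected_durations(CI_DURATIONS, HARD_WAIT_SECONDS)
projected.each { |spec, seconds| puts "#{spec}: #{CI_DURATIONS[spec]}s -> #{seconds}s" }
puts format('combined: %.0f%% of the current time',
            remaining_fraction(CI_DURATIONS, HARD_WAIT_SECONDS) * 100)
```

Under these assumptions the two suites drop to 7 and 37 seconds, and the combined run time falls from 164 to 44 seconds.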
Problem identification checklist
- The root cause of the problem is identified.
- The surface of the problem is as small as possible.
What are the potential solutions?
- Have "elasticsearch_search":true and "elasticsearch_indexing":true set before test suites in the elasticsearch (or elasticsearch quarantine) job start executing.
- Don't do anything and have slower tests.
- All potential solutions are listed.
- A solution has been chosen for the first iteration: PUT THE CHOSEN SOLUTION HERE
Who and when will the solution be implemented?
The solution should be implemented by whoever is in charge of setting up CI pipelines (figure out what group this is and insert them here).
Verify that the solution has improved the situation
This can easily be verified by examining the test suite execution time and comparing it to pre-change execution times, for example, this one: https://gitlab.com/gitlab-org/gitlab-qa/-/jobs/456074103
- The solution improved the situation.
  - If yes, check this box and close the issue. Well done! 🎉
  - Otherwise, create a new "Productivity Improvement" issue. You can re-use the description from this issue, but obviously another solution should be chosen this time.