Improve developer performance testing/analysis capabilities for Elasticsearch


Problem

Our team often needs to evaluate performance optimizations for Elasticsearch indexing or querying. Realistic performance testing is hard to do locally: it requires a large Elasticsearch instance (multiple large VMs) and a large amount of load, which in turn requires a client with a lot of CPU.

Ideally we'd be able to do something similar to #database-lab, but the more I think about it, the less realistic that approach seems for Elasticsearch because of some major differences:

  1. We care more about behaviour under load than about individual query performance, and query plan analysis in ES is not nearly as sophisticated as in Postgres.
  2. I don't think we could quickly snapshot our entire ES cluster and create dedicated instances to test against, since we need a multi-node setup, though perhaps there is an approach here I haven't considered. Either way, it runs on Elastic Cloud today, so we don't even have low-level disk access.
  3. Elasticsearch doesn't really support migrating data the way Postgres does, so you can't just quickly change index settings and test query performance; it often requires reindexing everything.
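
To illustrate the third point, here is a minimal sketch of what trying a different static setting (such as `number_of_shards`) tends to involve: creating a new index with the new settings, then copying every document across with the `_reindex` API. The index names and the shard count are hypothetical, and the helper only builds the request descriptions rather than sending them to a cluster.

```python
import json


def reindex_plan(source_index, dest_index, new_settings):
    """Return the ordered HTTP requests needed to try new static index settings.

    Each entry is a (method, path, body) tuple. Nothing is sent anywhere;
    this only describes the calls a client would have to make.
    """
    return [
        # 1. Create the destination index with the settings under test.
        ("PUT", f"/{dest_index}", {"settings": new_settings}),
        # 2. Copy all documents from the old index into the new one.
        (
            "POST",
            "/_reindex",
            {"source": {"index": source_index}, "dest": {"index": dest_index}},
        ),
    ]


if __name__ == "__main__":
    # Hypothetical index names and shard count, for illustration only.
    plan = reindex_plan(
        "gitlab-test-v1",
        "gitlab-test-v2",
        {"index": {"number_of_shards": 10}},
    )
    for method, path, body in plan:
        print(method, path, json.dumps(body))
```

The second step is the expensive one: its cost grows with the size of the index, which is why each experiment of this kind is slow on a realistically sized cluster.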

Solution

There is already significant investment in, and good tooling around, load testing our Elasticsearch setup with gpt. It would be good if we could easily trigger gpt against branches and run only the specific tests we care about, to save time.
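
One possible shape for "trigger against a branch with only some tests" is the GitLab pipeline trigger API, which accepts a `ref` and a set of `variables[...]` form fields. The sketch below only builds the URL and form body. The host, project ID, token and the `GPT_TEST_FILTER` variable name are all hypothetical; how a pipeline would actually use such a variable to select tests is an assumption, not something that exists today.

```python
from urllib.parse import urlencode


def build_trigger_request(base_url, project_id, token, ref, variables):
    """Build the URL and form-encoded body for a pipeline trigger request.

    `variables` maps CI variable names to values; each one is sent as a
    `variables[NAME]` form field. Nothing is sent over the network here.
    """
    url = f"{base_url}/api/v4/projects/{project_id}/trigger/pipeline"
    form = {"token": token, "ref": ref}
    for key, value in variables.items():
        form[f"variables[{key}]"] = value
    return url, urlencode(form)


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    url, body = build_trigger_request(
        "https://gitlab.example.com",
        42,
        "TRIGGER_TOKEN",
        "my-feature-branch",
        {"GPT_TEST_FILTER": "search"},  # hypothetical variable name
    )
    print("POST", url)
    print(body)
```

The resulting body can be sent with any HTTP client as `application/x-www-form-urlencoded`.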

Additionally, it would be good to reuse the automated provisioning we already do for nightly builds etc. to create a full, large GitLab setup, and then give the developer access to that infrastructure. They could then experiment quickly with different settings without recreating the infrastructure for every experiment.

TODO:

  1. Generic steps for forking, setting CI variables and running the pipeline
  2. Support teardown of the environment
  3. Automate the SSH key creation?
  4. What about DNS? Can we automate it?
  5. Is there a better base image for the Ansible step?