
Advanced Search: Set engine for OpenSearch indices

What does this MR do and why?

Before OpenSearch 2.18, the default kNN engine was nmslib, which supports cosine similarity. In OpenSearch 2.18, the default changed to faiss, and the faiss engine doesn't support cosine similarity: https://github.com/opensearch-project/k-NN/pull/2221#discussion_r1811476776.

As a result, customers running the latest OpenSearch version cannot create indices and get the following error:

Elasticsearch::Transport::Transport::Errors::BadRequest: [400] {"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"Failed to parse mapping [_doc]: Validation Failed: 1: \"hnsw\" with \"faiss\" configuration does not support space type: \"cosinesimil\".;"}],"type":"mapper_parsing_exception","reason":"Failed to parse mapping [_doc]: Validation Failed: 1: \"hnsw\" with \"faiss\" configuration does not support space type: \"cosinesimil\".;","caused_by":{"type":"validation_exception","reason":"Validation Failed: 1: \"hnsw\" with \"faiss\" configuration does not support space type: \"cosinesimil\".;"}},"status":400}

In order to support all versions of OpenSearch, we need to explicitly set the engine.
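As a rough illustration of what "explicitly set the engine" means for a kNN field, here is a minimal sketch of a knn_vector mapping with the engine pinned to nmslib. This is not the diff from this MR: the helper name knn_vector_mapping, the field name embedding, and the dimension of 768 are placeholders assumed for the example.

```ruby
require "json"

# Hypothetical helper (not from the GitLab codebase): builds a knn_vector
# field mapping with the engine set explicitly, so the cluster's default
# engine no longer decides whether "cosinesimil" is accepted.
def knn_vector_mapping(dimension:, engine: "nmslib")
  {
    type: "knn_vector",
    dimension: dimension,
    method: {
      name: "hnsw",
      space_type: "cosinesimil",
      engine: engine
    }
  }
end

# "embedding" and 768 are placeholder values for illustration only.
mappings = { properties: { embedding: knn_vector_mapping(dimension: 768) } }
puts JSON.pretty_generate(mappings)
```

With the engine spelled out in the mapping, the same request body should behave the same way on clusters whose default engine is nmslib and on clusters whose default is faiss.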

Here is some guidance on choosing an engine:

In general, nmslib outperforms both faiss and Lucene on search. However, to optimize for indexing throughput, faiss is a good option. For relatively smaller datasets (up to a few million vectors), the Lucene engine demonstrates better latencies and recall. At the same time, the size of the index is smallest compared to the other engines, which allows it to use smaller AWS instances for data nodes.

https://opensearch.org/docs/latest/search-plugins/knn/approximate-knn/#recommendations-for-engines-and-cluster-node-sizing

We don't currently have a way to select index settings based on the expected number of vectors, so nmslib seems a reasonable choice to cover most instances.



How to set up and validate locally

  1. Run OpenSearch 2.18.0, e.g. docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "DISABLE_SECURITY_PLUGIN=true" opensearchproject/opensearch:2.18.0
  2. Delete the workitems index: Gitlab::Elastic::Helper.default.client.indices.delete(index: "gitlab-development-work_items-...") (Note: this will delete your data)
  3. Create the workitems index: ::Gitlab::Elastic::Helper.default.create_standalone_indices(target_classes: [WorkItem])
  4. Confirm that index creation succeeds
  5. Run the same steps on master and confirm that index creation fails
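Beyond checking that creation doesn't fail, it can help to confirm which engine the new index actually uses. The sketch below is an assumption-laden example, not part of this MR: knn_engines is a hypothetical helper that reads a get_mapping-style response hash (index name => mappings => properties) and reports the engine of each top-level knn_vector field. The sample hash uses placeholder index and field names. In a Rails console the input would presumably come from something like Gitlab::Elastic::Helper.default.client.indices.get_mapping(index: ...), but that call is not exercised here.

```ruby
# Hypothetical helper: collects the engine of every top-level knn_vector
# field in a get_mapping-style response. Nested fields are not traversed.
def knn_engines(mapping_response)
  mapping_response.each_with_object({}) do |(index_name, body), engines|
    properties = body.dig("mappings", "properties") || {}
    properties.each do |field, config|
      next unless config["type"] == "knn_vector"

      engines["#{index_name}.#{field}"] = config.dig("method", "engine")
    end
  end
end

# Sample response with placeholder index and field names (assumed shape).
sample = {
  "example-index" => {
    "mappings" => {
      "properties" => {
        "title" => { "type" => "text" },
        "embedding" => {
          "type" => "knn_vector",
          "dimension" => 768,
          "method" => {
            "name" => "hnsw",
            "space_type" => "cosinesimil",
            "engine" => "nmslib"
          }
        }
      }
    }
  }
}

p knn_engines(sample)
```

On this branch every reported engine should be nmslib. A nil value would mean the mapping response did not include an engine for that field.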

Related to #508667 (closed)

Edited by Madelein van Niekerk
