OpenSearch 3.0 compatibility

What does this MR do and why?

Change the HNSW engine from nmslib to lucene for OpenSearch versions after 2.1.0. This restores compatibility with OpenSearch 3.0, where index creation fails because nmslib, previously the default engine, is now deprecated.

| OpenSearch version | lucene supported? | nmslib supported? |
|--------------------|-------------------|-------------------|
| 1.x - 2.1.0        | No                | Yes               |
| 2.2 - 3            | Yes               | Yes               |
| 3.x                | Yes               | No                |

  • Set engine to nmslib for versions <= 2.1.0
  • Set engine to lucene for versions > 2.1.0
  • Reindex when version is > 2.1.0 so that the engine changes to lucene
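The version rule in the list above can be sketched as a small helper. This is only an illustration: `hnsw_engine_for` and `LAST_NMSLIB_ONLY_VERSION` are hypothetical names, not the ones used in this MR, and the version string is assumed to be a plain `x.y.z` value.

```ruby
# Hypothetical helper illustrating the version rule; not the MR's actual code.
# Gem::Version ships with Ruby and compares dotted version strings segment by segment.
LAST_NMSLIB_ONLY_VERSION = Gem::Version.new('2.1.0')

def hnsw_engine_for(opensearch_version)
  # Versions above 2.1.0 use lucene; 2.1.0 and older keep nmslib.
  if Gem::Version.new(opensearch_version) > LAST_NMSLIB_ONLY_VERSION
    'lucene'
  else
    'nmslib'
  end
end

%w[1.3.0 2.1.0 2.2.0 3.0.0].each do |version|
  puts "#{version}: #{hnsw_engine_for(version)}"
end
```

Comparing with `Gem::Version` rather than plain strings avoids lexical mistakes such as treating `2.10.0` as older than `2.2.0`.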

We chose lucene over faiss because faiss does not support filtering during search, while lucene does. Hybrid search needs this: filters must be applied during the search, not as a post-filtering step.
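To illustrate what filtering during search looks like, here is a minimal sketch of a k-NN query body with the filter nested inside the `knn` clause, which is the shape OpenSearch documents for filtered k-NN search. The helper name `filtered_knn_query`, the `project_id` filter field, and the shortened three-element vector are assumptions for the example (the real `embedding_1` field has 768 dimensions); this is not the query the hybrid search feature actually builds.

```ruby
require 'json'

# Hypothetical builder for a k-NN query body with an in-search filter.
def filtered_knn_query(field:, vector:, k:, filter:)
  {
    size: k,
    query: {
      knn: {
        field => {
          vector: vector,
          k: k,
          # The filter lives inside the knn clause, so it is applied while
          # the nearest neighbours are being searched, not afterwards.
          filter: filter
        }
      }
    }
  }
end

EXAMPLE_QUERY = filtered_knn_query(
  field: 'embedding_1',
  vector: [0.1, 0.2, 0.3], # shortened for readability
  k: 5,
  filter: { term: { project_id: 42 } } # hypothetical filter field
)

puts JSON.pretty_generate(EXAMPLE_QUERY)
```

With post-filtering, the filter would instead run on the k results already returned, which can leave fewer than k matches.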

https://docs.opensearch.org/docs/latest/field-types/supported-field-types/knn-methods-engines/#engine-recommendations

This MR also introduces a reindex task for OpenSearch instances so that existing indices switch to the lucene engine.

I tested the indexing and search flow and confirmed that hybrid search continues to work with the new engine.

Note

Embedding tracking is currently enabled only on GitLab.com, so no OpenSearch customers have existing embeddings. Even if they did, the reindex would move those embeddings to the new engine.

References

Latest OpenSearch fails during index creation d... (#540086 - closed)

How to set up and validate locally

Important

These steps are destructive: they delete data in the indices.

  1. Run OpenSearch 3
  2. Check out master
  3. Recreate all indices: Search::RakeTaskExecutorService.new(logger: ::Gitlab::Elasticsearch::Logger.build).execute(:recreate_index)
  4. Note that index creation fails with the error: nmslib engine is deprecated in OpenSearch
  5. Check out 540086-change-hnsw-engine
  6. Recreate all indices: Search::RakeTaskExecutorService.new(logger: ::Gitlab::Elasticsearch::Logger.build).execute(:recreate_index)
  7. Note that the engine in the embedding_1 mapping is lucene: "embedding_1"=>{"type"=>"knn_vector", "dimension"=>768, "method"=>{"engine"=>"lucene", "space_type"=>"cosinesimil", "name"=>"hnsw", "parameters"=>{"ef_construction"=>100, "m"=>16}}}
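If you prefer to check the engine programmatically rather than by eye, a sketch like the following works on the mapping hash shown in the last step. The hash is copied from that output; `hnsw_engine` is a hypothetical helper, and how you fetch the mapping from the cluster is left out.

```ruby
# Mapping as printed in the validation step (copied verbatim).
EMBEDDING_MAPPINGS = {
  "embedding_1" => {
    "type" => "knn_vector",
    "dimension" => 768,
    "method" => {
      "engine" => "lucene",
      "space_type" => "cosinesimil",
      "name" => "hnsw",
      "parameters" => { "ef_construction" => 100, "m" => 16 }
    }
  }
}.freeze

# Hypothetical helper: returns the engine for a field, or nil if the
# field or its method definition is missing from the mapping.
def hnsw_engine(mappings, field)
  mappings.dig(field, "method", "engine")
end

puts hnsw_engine(EMBEDDING_MAPPINGS, "embedding_1")
```

`Hash#dig` returns nil instead of raising when an intermediate key is absent, so the check stays safe on indices without the embedding field.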

[Optional] Do the same for OpenSearch 1 and 2.

  1. Check out master
  2. Recreate all indices: Search::RakeTaskExecutorService.new(logger: ::Gitlab::Elasticsearch::Logger.build).execute(:recreate_index)
  3. Check out 540086-change-hnsw-engine
  4. Run the migration worker and the reindex worker repeatedly: Elastic::MigrationWorker.new.perform and ElasticClusterReindexingCronWorker.new.perform
  5. Note that the engine in the embedding_1 mapping is lucene: "embedding_1"=>{"type"=>"knn_vector", "dimension"=>768, "method"=>{"engine"=>"lucene", "space_type"=>"cosinesimil", "name"=>"hnsw", "parameters"=>{"ef_construction"=>100, "m"=>16}}}

[Optional] Track an embedding

  1. ::Search::Elastic::ProcessEmbeddingBookkeepingService.track_embedding!(WorkItem.first)
  2. ::Search::Elastic::ProcessEmbeddingBookkeepingService.new.execute

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #540086 (closed)

Edited by Madelein van Niekerk
