Create abstraction layer to support Elasticsearch and OpenSearch

Since both OpenSearch and Elasticsearch will be supported for now, we want to create an abstraction layer that selects mappings, search code, and so on based on whether ES or OS is used.

Solution validation

Where do the paths diverge between Elasticsearch and OpenSearch, and between different versions of ES/OS?

  • Index creation
    • Specific mappings and/or settings in *Config class or Types:: class
  • Updating index
    • Changing mapping: requires an ES migration, which can be skipped on some platforms and can define different mappings per platform
  • Indexing documents (calling .track!)
    • as_indexed_json could be different
    • Sometimes track! should not be called if the index doesn't support a ref type
  • Searching
    • Search query could be different
  • Administration
    • Advanced Search admin page has different cluster connection options for OS vs. ES
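
The point above about not calling track! for unsupported ref types can be sketched as a small guard. This is a minimal illustration under assumptions, not the GitLab implementation: `Ref`, `GuardedTracker`, and `track_if_supported` are hypothetical names, and the real tracker is replaced by a plain array.

```ruby
# Hypothetical sketch: only track a record when the current index supports its type.
# None of these names come from the GitLab codebase.
Ref = Struct.new(:type, :id)

class GuardedTracker
  attr_reader :tracked

  # supported_types: the ref types the connected cluster's indices can hold.
  def initialize(supported_types)
    @supported_types = supported_types
    @tracked = []
  end

  # Stand-in for calling .track!; returns true when the ref was queued.
  def track_if_supported(ref)
    return false unless @supported_types.include?(ref.type)

    @tracked << ref
    true
  end
end

tracker = GuardedTracker.new(%i[issue merge_request])
tracker.track_if_supported(Ref.new(:issue, 1))     # queued
tracker.track_if_supported(Ref.new(:embedding, 2)) # skipped: type not supported
```

Keeping the check next to the tracking call means callers do not need to know which platform is in use.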

So basically we have a few places that are likely to diverge:

  1. Mappings/settings during index creation
  2. Mapping updates in migrations
  3. as_indexed_json
  4. Search queries

And then there might be places in code where we need checks for the platform used.
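
As an illustration of item 4, a vector search is one place where the query body differs by platform. The sketch below builds the request body for each: Elasticsearch 8.x accepts a top-level `knn` option, while the OpenSearch k-NN plugin uses a `knn` query clause keyed by field name. The method name `knn_search_body` and the `num_candidates` heuristic are assumptions made for this example.

```ruby
# Sketch: build a nearest-neighbour search body for each platform.
# `platform` is passed in explicitly here; in the application it would come from the helper.
def knn_search_body(platform, field:, vector:, k: 10)
  case platform
  when :elasticsearch
    # Elasticsearch 8.x top-level kNN search option.
    # num_candidates = k * 10 is an arbitrary choice for this sketch.
    { knn: { field: field, query_vector: vector, k: k, num_candidates: k * 10 } }
  when :opensearch
    # OpenSearch k-NN plugin query clause, keyed by the vector field name.
    { size: k, query: { knn: { field => { vector: vector, k: k } } } }
  else
    raise ArgumentError, "unsupported platform: #{platform.inspect}"
  end
end

es_body = knn_search_body(:elasticsearch, field: 'embedding_0', vector: [0.1, 0.2], k: 5)
os_body = knn_search_body(:opensearch, field: 'embedding_1', vector: [0.1, 0.2], k: 5)
```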

What else needs to be done in order to upgrade/remove the ES gems?

  • TBD

How do we determine which path to serve?

The helper class has some methods for checking which platform is in use. For vectors we use Gitlab::Elastic::Helper.default.vectors_supported?(:elasticsearch), which evaluates info[:distribution] == 'elasticsearch' && info[:version].to_f >= 8. Alternatively, we could use CurrentSettings.
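
A minimal sketch of such a predicate, driven by the cluster info hash, might look like the following. `PlatformCheck` is a hypothetical stand-in for the helper; the Elasticsearch threshold (8) is the one quoted above, while the OpenSearch threshold is a placeholder assumption and not taken from this issue.

```ruby
# Sketch of a platform predicate driven by the cluster info hash.
class PlatformCheck
  # Minimum version per distribution. The Elasticsearch value is from the text;
  # the OpenSearch value is a placeholder assumption for this example.
  MIN_VECTOR_VERSION = { elasticsearch: 8, opensearch: 1 }.freeze

  # info: a hash such as { distribution: 'elasticsearch', version: '8.12.0' }
  def initialize(info)
    @info = info
  end

  def vectors_supported?(distribution)
    minimum = MIN_VECTOR_VERSION[distribution]
    return false unless minimum

    # String#to_f reads the leading "major.minor" part, e.g. '8.12.0'.to_f is 8.12.
    @info[:distribution] == distribution.to_s && @info[:version].to_f >= minimum
  end
end

es8 = PlatformCheck.new({ distribution: 'elasticsearch', version: '8.12.0' })
es7 = PlatformCheck.new({ distribution: 'elasticsearch', version: '7.17.0' })
es8.vectors_supported?(:elasticsearch) # true
es7.vectors_supported?(:elasticsearch) # false
es8.vectors_supported?(:opensearch)    # false: wrong distribution
```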

How do we test on different versions and platforms?

QA tests. We run QA tests on different versions of OS and ES.

We also need to think about blobs/wikis. Their JSON data is determined by the indexer, so the indexer would also have diverging paths. We can pass extra options to the run command.
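
Passing extra options to the run command could be as simple as appending platform arguments when the command is built. The sketch below is purely illustrative: the binary name and both flag names are placeholders invented for this example, not options the indexer currently accepts.

```ruby
# Hypothetical sketch: append platform options to the indexer command line.
# 'indexer-binary', --search-platform and --search-version are placeholders.
def indexer_command(base_command, platform:, version:)
  # Array#+ returns a new array, so the caller's base command is not mutated.
  base_command + ["--search-platform=#{platform}", "--search-version=#{version}"]
end

command = indexer_command(%w[indexer-binary], platform: :opensearch, version: '2.11.0')
```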

Implementation: inline if-else

The easiest option would be to have a few methods in the ES helper similar to vectors_supported? (which should be cached for performance) and to call these methods wherever there is a divergence.
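
The caching mentioned above can be sketched with simple memoization, so that repeated platform checks do not hit the cluster each time. `CachedPlatformHelper` and `CountingClient` are hypothetical names for this example, and a real cache would also need a way to expire when the cluster is upgraded or swapped.

```ruby
# Sketch: memoize the cluster info so repeated checks make a single request.
class CachedPlatformHelper
  def initialize(client)
    @client = client
  end

  # Fetch cluster info once and reuse it for every later check.
  def server_info
    @server_info ||= @client.info
  end

  # Only the Elasticsearch rule quoted in this issue is implemented here.
  def vectors_supported?(distribution)
    distribution == :elasticsearch &&
      server_info[:distribution] == 'elasticsearch' &&
      server_info[:version].to_f >= 8
  end
end

# Test double that counts how often cluster info is requested.
class CountingClient
  attr_reader :calls

  def initialize(info)
    @info = info
    @calls = 0
  end

  def info
    @calls += 1
    @info
  end
end

client = CountingClient.new({ distribution: 'elasticsearch', version: '8.12.0' })
helper = CachedPlatformHelper.new(client)
3.times { helper.vectors_supported?(:elasticsearch) }
client.calls # 1: the info request was made only once
```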

Example index mapping:
def self.mappings
  properties = {
    type: { type: 'keyword' },
    id: { type: 'integer' },
    # ...
  }

  if helper.quantized_vectors_supported?(:elasticsearch)
    properties[:embedding] = {
      type: 'dense_vector',
      dims: 768,
      similarity: 'cosine',
      index: true,
      index_options: {
        type: 'int8_hnsw'
      }
    }
  elsif helper.vectors_supported?(:elasticsearch)
    properties[:embedding] = {
      type: 'dense_vector',
      dims: 768,
      similarity: 'cosine',
      index: true
    }
  elsif helper.vectors_supported?(:opensearch)
    properties[:embedding] = {
      type: 'knn_vector',
      dimension: 768,
      method: {
        name: 'hnsw'
      }
    }
  end

  {
    dynamic: 'strict',
    properties: properties
  }
end
Example mapping migration:
class AddEmbeddingToIssues < Elastic::Migration
  include Elastic::MigrationUpdateMappingsHelper

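  # Skip when neither platform supports vectors (vectors_supported? takes the distribution).
  skip_if -> { %i[elasticsearch opensearch].none? { |d| Gitlab::Elastic::Helper.default.vectors_supported?(d) } }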

  DOCUMENT_TYPE = Issue

  private

  def new_mappings
    if helper.quantized_vectors_supported?(:elasticsearch)
      {
        embedding_2: {
          type: 'dense_vector',
          dims: 768,
          similarity: 'cosine',
          index: true,
          index_options: {
            type: 'int8_hnsw'
          }
        }
      }
    elsif helper.vectors_supported?(:elasticsearch)
      {
        embedding_0: {
          type: 'dense_vector',
          dims: 768,
          similarity: 'cosine',
          index: true
        }
      }
    else
      {
        embedding_1: {
          type: 'knn_vector',
          dimension: 768,
          method: {
            name: 'hnsw'
          }
        }
      }
    end
  end
end

Note that each model/dimension/vector type combination has its own field name. This is in accordance with #471983 (closed).
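
The field naming rule can be illustrated with a small lookup. This is a sketch under assumptions: `EmbeddingVersionStub`, `VERSIONS`, and `field_for` are hypothetical names, and the ids simply mirror the embedding_0/1/2 fields used in the migration example above.

```ruby
# Sketch: one field name per platform/vector-type combination.
EmbeddingVersionStub = Struct.new(:id, :platform, :quantized, :dimensions) do
  def field_name
    "embedding_#{id}"
  end
end

# Ids chosen to match the migration example; the data itself is illustrative.
VERSIONS = [
  EmbeddingVersionStub.new(0, :elasticsearch, false, 768),
  EmbeddingVersionStub.new(1, :opensearch, false, 768),
  EmbeddingVersionStub.new(2, :elasticsearch, true, 768)
].freeze

# Returns the field name for a platform, or nil when no version matches.
def field_for(platform, quantized: false)
  version = VERSIONS.find { |v| v.platform == platform && v.quantized == quantized }
  version&.field_name
end

field_for(:elasticsearch, quantized: true) # "embedding_2"
field_for(:opensearch)                     # "embedding_1"
```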

Example `as_indexed_json`:
def as_indexed_json
  data = {
    routing: routing
  }

  if helper.quantized_vectors_supported?(:elasticsearch)
    data["embedding_#{EmbeddingVersion.active.for_type(:elasticsearch, :quantized).id}"] = embedding
  elsif helper.vectors_supported?(:elasticsearch)
    data["embedding_#{EmbeddingVersion.active.for_type(:elasticsearch).id}"] = embedding
  elsif helper.vectors_supported?(:opensearch)
    data["embedding_#{EmbeddingVersion.active.for_type(:opensearch).id}"] = embedding
  end

  data
end

Con: we need to keep supporting older versions, so the if statements will keep growing until we decide to remove support for a version.
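
One way to limit that growth while staying with the inline approach is to resolve the variant in a single place and look the rest up in a table, so each new version adds one branch instead of one per call site. This is only a sketch and is not proposed in the issue itself: `vector_variant`, `VECTOR_MAPPINGS`, `embedding_mapping`, and `FakeHelper` are hypothetical names, and the mapping values are copied from the examples above.

```ruby
# Sketch: decide the vector variant once, then look up per-variant data.
VECTOR_MAPPINGS = {
  es_quantized: { type: 'dense_vector', dims: 768, similarity: 'cosine', index: true,
                  index_options: { type: 'int8_hnsw' } },
  es: { type: 'dense_vector', dims: 768, similarity: 'cosine', index: true },
  os: { type: 'knn_vector', dimension: 768, method: { name: 'hnsw' } }
}.freeze

# The only place that needs to grow when a new platform or version is added.
def vector_variant(helper)
  if helper.quantized_vectors_supported?(:elasticsearch)
    :es_quantized
  elsif helper.vectors_supported?(:elasticsearch)
    :es
  elsif helper.vectors_supported?(:opensearch)
    :os
  end
end

# Returns nil when the platform has no vector support.
def embedding_mapping(helper)
  VECTOR_MAPPINGS[vector_variant(helper)]
end

# Test double standing in for the ES helper.
class FakeHelper
  def initialize(supported)
    @supported = supported
  end

  def quantized_vectors_supported?(distribution)
    @supported.include?([distribution, :quantized])
  end

  def vectors_supported?(distribution)
    @supported.include?([distribution, :plain])
  end
end

embedding_mapping(FakeHelper.new([[:opensearch, :plain]])) # the knn_vector mapping
```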

Also create an Architecture Design Document.

Edited by Madelein van Niekerk