Updated Elasticsearch framework

Proposal

Bypass the current legacy indexing logic, which uses proxies, and set up a framework that makes it easy to add and maintain indices.

References

Every document is defined as a reference: an indexable entity that can be added to the queue by calling the usual .track! method. We will define a new abstract class that each new document type extends. This file has the basic idea.

Reference contract

Search::Elastic::Reference
# Search::Elastic::Reference

# Writable, because preload_database_records assigns database_record
attr_accessor :database_record, :database_id

def self.deserialize(string)
  klass = ref_klass(string)
  if klass
    klass.instantiate(string)
  else
    Search::Elastic::Reference::Legacy.instantiate(string)
  end
end

def self.preload_database_records(refs)
  refs
end

def self.instantiate(string)
  raise NotImplementedError
end

# Find the reference class named by the payload, e.g.
# Search::Elastic::Reference::Vulnerability if the payload starts with "Vulnerability".
def self.ref_klass(string)
  # DELIMITER is a placeholder: this proposal does not fix the delimiter.
  "#{self}::#{string.split(DELIMITER).first}".safe_constantize
end

def identifier
  raise NotImplementedError
end

def routing
  nil
end

def as_indexed_json
  raise NotImplementedError
end

def index_name
  raise NotImplementedError
end

def serialize
  raise NotImplementedError
end

Example reference class, e.g. for vulnerabilities
# Search::Elastic::Reference::Vulnerability

override :identifier
def identifier
end

override :routing
def routing
end

override :as_indexed_json
def as_indexed_json
end

override :index_name
def index_name
end

override :serialize
def serialize
end

def self.instantiate(string)
end
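
To make the contract concrete, here is a minimal plain-Ruby sketch of a serialize/deserialize round trip. It is not the proposed implementation: the `|` delimiter, the `'vulnerabilities'` index name and the document fields are assumptions, `const_get` stands in for `safe_constantize` so that the example runs without ActiveSupport, and the fallback to the Legacy reference is replaced by an error.

```ruby
# Sketch only: simplified stand-ins for the classes described above.
module Search
  module Elastic
    class Reference
      DELIMITER = '|' # assumption: the proposal does not fix a delimiter

      attr_accessor :database_record, :database_id

      # Dispatch to the subclass named by the first segment of the payload.
      def self.deserialize(string)
        klass = ref_klass(string)
        # The proposal falls back to Reference::Legacy here; omitted in this sketch.
        raise ArgumentError, "unknown reference type: #{string}" unless klass

        klass.instantiate(string)
      end

      # Plain-Ruby substitute for safe_constantize: only constants defined
      # directly under Reference are considered.
      def self.ref_klass(string)
        type = string.split(DELIMITER).first
        return unless type&.match?(/\A[A-Z]\w*\z/) && const_defined?(type, false)

        const_get(type, false)
      end

      def routing
        nil
      end

      class Vulnerability < Reference
        def self.instantiate(string)
          _type, id = string.split(DELIMITER)
          new(id.to_i)
        end

        def initialize(id)
          @database_id = id
        end

        def identifier
          database_id
        end

        def index_name
          'vulnerabilities' # hypothetical index name
        end

        def as_indexed_json
          { id: database_id } # a real document would carry more fields
        end

        def serialize
          ['Vulnerability', database_id].join(DELIMITER)
        end
      end
    end
  end
end

ref = Search::Elastic::Reference.deserialize('Vulnerability|42')
puts ref.class     # Search::Elastic::Reference::Vulnerability
puts ref.serialize # Vulnerability|42
```

The point of the round trip is that the queue only ever stores the serialized string, and the base class alone decides which subclass rebuilds it.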

Concern for ActiveRecord records
# Search::Elastic::Reference::ActiveRecord

def self.klass
  # find database klass from payload, for example Issue
end

def self.preload_database_records(refs)
  ids = refs.map(&:database_id)

  records = klass.id_in(ids).preload_indexing_data
  records_by_id = records.index_by(&:id)

  refs.each do |ref|
    ref.database_record = records_by_id[ref.database_id.to_i]
  end
end

def preload_indexing_data
  raise NotImplementedError
end

Example ActiveRecord reference
# Search::Elastic::Reference::Issue

include Search::Elastic::Reference::ActiveRecord

def preload_indexing_data
  ...
end
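
The batch preload in the concern can be illustrated without a database. In the sketch below, `FakeIssue` and `FakeRef` are hypothetical stand-ins (not part of the proposal) and `id_in` simply counts how often it is called, to show that all references are resolved with a single lookup.

```ruby
# Hypothetical stand-in for an ActiveRecord model; it counts "queries".
class FakeIssue
  KNOWN_IDS = [1, 2].freeze

  attr_reader :id

  class << self
    attr_accessor :query_count

    # Mimics a scope that loads all matching rows in one query.
    def id_in(ids)
      self.query_count += 1
      ids.select { |id| KNOWN_IDS.include?(id) }.map { |id| new(id) }
    end
  end
  self.query_count = 0

  def initialize(id)
    @id = id
  end
end

# Hypothetical reference: the id arrives as a string from the serialized payload.
FakeRef = Struct.new(:database_id, :database_record)

# Same shape as preload_database_records above: one lookup, then assign by id.
def preload_database_records(refs, klass)
  ids = refs.map { |ref| ref.database_id.to_i }
  records_by_id = klass.id_in(ids).to_h { |record| [record.id, record] }

  refs.each { |ref| ref.database_record = records_by_id[ref.database_id.to_i] }
end

refs = [FakeRef.new('1'), FakeRef.new('2'), FakeRef.new('99')]
preload_database_records(refs, FakeIssue)

puts FakeIssue.query_count            # one lookup for three references
puts refs.last.database_record.inspect # nil: no record with id 99
```

A reference whose record no longer exists ends up with a nil `database_record`. What the indexer does with such a reference is not specified in this proposal.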

Legacy reference that allows DocumentReference to continue working until we phase it out
# Search::Elastic::Reference::Legacy

Definitions per document type

Every document type has a single source of truth (SSOT) class that defines everything needed to index and search that document type.

For example:

# ee/lib/search/elastic/vulnerabilities.rb

module Search
  module Elastic
    class Vulnerabilities
      def self.index_name
        ...
      end

      def self.mappings
        ...
      end

      def self.settings
        ...
      end

      def self.query_builder
        ...
      end
    end
  end
end

We then update elastic/helper.rb to create the index and alias from these SSOT classes instead of from the proxy classes.
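
As a rough illustration of that helper change, the sketch below turns an SSOT class into the arguments for an index-creation request. `ExampleDocumentType`, `create_index_args`, the mapping fields and the suffix-based physical index name are all assumptions made for the example. The proposal does not describe how elastic/helper.rb names indices or attaches aliases.

```ruby
# Hypothetical SSOT class exposing the same class methods as the example above.
class ExampleDocumentType
  def self.index_name
    'example-documents'
  end

  def self.mappings
    { dynamic: 'strict', properties: { id: { type: 'long' }, title: { type: 'text' } } }
  end

  def self.settings
    { index: { number_of_shards: 1 } }
  end
end

# Build the arguments for a create-index call from any SSOT class.
# The alias carries the logical name and points at the suffixed physical index.
def create_index_args(definition, suffix:)
  {
    index: "#{definition.index_name}-#{suffix}",
    body: {
      settings: definition.settings,
      mappings: definition.mappings,
      aliases: { definition.index_name => {} }
    }
  }
end

args = create_index_args(ExampleDocumentType, suffix: '20240101')
puts args[:index]               # example-documents-20240101
puts args[:body][:aliases].keys # example-documents
```

Because the helper only relies on `index_name`, `mappings` and `settings`, any class that responds to those three methods can be indexed without a proxy.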

New flow for adding a document type

  • Create an SSOT file containing the mappings, index name, etc.
  • Create a new Reference class to define how the entity is indexed
  • Add a migration to create the index
  • Keep documents up to date (using callbacks, GitLab EventStore or similar)
  • Backfill documents
  • Write the search query using a query builder (later iteration)
  • Perform the search
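
To show how the Reference step in this flow could feed the indexer, here is a hedged sketch that turns references into the newline-delimited body of an Elasticsearch bulk request using only the contract methods (`index_name`, `identifier`, `routing`, `as_indexed_json`). `FakeRef` and `bulk_body` are hypothetical names. The proposal does not say how the bulk request is actually assembled.

```ruby
require 'json'

# Hypothetical reference exposing the four contract methods as plain fields.
FakeRef = Struct.new(:index_name, :identifier, :routing, :as_indexed_json, keyword_init: true)

# Each reference becomes two lines: the action metadata and the document.
# Routing is only included when the reference defines one.
def bulk_body(refs)
  lines = refs.flat_map do |ref|
    meta = { _index: ref.index_name, _id: ref.identifier }
    meta[:routing] = ref.routing if ref.routing
    [{ index: meta }, ref.as_indexed_json]
  end

  # The bulk API expects newline-delimited JSON terminated by a newline.
  lines.map { |line| JSON.generate(line) }.join("\n") + "\n"
end

refs = [
  FakeRef.new(index_name: 'vulnerabilities', identifier: 1, routing: nil, as_indexed_json: { id: 1 }),
  FakeRef.new(index_name: 'vulnerabilities', identifier: 2, routing: 'project_7', as_indexed_json: { id: 2 })
]
puts bulk_body(refs)
```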

We also want to look at a way for document types to self-register, as some of our Sidekiq middleware does (pause control, concurrency limit), instead of keeping a hand-maintained list of defined document types like ES_SEPARATE_CLASSES.
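
One possible shape for self-registration, sketched in plain Ruby: a base class records every subclass through the `inherited` hook, so no central list has to be edited. `DocumentType`, `registry` and the two subclasses are hypothetical names. This is not how the Sidekiq middleware mentioned above is implemented, only an illustration of the idea.

```ruby
module Search
  module Elastic
    # Hypothetical base class: subclasses register themselves when defined.
    class DocumentType
      def self.registry
        @registry ||= []
      end

      # Ruby calls this hook each time a class inherits from DocumentType.
      def self.inherited(subclass)
        super
        DocumentType.registry << subclass
      end
    end

    class Vulnerabilities < DocumentType
      def self.index_name
        'vulnerabilities' # hypothetical index name
      end
    end

    class Issues < DocumentType
      def self.index_name
        'issues' # hypothetical index name
      end
    end
  end
end

# Replaces a hand-maintained constant: the list is derived from the classes.
puts Search::Elastic::DocumentType.registry.map(&:index_name).inspect
```

A trade-off of this approach is that a class only registers once its file has been loaded, so it relies on eager loading or an explicit require.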

Advantages

  • We get rid of the legacy code (proxies, multi proxies, etc.) that is hard to understand, and we remove gems we don't really need.
  • We are no longer tied to ActiveRecord models as we currently are. We can index any data.
  • It becomes easy for our team and other teams to use Elasticsearch. During a pairing session with the vulnerability team, we got the data indexed and searchable within one hour.
Edited by Madelein van Niekerk