Updated Elasticsearch framework
Proposal
Bypass the current legacy indexing logic, which uses proxies, and set up a framework that makes it easy to add and maintain indices.
References
Every document is defined as a reference: an indexable entity that can be added to the queue by calling the usual .track! method. We will define a new abstract class which is extended by new document types. This file has the basic idea.
Reference contract
Search::Elastic::Reference
# Search::Elastic::Reference
# DELIMITER is a placeholder for the payload delimiter, which this proposal does not fix.
# database_record needs a writer because preload_database_records assigns it.
attr_accessor :database_record, :database_id

def self.deserialize(string)
  klass = ref_klass(string)

  if klass
    klass.instantiate(string)
  else
    Search::Elastic::Reference::Legacy.instantiate(string)
  end
end

def self.preload_database_records(refs)
  refs
end

def self.instantiate(string)
  raise NotImplementedError
end

# Find the class of the reference from the first segment of the delimited payload,
# e.g. Search::Elastic::Reference::Vulnerability if the payload starts with Vulnerability.
def self.ref_klass(string)
  "#{self}::#{string.split(DELIMITER).first}".safe_constantize
end

def identifier
  raise NotImplementedError
end

def routing
  nil
end

def as_indexed_json
  raise NotImplementedError
end

def index_name
  raise NotImplementedError
end

def serialize
  raise NotImplementedError
end
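To make the serialize/deserialize contract concrete, here is a minimal, self-contained sketch of a round trip. It is not the proposed implementation: the `Sketch` namespace, the `|` delimiter, the payload layout (`Vulnerability|<id>|<project_id>`) and the identifier and routing formats are all assumptions made for illustration. It also uses plain `const_get` in place of `safe_constantize` so that it runs without ActiveSupport, and it omits the fallback to the legacy reference.

```ruby
# Sketch only: the delimiter, payload layout, and naming are assumptions.
module Sketch
  class Reference
    DELIMITER = '|'

    # Resolve the concrete class from the first payload segment, or return nil.
    def self.ref_klass(string)
      name = string.split(DELIMITER).first
      return nil unless name&.match?(/\A[A-Z]\w*\z/)
      return nil unless const_defined?(name, false)

      const_get(name, false)
    end

    def self.deserialize(string)
      klass = ref_klass(string)
      raise ArgumentError, "unknown reference: #{string}" unless klass

      klass.instantiate(string)
    end

    class Vulnerability < Reference
      attr_reader :database_id, :project_id

      def self.instantiate(string)
        _, id, project_id = string.split(DELIMITER)
        new(id, project_id)
      end

      def initialize(database_id, project_id)
        @database_id = database_id.to_i
        @project_id = project_id.to_i
      end

      def identifier
        "vulnerability_#{database_id}"
      end

      def routing
        "project_#{project_id}"
      end

      def serialize
        ['Vulnerability', database_id, project_id].join(DELIMITER)
      end
    end
  end
end

ref = Sketch::Reference.deserialize('Vulnerability|42|7')
puts ref.identifier
puts ref.serialize
```

The property worth preserving in a real implementation is that `deserialize(ref.serialize)` yields an equivalent reference, since the serialized string is what sits in the queue.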
Example reference class, e.g. for vulnerabilities
# Search::Elastic::Reference::Vulnerability
override :identifier
def identifier
end

override :routing
def routing
end

override :as_indexed_json
def as_indexed_json
end

override :index_name
def index_name
end

override :serialize
def serialize
end

def self.instantiate(string)
end
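As a rough illustration of why the contract exposes `identifier`, `routing`, `index_name` and `as_indexed_json`, the sketch below shows how an indexer could turn references into the newline-delimited body that the Elasticsearch bulk API expects (one action line, then one document line). `VulnerabilityRef`, `bulk_body`, the field names and the routing format are hypothetical; only the bulk line format itself comes from Elasticsearch.

```ruby
require 'json'

# Sketch only: a hypothetical reference with hard-coded example values.
class VulnerabilityRef
  attr_reader :id, :project_id, :severity

  def initialize(id:, project_id:, severity:)
    @id = id
    @project_id = project_id
    @severity = severity
  end

  def identifier
    id.to_s
  end

  def routing
    "project_#{project_id}"
  end

  def index_name
    'vulnerabilities'
  end

  def as_indexed_json
    { id: id, project_id: project_id, severity: severity }
  end
end

# Build an NDJSON bulk body: an "index" action line followed by the document.
def bulk_body(refs)
  lines = refs.flat_map do |ref|
    meta = { _index: ref.index_name, _id: ref.identifier }
    routing = ref.routing
    meta[:routing] = routing if routing # routing is optional in the contract
    [JSON.generate({ index: meta }), JSON.generate(ref.as_indexed_json)]
  end
  lines.join("\n") + "\n" # the bulk API requires a trailing newline
end

refs = [VulnerabilityRef.new(id: 42, project_id: 7, severity: 'high')]
body = bulk_body(refs)
puts body
```

Because the indexer only talks to the four contract methods, it does not need to know whether the data behind a reference comes from ActiveRecord or from somewhere else.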
Concern for ActiveRecord records
# Search::Elastic::Reference::ActiveRecord
def self.klass
  # find database klass from payload, for example Issue
end

def self.preload_database_records(refs)
  ids = refs.map(&:database_id)
  records = klass.id_in(ids).preload_indexing_data
  records_by_id = records.index_by(&:id)

  refs.each do |ref|
    ref.database_record = records_by_id[ref.database_id.to_i]
  end
end

def preload_indexing_data
  raise NotImplementedError
end
Example ActiveRecord reference
# Search::Elastic::Reference::Issue
include Search::Elastic::Reference::ActiveRecord

def preload_indexing_data
  ...
end
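The point of `preload_database_records` is to load all records for a batch of references in one query instead of one query per reference. The following sketch illustrates that idea without a database: `FakeIssue` is a hypothetical stand-in for an ActiveRecord model that counts how often it is queried, and `IssueRef` is a simplified reference. Neither is part of the proposal.

```ruby
# Sketch only: FakeIssue mimics a model backed by an in-memory hash.
class FakeIssue
  ROWS = { 1 => 'First issue', 2 => 'Second issue', 3 => 'Third issue' }.freeze

  class << self
    attr_accessor :query_count

    # Mimics a single `WHERE id IN (...)` query.
    def id_in(ids)
      self.query_count += 1
      ROWS.slice(*ids).map { |id, title| new(id, title) }
    end
  end
  self.query_count = 0

  attr_reader :id, :title

  def initialize(id, title)
    @id = id
    @title = title
  end
end

class IssueRef
  attr_reader :database_id
  attr_accessor :database_record

  def initialize(database_id)
    @database_id = database_id
  end

  # One lookup for the whole batch; each reference then picks its record by ID.
  def self.preload_database_records(refs)
    ids = refs.map { |ref| ref.database_id.to_i }
    records_by_id = FakeIssue.id_in(ids).to_h { |record| [record.id, record] }

    refs.each { |ref| ref.database_record = records_by_id[ref.database_id.to_i] }
  end
end

# IDs arrive as strings because they come from the serialized payload.
refs = %w[1 3 99].map { |id| IssueRef.new(id) }
IssueRef.preload_database_records(refs)
puts FakeIssue.query_count
```

A reference whose record no longer exists (ID 99 here) ends up with a `nil` record, so whatever consumes the references has to handle that case.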
Legacy reference that allows DocumentReference to continue working until we phase it out
# Search::Elastic::Reference::Legacy
Definitions per document type
Every document type has a single source of truth (SSOT) class that defines everything needed to index and search that document type.
For example:
# ee/lib/search/elastic/vulnerabilities.rb
module Search
  module Elastic
    class Vulnerabilities
      def self.index_name
        ...
      end

      def self.mappings
        ...
      end

      def self.settings
        ...
      end

      def self.query_builder
        ...
      end
    end
  end
end
We then update elastic/helper.rb to use these SSOT classes to create the index and alias instead of the proxy classes.
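As a sketch of what "the helper uses the SSOT class" could look like, the example below builds a create-index request purely from what a definition class declares. Everything here is hypothetical: the `Sketch` namespace, the index and alias naming, the example fields, and `FakeClient`, which records requests instead of talking to a cluster. The request body shape (`settings`, `mappings`, `aliases`) follows the Elasticsearch create index API.

```ruby
# Sketch only: names, fields, and the fake client are assumptions.
module Sketch
  class Vulnerabilities
    def self.index_name
      'gitlab-sketch-vulnerabilities'
    end

    def self.mappings
      { dynamic: 'strict', properties: { id: { type: 'long' }, severity: { type: 'keyword' } } }
    end

    def self.settings
      { index: { number_of_shards: 1, number_of_replicas: 0 } }
    end
  end

  # Records create-index requests so the sketch runs without Elasticsearch.
  class FakeClient
    attr_reader :requests

    def initialize
      @requests = []
    end

    def create_index(index:, body:)
      @requests << { index: index, body: body }
    end
  end

  class Helper
    def initialize(client)
      @client = client
    end

    # The helper knows nothing about the document type beyond its definition.
    def create_index_for(definition)
      alias_name = definition.index_name
      @client.create_index(
        index: "#{alias_name}-0001",
        body: {
          settings: definition.settings,
          mappings: definition.mappings,
          aliases: { alias_name => {} }
        }
      )
    end
  end
end

client = Sketch::FakeClient.new
Sketch::Helper.new(client).create_index_for(Sketch::Vulnerabilities)
puts client.requests.first[:index]
```

With this shape, adding a document type should not require touching the helper at all, only adding a new definition class.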
New flow for adding a document type
- Create SSOT file containing mappings, index name, etc.
- Create new Reference class to define how the entity is indexed
- Add a migration to create the index
- Keep documents up to date (using callbacks, GitLab EventStore or similar)
- Backfill documents
- Write search query using a query builder (later iteration)
- Perform search
We also want to explore letting document types self-register, as some of our Sidekiq middleware does (pause control, concurrency limit), instead of maintaining a hard-coded list of document types such as ES_SEPARATE_CLASSES.
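One possible way to get self-registration, shown here only as a sketch, is Ruby's `inherited` hook: a base class adds every subclass to a registry at the moment the subclass is defined. The `Sketch` namespace, `Registry`, `Definition` and the two example document types are hypothetical, and this is not necessarily how the Sidekiq middleware mentioned above does it.

```ruby
# Sketch only: a registry filled by Ruby's Class#inherited hook.
module Sketch
  module Registry
    def self.definitions
      @definitions ||= []
    end
  end

  class Definition
    # Called by Ruby whenever a subclass of Definition is created.
    def self.inherited(subclass)
      super
      Registry.definitions << subclass
    end
  end

  class Vulnerabilities < Definition
    def self.index_name
      'vulnerabilities'
    end
  end

  class Epics < Definition
    def self.index_name
      'epics'
    end
  end
end

puts Sketch::Registry.definitions.map(&:index_name).inspect
```

One caveat with this approach: a class only registers itself once its file has been loaded, so with lazy autoloading the registry can be incomplete unless the definition classes are eager loaded.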
Advantages
- We get rid of legacy code (proxies, multi proxies, etc.) that is hard to understand, and we can remove gems we don't really need.
- We are not tied to ActiveRecord models like we currently are. We can index any data.
- It becomes easy for our team and other teams to use Elasticsearch. During a pairing session with the vulnerability team, we got the data indexed and searchable within one hour.