ActiveContext: support multiple jsons per reference

What does this MR do and why?

Adds support for a list of indexed JSONs per reference instead of a single hash. This is a prerequisite for implementing a one ref => many docs approach.

Changes in this MR

  • Adding ref_id and ref_version fields to all partitions (ES and postgres)
  • Changing references to implement as_indexed_jsons, which returns an array, instead of as_indexed_json. This is wrapped in a new method, jsons, which adds ref_id and ref_version to each document.
  • The unique identifier of a document is now "#{ref.identifier}:#{index}", stored in the _id field on ES and the id field on postgres
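
As a rough illustration of the changes above, the wrapper and the document id could look something like the sketch below. ExampleRef, document_id, and the sample ref_version value are hypothetical stand-ins and not the actual ActiveContext classes; the zero-based index is also an assumption.

```ruby
# Hypothetical stand-in for a reference; not the real ActiveContext class.
ExampleRef = Struct.new(:identifier, :ref_version, :docs) do
  # Each reference returns an array of hashes, one per document.
  def as_indexed_jsons
    docs
  end

  # Wraps as_indexed_jsons and stamps every document with ref_id and ref_version.
  def jsons
    as_indexed_jsons.map do |json|
      json.merge(ref_id: identifier, ref_version: ref_version)
    end
  end
end

# Builds the unique document id from the reference identifier and the
# position of the document in the array (assumed to be zero-based here).
def document_id(ref, index)
  "#{ref.identifier}:#{index}"
end

ref = ExampleRef.new("1", 1_700_000_000, [{ issue_id: "1" }, { issue_id: "1" }])
ids = ref.jsons.each_index.map { |index| document_id(ref, index) }
```

With two documents for one reference, ids holds one entry per document, each prefixed with the reference identifier.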

Postgres indexer

During indexing:

  • Loop through jsons to build upsert operations
  • Add a delete operation to delete all documents with the current ref.identifier and different ref.ref_version

During deletion:

  • Add a delete operation to delete all documents with the current ref.identifier
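
The effect of these operations can be sketched with an in-memory hash standing in for the partition table. This is only a simulation under assumed names (index_ref!, delete_ref!, store); the real indexer builds SQL operations rather than mutating a hash.

```ruby
# In-memory stand-in for a partition table, keyed by document id.
store = {}

# Indexing: upsert one row per json, then delete rows that belong to the
# same reference but carry a different ref_version (stale documents).
def index_ref!(store, identifier:, ref_version:, jsons:)
  jsons.each_with_index do |json, index|
    store["#{identifier}:#{index}"] = json.merge(ref_id: identifier, ref_version: ref_version)
  end
  store.delete_if { |_id, doc| doc[:ref_id] == identifier && doc[:ref_version] != ref_version }
end

# Deletion: remove every row that belongs to the reference.
def delete_ref!(store, identifier:)
  store.delete_if { |_id, doc| doc[:ref_id] == identifier }
end

# First run produces two documents, second run only one.
index_ref!(store, identifier: "1", ref_version: 1, jsons: [{ a: 1 }, { a: 2 }])
index_ref!(store, identifier: "1", ref_version: 2, jsons: [{ a: 1 }])
remaining_ids = store.keys
```

After the second run, the row left over from the first version ("1:1") is gone because its ref_version no longer matches, which is why the version-based delete is needed when the number of documents per reference shrinks.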

Elastic indexer

During indexing:

  • We need to split the bulk processing into two steps: calling client.bulk for upsert operations and calling client.delete_by_query for deletes.
  • Loop through jsons to build upsert operations
  • Add a delete operation to delete all documents with the current ref.identifier and different ref.ref_version

During deletion:

  • Add a delete operation to delete all documents with the current ref.identifier

We collect the deletes and combine them into a single delete_by_query request, with one should clause per delete.
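
A minimal sketch of how such a combined query body might be assembled is shown below. The build_delete_query helper and the :keep_version key are hypothetical names chosen for illustration; only the bool, should, filter, must_not, and term constructs come from the Elasticsearch query DSL.

```ruby
# Builds a single query body that matches the documents of every collected delete.
# Each delete is a hash with :ref_id and, for index-time cleanup, :keep_version
# (the current ref_version whose documents must survive).
def build_delete_query(deletes)
  clauses = deletes.map do |delete|
    bool = { filter: [{ term: { ref_id: delete[:ref_id] } }] }
    if delete[:keep_version]
      # Only delete documents of this ref with a different ref_version.
      bool[:must_not] = [{ term: { ref_version: delete[:keep_version] } }]
    end
    { bool: bool }
  end

  { query: { bool: { should: clauses, minimum_should_match: 1 } } }
end

body = build_delete_query([
  { ref_id: "1", keep_version: 2 }, # ref 1 was re-indexed: drop stale versions
  { ref_id: "2" }                   # ref 2 was deleted: drop all its documents
])
# The resulting hash would then be passed as the body of one delete_by_query call.
```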

Other things:

  • Process errors correctly: look up failed references using identifiers
  • Created a shared_example for the ES and OS indexer specs
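
Since document ids now have the form "#{ref.identifier}:#{index}", mapping a failed document back to its reference means stripping the index suffix. The sketch below assumes the standard bulk response shape ("errors", "items", "_id", "error"); failed_refs and LookupRef are hypothetical names, not the methods added in this MR.

```ruby
# Hypothetical minimal reference with just an identifier.
LookupRef = Struct.new(:identifier)

# Returns the references that had at least one failed document in a bulk response.
def failed_refs(refs, bulk_response)
  return [] unless bulk_response["errors"]

  failed_identifiers = bulk_response["items"].filter_map do |item|
    result = item.values.first # e.g. the hash under "index" or "update"
    # Strip the trailing ":<index>" to recover the reference identifier.
    result["_id"].rpartition(":").first if result["error"]
  end.uniq

  refs.select { |ref| failed_identifiers.include?(ref.identifier) }
end

refs = [LookupRef.new("1"), LookupRef.new("2")]
response = {
  "errors" => true,
  "items" => [
    { "index" => { "_id" => "1:0", "status" => 201 } },
    { "index" => { "_id" => "2:0", "status" => 400, "error" => { "type" => "mapper_parsing_exception" } } },
    { "index" => { "_id" => "2:1", "status" => 400, "error" => { "type" => "mapper_parsing_exception" } } }
  ]
}
failed = failed_refs(refs, response)
```

Both failed documents belong to the same reference, so that reference is returned once.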

References

Screenshots or screen recordings

Elasticsearch / OpenSearch:

[Screenshot: elastic_multiple_jsons]

Postgres:

[Screenshot: postgres_multiple_jsons]

How to set up and validate locally

Elasticsearch/OpenSearch:

  • Connect ES/OS
  • Track some refs
Ai::Context::Collections::MergeRequest.track!(MergeRequest.take(2))
  • Execute the queues
ActiveContext.execute_all_queues!
  • See that there are now two documents in the index and that each contains ref_id and ref_version
  • Now change as_indexed_jsons to return more than one document per ref
def as_indexed_jsons
  [
    {
      issue_id: identifier,
      namespace_id: database_record.project.id,
      traversal_ids: database_record.project.elastic_namespace_ancestry
    },
    {
      issue_id: identifier,
      namespace_id: database_record.project.id,
      traversal_ids: database_record.project.elastic_namespace_ancestry
    }
  ]
end
  • Track and execute again
  • Note that there are now 4 docs and that the ref_version changed
  • Now change it back to having one doc per ref
def as_indexed_jsons
  [
    {
      issue_id: identifier,
      namespace_id: database_record.project.id,
      traversal_ids: database_record.project.elastic_namespace_ancestry
    }
  ]
end
  • Track and execute again
  • Note that there are 2 docs and the ref_version changed

Postgres:

  • Connect postgres
  • Repeat the steps from Elasticsearch/OpenSearch above

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #523414 (closed)

Edited by Madelein van Niekerk
