ActiveContext: support multiple jsons per reference (!185781)

What does this MR do and why?

Adds support for a list of indexed_jsons per reference instead of a single hash. This is a prerequisite for implementing a one ref => many docs approach.

Changes in this MR

Adding ref_id and ref_version fields to all partitions (ES and postgres)
Changing references from implementing as_indexed_json to implementing as_indexed_jsons, which returns an array. This is wrapped in a new method, jsons, which adds ref_id and ref_version to each document.
The unique identifier of a document is now "#{ref.identifier}:#{index}", stored in the _id field on ES and the id field on postgres
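
The changes above can be sketched roughly as follows. This is a minimal illustration, not the gem's code: SketchReference, its constructor, and document_ids are hypothetical, and it assumes that ref_id holds the reference's identifier.

```ruby
# Hypothetical stand-in for a reference class (not the gem's implementation).
class SketchReference
  attr_reader :identifier, :ref_version

  def initialize(identifier:, ref_version:)
    @identifier = identifier
    @ref_version = ref_version
  end

  # A real reference would return one hash per document to index.
  def as_indexed_jsons
    [{ title: 'first chunk' }, { title: 'second chunk' }]
  end

  # Wraps as_indexed_jsons and adds ref_id and ref_version to every document.
  # Assumption: ref_id is the reference's identifier.
  def jsons
    as_indexed_jsons.map do |json|
      json.merge(ref_id: identifier, ref_version: ref_version)
    end
  end

  # One unique id per document, in the "#{identifier}:#{index}" format.
  def document_ids
    jsons.each_index.map { |index| "#{identifier}:#{index}" }
  end
end

ref = SketchReference.new(identifier: 'merge_request_1', ref_version: 1)
ref.jsons.each_with_index do |json, index|
  puts "#{ref.identifier}:#{index} => #{json}"
end
```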

Postgres indexer

During indexing:

Loop through jsons to build upsert operations
Add a delete operation to delete all documents with the current ref.identifier and a different ref.ref_version

During deletion:

Add a delete operation to delete all documents with the current ref.identifier
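
A rough sketch of how these operations could be assembled, assuming ref_id stores ref.identifier. PgSketchRef, pg_index_operations, pg_delete_operations, and the hash shapes are hypothetical and only show the order of operations, not the indexer's actual SQL.

```ruby
# Hypothetical reference shape used only for this sketch.
PgSketchRef = Struct.new(:identifier, :ref_version, :jsons, keyword_init: true)

# Indexing: one upsert per json, then one delete for documents of older versions.
def pg_index_operations(ref)
  upserts = ref.jsons.each_with_index.map do |json, index|
    { upsert: json.merge(id: "#{ref.identifier}:#{index}") }
  end

  # Removes documents of this ref that were written with a different ref_version,
  # for example when the ref now produces fewer documents than before.
  cleanup = { delete: { ref_id: ref.identifier, ref_version_not: ref.ref_version } }

  upserts + [cleanup]
end

# Deletion: remove every document that belongs to the ref.
def pg_delete_operations(ref)
  [{ delete: { ref_id: ref.identifier } }]
end

ref = PgSketchRef.new(identifier: 'merge_request_1', ref_version: 2, jsons: [{ title: 'a' }, { title: 'b' }])
pg_index_operations(ref).each { |operation| puts operation }
```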

Elastic indexer

During indexing:

We need to split the bulk processing into two steps: calling client.bulk for upsert operations and calling client.delete_by_query for deletes.
Loop through jsons to build upsert operations
Add a delete operation to delete all documents with the current ref.identifier and a different ref.ref_version

During deletion:

Add a delete operation to delete all documents with the current ref.identifier

We collect the deletes and combine them into a single delete_by_query request using should clauses.
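
One way the combined request body could look, as a sketch only. The helper names are hypothetical, and it assumes ref_id and ref_version are indexed as exact-match fields so that term queries apply. The hash would be passed as the body of client.delete_by_query, which is not called here.

```ruby
require 'json'

# Clause for an indexed ref: match its documents written with a different ref_version.
def stale_version_clause(ref_id, ref_version)
  {
    bool: {
      must: [{ term: { ref_id: ref_id } }],
      must_not: [{ term: { ref_version: ref_version } }]
    }
  }
end

# Clause for a deleted ref: match all of its documents.
def all_documents_clause(ref_id)
  { term: { ref_id: ref_id } }
end

# Combines the collected clauses into one query; a document is deleted if any clause matches.
def delete_by_query_body(clauses)
  { query: { bool: { should: clauses, minimum_should_match: 1 } } }
end

clauses = [
  stale_version_clause('merge_request_1', 2),
  all_documents_clause('merge_request_2')
]
puts JSON.pretty_generate(delete_by_query_body(clauses))
```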

Other things:

Process errors correctly: look up failed references using identifiers
Created a shared_example for the ES and OS indexer specs
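
Because document ids now carry an index suffix, mapping a failed bulk item back to its reference means stripping that suffix first. The following is a sketch under assumptions: the response follows the Elasticsearch bulk response layout (an items array whose entries are keyed by action), and failed_identifiers, failed_refs, and ErrorSketchRef are hypothetical names.

```ruby
# Hypothetical reference shape used only for this sketch.
ErrorSketchRef = Struct.new(:identifier)

# Collects the identifiers of refs that have at least one failed document.
def failed_identifiers(bulk_response)
  bulk_response.fetch('items', []).filter_map do |item|
    result = item.values.first # entries are keyed by action, e.g. "update"
    next unless result['error']

    # "_id" is "#{identifier}:#{index}"; drop the trailing ":index".
    result['_id'].rpartition(':').first
  end.uniq
end

# Selects the refs whose identifier matches a failed document.
def failed_refs(refs, bulk_response)
  identifiers = failed_identifiers(bulk_response)
  refs.select { |ref| identifiers.include?(ref.identifier) }
end

response = {
  'errors' => true,
  'items' => [
    { 'update' => { '_id' => 'merge_request_1:0', 'status' => 200 } },
    { 'update' => { '_id' => 'merge_request_2:1', 'status' => 400, 'error' => { 'type' => 'mapper_parsing_exception' } } }
  ]
}
refs = [ErrorSketchRef.new('merge_request_1'), ErrorSketchRef.new('merge_request_2')]
puts failed_refs(refs, response).map(&:identifier).inspect
```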

References

#523414 (comment 2407744898)

Screenshots or screen recordings

Elasticsearch / OpenSearch:

elastic_multiple_jsons

Postgres:

postgres_multiple_jsons

How to set up and validate locally

Follow the docs in https://gitlab.com/gitlab-org/gitlab/-/blob/b8692aa64a704102545a70e2c48564b9ecc90cb0/gems/gitlab-active-context/doc/usage.md

Elasticsearch/OpenSearch:

Connect ES/OS
Track some refs

Ai::Context::Collections::MergeRequest.track!(MergeRequest.take(2))

Execute the queues

ActiveContext.execute_all_queues!

See that there are now two documents in the index and that each contains ref_id and ref_version
Now change as_indexed_jsons to return more than one document per ref

def as_indexed_jsons
  [
    {
      issue_id: identifier,
      namespace_id: database_record.project.id,
      traversal_ids: database_record.project.elastic_namespace_ancestry
    },
    {
      issue_id: identifier,
      namespace_id: database_record.project.id,
      traversal_ids: database_record.project.elastic_namespace_ancestry
    }
  ]
end

Track and execute again
Note that there are now 4 docs and that the ref_version changed
Now change it back to having one doc per ref

def as_indexed_jsons
  [
    {
      issue_id: identifier,
      namespace_id: database_record.project.id,
      traversal_ids: database_record.project.elastic_namespace_ancestry
    }
  ]
end

Track and execute again
Note that there are 2 docs and the ref_version changed

Postgres:

Connect postgres
Repeat the steps from the Elasticsearch/OpenSearch section above

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #523414 (closed)

Edited Mar 28, 2025 by Madelein van Niekerk
