ActiveContext: support multiple jsons per reference
What does this MR do and why?
Adds support for having a list of indexed_jsons instead of a single hash. This is the requirement for implementing a one ref => many docs approach.
Changes in this MR
- Adding `ref_id` and `ref_version` fields to all partitions (ES and postgres)
- Changing reference from implementing `as_indexed_json` to `as_indexed_jsons`, which is an array. This is wrapped in a new method `jsons`, which adds `ref_id` and `ref_version`.
- The unique identifier of a document is now `"#{ref.identifier}:#{index}"` -> `_id` field on ES and `id` field on postgres
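A minimal sketch of how the wrapper and the per-document identifier fit together. `ExampleReference`, `unique_identifier` and the sample payload are hypothetical; only `as_indexed_jsons`, `jsons`, `ref_id`, `ref_version` and the `"#{ref.identifier}:#{index}"` format come from this MR. The sketch assumes that `ref_id` holds the reference identifier and that `ref_version` is an integer; the real implementation may differ.

```ruby
# Hypothetical stand-in for a reference class (not the real ActiveContext class).
class ExampleReference
  attr_reader :identifier, :ref_version

  def initialize(identifier, ref_version)
    @identifier = identifier
    @ref_version = ref_version
  end

  # One hash per document that should exist for this reference.
  def as_indexed_jsons
    [{ title: 'chunk 0' }, { title: 'chunk 1' }]
  end

  # Wraps as_indexed_jsons and adds ref_id and ref_version to every document.
  # Assumption: ref_id is the reference identifier.
  def jsons
    as_indexed_jsons.map do |json|
      json.merge(ref_id: identifier, ref_version: ref_version)
    end
  end

  # Hypothetical helper that builds the unique document identifier.
  def unique_identifier(index)
    "#{identifier}:#{index}"
  end
end

ref = ExampleReference.new('42', 1_700_000_000)
docs = ref.jsons.each_with_index.map do |json, index|
  json.merge(id: ref.unique_identifier(index))
end
```

Because the index is part of the identifier, re-indexing the same reference overwrites documents position by position instead of creating duplicates.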
Postgres indexer
During indexing:
- Loop through `jsons` to build upsert operations
- Add a delete operation to delete all documents with the current `ref.identifier` and different `ref.ref_version`
During deletion:
- Add a delete operation to delete all documents with the current `ref.identifier`
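The two flows above can be sketched as plain operation lists. The operation hashes (`type`, `where`, `where_not`) and the `PgRef` struct are invented for illustration and are not the indexer's real data structures; the sketch also assumes that documents are matched to a reference through the `ref_id` column.

```ruby
# Hypothetical reference shape, used only for this sketch.
PgRef = Struct.new(:identifier, :ref_version, :jsons)

# Indexing: one upsert per document, plus one delete for stale versions.
def build_index_operations(ref)
  upserts = ref.jsons.each_with_index.map do |json, index|
    { type: :upsert, data: json.merge(id: "#{ref.identifier}:#{index}") }
  end

  # Removes documents of this ref that carry a different ref_version,
  # for example when the ref used to produce more documents than it does now.
  stale_delete = {
    type: :delete,
    where: { ref_id: ref.identifier },
    where_not: { ref_version: ref.ref_version }
  }

  upserts + [stale_delete]
end

# Deletion: remove every document that belongs to the ref.
def build_delete_operations(ref)
  [{ type: :delete, where: { ref_id: ref.identifier } }]
end

ref = PgRef.new('7', 2, [{ a: 1 }, { a: 2 }, { a: 3 }])
index_ops = build_index_operations(ref)
delete_ops = build_delete_operations(ref)
```

The version-scoped delete is what makes a ref shrink cleanly: going from two documents per ref back to one leaves no orphaned `7:1` row behind.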
Elastic indexer
During indexing:
- We need to split the bulk processing into two steps: calling `client.bulk` for upsert operations and calling `client.delete_by_query` for deletes.
- Loop through `jsons` to build upsert operations
- Add a delete operation to delete all documents with the current `ref.identifier` and different `ref.ref_version`
During deletion:
- Add a delete operation to delete all documents with the current `ref.identifier`
We collect the deletes and combine them into a single `delete_by_query` request with one `should` clause per delete.
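As a rough illustration, the combined request body could be assembled like this. The helper name `delete_by_query_body` and the input shape (`ref_id`, `keep_version`) are hypothetical, and the sketch assumes `ref_id` and `ref_version` are mapped as exact-match fields so that `term` queries apply. The resulting hash is ordinary Elasticsearch query DSL and would be passed as the body of `client.delete_by_query`.

```ruby
# Builds one bool/should query out of many collected deletes.
# Each delete is a hash with :ref_id and, optionally, :keep_version.
def delete_by_query_body(deletes)
  clauses = deletes.map do |delete|
    clause = { bool: { must: [{ term: { ref_id: delete[:ref_id] } }] } }

    # Indexing case: keep the current version and delete only other versions.
    if delete[:keep_version]
      clause[:bool][:must_not] = [{ term: { ref_version: delete[:keep_version] } }]
    end

    clause
  end

  { query: { bool: { should: clauses, minimum_should_match: 1 } } }
end

body = delete_by_query_body(
  [
    { ref_id: '1', keep_version: 5 }, # re-indexed ref: drop stale versions
    { ref_id: '2' }                   # deleted ref: drop everything
  ]
)
```

A document is removed when it matches at least one clause, so one request covers both the stale-version cleanup and the full deletes.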
Other things:
- Process errors correctly: look up failed references using identifiers
- Created a `shared_example` for the ES and OS indexer specs
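Since a document `_id` now carries an index suffix, mapping a failed bulk item back to its reference means stripping that suffix. The sketch below is one way to do it, not the code from this MR: `failed_identifiers` is a hypothetical helper, the input follows the standard bulk response layout (`items`, each keyed by the action name), and splitting at the last colon is an assumption made so that identifiers containing colons still work.

```ruby
# Returns the unique ref identifiers whose documents failed in a bulk request.
def failed_identifiers(bulk_response)
  identifiers = bulk_response.fetch('items', []).filter_map do |item|
    result = item.values.first # e.g. the hash under "index" or "update"
    next unless result['error']

    # "42:1" -> "42"; split at the last colon only.
    result['_id'].rpartition(':').first
  end

  identifiers.uniq
end

response = {
  'errors' => true,
  'items' => [
    { 'index' => { '_id' => '42:0', 'status' => 201 } },
    { 'index' => { '_id' => '42:1', 'status' => 400, 'error' => { 'type' => 'mapper_parsing_exception' } } },
    { 'index' => { '_id' => '43:0', 'status' => 400, 'error' => { 'type' => 'mapper_parsing_exception' } } },
    { 'index' => { '_id' => '42:2', 'status' => 400, 'error' => { 'type' => 'mapper_parsing_exception' } } }
  ]
}

failed = failed_identifiers(response)
```

Several failed documents of the same ref collapse into one identifier, so the ref is looked up once.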
References
Screenshots or screen recordings
Elasticsearch / OpenSearch:
Postgres:
How to set up and validate locally
- Follow the docs in https://gitlab.com/gitlab-org/gitlab/-/blob/b8692aa64a704102545a70e2c48564b9ecc90cb0/gems/gitlab-active-context/doc/usage.md
Elasticsearch/OpenSearch:
- Connect ES/OS
- Track some refs
Ai::Context::Collections::MergeRequest.track!(MergeRequest.take(2))
- Execute the queues
ActiveContext.execute_all_queues!
- See that there are now two documents in the index and that they contain `ref_id` and `ref_version`
- Now change `as_indexed_jsons` to have more than one document per ref
def as_indexed_jsons
  [
    {
      issue_id: identifier,
      namespace_id: database_record.project.id,
      traversal_ids: database_record.project.elastic_namespace_ancestry
    },
    {
      issue_id: identifier,
      namespace_id: database_record.project.id,
      traversal_ids: database_record.project.elastic_namespace_ancestry
    }
  ]
end
- Track and execute again
- Note that there are now 4 docs and that the `ref_version` changed
- Now change it back to having one doc per ref
def as_indexed_jsons
  [
    {
      issue_id: identifier,
      namespace_id: database_record.project.id,
      traversal_ids: database_record.project.elastic_namespace_ancestry
    }
  ]
end
- Track and execute again
- Note that there are 2 docs and the `ref_version` changed
Postgres:
- Connect postgres
- Repeat the steps from Elasticsearch/OpenSearch above
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #523414 (closed)
Edited by Madelein van Niekerk