# ActiveContext postgres indexer
## What does this MR do and why?
Adds bulk indexing (upsert and delete) support to the Postgres adapter.

Note: we have held off on adding tests because this work is still evolving quickly, and writing tests now would slow us down. We will go back and add thorough test coverage!
## Query plans
### Bulk upsert
We will upsert at most 1000 documents at a time.

Query for 2 documents:

```sql
INSERT INTO "gitlab_active_context_merge_requests" ("id","issue_id","namespace_id","traversal_ids","partition_id")
VALUES ('11', 11, 2, '24-p2-', 0), ('10', 10, 2, '24-p2-', 0)
ON CONFLICT ("id","partition_id") DO UPDATE SET "issue_id"=excluded."issue_id","namespace_id"=excluded."namespace_id","traversal_ids"=excluded."traversal_ids"
```
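As a rough illustration of the batching (the method and helper names below are hypothetical, not the adapter's actual API): documents are sliced into groups of at most 1000, and each group becomes one multi-row `INSERT ... ON CONFLICT DO UPDATE` like the statement above:

```ruby
# Illustrative sketch only; bulk_upsert, quote and execute are hypothetical names.
# documents: array of hashes keyed by column name.
BATCH_SIZE = 1000

def bulk_upsert(table, unique_by, documents)
  documents.each_slice(BATCH_SIZE) do |batch|
    columns = batch.first.keys
    rows = batch.map { |doc| "(#{columns.map { |c| quote(doc[c]) }.join(',')})" }
    # Every non-key column is overwritten from the incoming row on conflict.
    updates = (columns - unique_by).map { |c| %("#{c}"=excluded."#{c}") }.join(',')

    execute(<<~SQL)
      INSERT INTO "#{table}" (#{columns.map { |c| %("#{c}") }.join(',')})
      VALUES #{rows.join(', ')}
      ON CONFLICT (#{unique_by.map { |c| %("#{c}") }.join(',')}) DO UPDATE SET #{updates}
    SQL
  end
end
```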
Query plans for bulk upserting 2 documents:
- no embeddings, no conflicts: https://explain.depesz.com/s/b6bK
- no embeddings, conflicts: https://explain.depesz.com/s/i6u5
- embeddings, no conflicts: https://explain.depesz.com/s/sHkR
- embeddings, conflicts: https://explain.depesz.com/s/mJom
### Bulk delete
We will delete at most 1000 documents at a time.

Query for deleting 2 records:

```sql
DELETE FROM "gitlab_active_context_merge_requests" WHERE "gitlab_active_context_merge_requests"."id" IN ('10', '11')
```
Query plan for deleting 2 records: https://explain.depesz.com/s/vFhs
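Deletes follow the same batching pattern; a minimal sketch, again with hypothetical helper names:

```ruby
# Illustrative sketch only; quote and execute are hypothetical helpers.
def bulk_delete(table, ids)
  ids.each_slice(1000) do |batch|
    quoted_ids = batch.map { |id| quote(id) }.join(', ')
    execute(%(DELETE FROM "#{table}" WHERE "#{table}"."id" IN (#{quoted_ids})))
  end
end
```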
## References
Please include cross links to any resources that are relevant to this MR. This will give reviewers and future readers helpful context to give an efficient review of the changes introduced.
## MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
## How to set up and validate locally
- Run a postgres instance:

```shell
docker run -p 5432:5432 --name pgvector17 -e POSTGRES_PASSWORD=password pgvector/pgvector:pg17
```

- Add an initializer connecting to postgres:
```ruby
# frozen_string_literal: true

ActiveContext.configure do |config|
  config.enabled = true
  config.logger = ::Gitlab::Elasticsearch::Logger.build
  config.databases = {
    pg1: {
      adapter: 'ActiveContext::Databases::Postgresql::Adapter',
      options: { port: 5432, host: 'localhost', username: 'postgres', password: 'password' }
    }
  }
end
```
- Create a reference class at `ee/lib/ai/context/references/merge_request.rb`:
```ruby
# frozen_string_literal: true

module Ai
  module Context
    module References
      class MergeRequest < ::ActiveContext::Reference
        def self.serialize(record)
          new(record.id).serialize
        end

        attr_reader :identifier

        def initialize(identifier)
          @identifier = identifier.to_i
        end

        def serialize
          self.class.join_delimited([identifier].compact)
        end

        def as_indexed_json
          {
            id: identifier,
            description: "description #{identifier} for merge request"
          }
        end

        def operation
          :upsert
        end

        def partition_name
          'merge_requests_0'
        end

        def partition_id
          0
        end
      end
    end
  end
end
```
- Start a bulk processor:

```ruby
bulk_processor = ActiveContext::BulkProcessor.new
```
- Add refs to the processor:

```ruby
bulk_processor.process(Ai::Context::References::MergeRequest.new(1))
bulk_processor.process(Ai::Context::References::MergeRequest.new(2))
```
- Flush the processor and note that both refs are returned as failed:

```ruby
bulk_processor.flush
```
- View the logs. They should say that the relation doesn't exist, and show that 2 refs were submitted and 2 failed:

```json
{"message":"bulk_submitted","meta.indexing.bulk_count":2,"meta.indexing.errors_count":2}
```
- Now create the relation:

```sql
CREATE TABLE "merge_requests_0" (id BIGINT PRIMARY KEY, description TEXT);
```
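If you want to exercise the embeddings query plans above locally, a hypothetical variant of the table with a vector column could look like this (the column name and dimension are assumptions; the pgvector image already ships the extension):

```sql
-- Hypothetical embeddings variant; column name and dimension are illustrative.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE "merge_requests_0" (id BIGINT PRIMARY KEY, description TEXT, embedding vector(768));
```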
- Try again and note that no failed refs are returned. The log should also show that 0 refs failed:

```ruby
bulk_processor.process(Ai::Context::References::MergeRequest.new(1))
bulk_processor.process(Ai::Context::References::MergeRequest.new(2))
bulk_processor.flush
```
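Based on the log format shown earlier, the success entry should look something like this (the exact fields are assumed to match the failure entry):

```json
{"message":"bulk_submitted","meta.indexing.bulk_count":2,"meta.indexing.errors_count":0}
```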
- Verify that the docs exist in the table.
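For example, from `psql` (or any client connected to the instance):

```sql
SELECT * FROM "merge_requests_0";
```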
- Run it again and see that no duplicates were added:

```ruby
bulk_processor.process(Ai::Context::References::MergeRequest.new(1))
bulk_processor.process(Ai::Context::References::MergeRequest.new(2))
bulk_processor.flush
```
- Change the reference class's `operation` method to return `:delete` and reload. (`:delete` is assumed here as the delete counterpart of the `:upsert` shown above.)
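The change would look like:

```ruby
# In the MergeRequest reference class; :delete assumed to mirror :upsert.
def operation
  :delete
end
```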
Run the processor again and note that the records are deleted from the relation:

```ruby
bulk_processor.process(Ai::Context::References::MergeRequest.new(1))
bulk_processor.process(Ai::Context::References::MergeRequest.new(2))
bulk_processor.flush
```
- [Optional] Add another reference class without creating its relation. Process refs from both classes and note that only the refs belonging to the relation with an error are failed.
Related to #507975 (closed)