# ActiveContext postgres indexer
## What does this MR do and why?
Adds bulk indexing (upsert and delete) support to the Postgres adapter.

Note: we have held off on adding tests because this work is still evolving quickly, and writing tests now would slow us down. We will go back and add thorough test coverage!
## Query plans
### Bulk upsert
We will upsert at most 1000 documents at a time.

Query for 2 documents:

```sql
INSERT INTO "gitlab_active_context_merge_requests" ("id","issue_id","namespace_id","traversal_ids","partition_id")
VALUES ('11', 11, 2, '24-p2-', 0), ('10', 10, 2, '24-p2-', 0)
ON CONFLICT ("id","partition_id") DO UPDATE SET "issue_id"=excluded."issue_id","namespace_id"=excluded."namespace_id","traversal_ids"=excluded."traversal_ids"
```
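As a rough illustration of the batching (the method and helper names below are hypothetical, not the adapter's actual API): documents are sliced into groups of at most 1000, and each group becomes one multi-row `INSERT ... ON CONFLICT DO UPDATE` like the statement above:

```ruby
# Illustrative sketch only; bulk_upsert, quote and execute are hypothetical names.
# documents: array of hashes keyed by column name.
BATCH_SIZE = 1000

def bulk_upsert(table, unique_by, documents)
  documents.each_slice(BATCH_SIZE) do |batch|
    columns = batch.first.keys
    rows = batch.map { |doc| "(#{columns.map { |c| quote(doc[c]) }.join(',')})" }
    # Every non-key column is overwritten from the incoming row on conflict.
    updates = (columns - unique_by).map { |c| %("#{c}"=excluded."#{c}") }.join(',')

    execute(<<~SQL)
      INSERT INTO "#{table}" (#{columns.map { |c| %("#{c}") }.join(',')})
      VALUES #{rows.join(', ')}
      ON CONFLICT (#{unique_by.map { |c| %("#{c}") }.join(',')}) DO UPDATE SET #{updates}
    SQL
  end
end
```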
Query plans for bulk upserting 2 documents:
- no embeddings, no conflicts: https://explain.depesz.com/s/b6bK
- no embeddings, conflicts: https://explain.depesz.com/s/i6u5
- embeddings, no conflicts: https://explain.depesz.com/s/sHkR
- embeddings, conflicts: https://explain.depesz.com/s/mJom
### Bulk delete
We will delete at most 1000 documents at a time.

Query for deleting 2 records:

```sql
DELETE FROM "gitlab_active_context_merge_requests" WHERE "gitlab_active_context_merge_requests"."id" IN ('10', '11')
```
Query plan for deleting 2 records: https://explain.depesz.com/s/vFhs
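Deletes follow the same batching pattern; a minimal sketch, again with hypothetical helper names:

```ruby
# Illustrative sketch only; quote and execute are hypothetical helpers.
def bulk_delete(table, ids)
  ids.each_slice(1000) do |batch|
    quoted_ids = batch.map { |id| quote(id) }.join(', ')
    execute(%(DELETE FROM "#{table}" WHERE "#{table}"."id" IN (#{quoted_ids})))
  end
end
```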
## References
Please include cross links to any resources that are relevant to this MR. This will give reviewers and future readers helpful context to give an efficient review of the changes introduced.
## MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
## How to set up and validate locally
- Run a postgres instance:

```shell
docker run -p 5432:5432 --name pgvector17 -e POSTGRES_PASSWORD=password pgvector/pgvector:pg17
```

- Add an initializer connecting to postgres:
```ruby
# frozen_string_literal: true

ActiveContext.configure do |config|
  config.enabled = true
  config.logger = ::Gitlab::Elasticsearch::Logger.build
  config.databases = {
    pg1: {
      adapter: 'ActiveContext::Databases::Postgresql::Adapter',
      options: { port: 5432, host: 'localhost', username: 'postgres', password: 'password' }
    }
  }
end
```
- Create a reference class at `ee/lib/ai/context/references/merge_request.rb`:
```ruby
# frozen_string_literal: true

module Ai
  module Context
    module References
      class MergeRequest < ::ActiveContext::Reference
        def self.serialize(record)
          new(record.id).serialize
        end

        attr_reader :identifier

        def initialize(identifier)
          @identifier = identifier.to_i
        end

        def serialize
          self.class.join_delimited([identifier].compact)
        end

        def as_indexed_json
          {
            id: identifier,
            description: "description #{identifier} for merge request"
          }
        end

        def operation
          :upsert
        end

        def partition_name
          'merge_requests_0'
        end

        def partition_id
          0
        end
      end
    end
  end
end
```
- Start a bulk processor:

```ruby
bulk_processor = ActiveContext::BulkProcessor.new
```
- Add refs to the processor:

```ruby
bulk_processor.process(Ai::Context::References::MergeRequest.new(1))
bulk_processor.process(Ai::Context::References::MergeRequest.new(2))
```
- Flush the processor and note that both refs are returned as failed:

```ruby
bulk_processor.flush
```
- View the logs. They should say that the relation doesn't exist, and show that 2 refs were submitted and 2 failed:

```json
{"message":"bulk_submitted","meta.indexing.bulk_count":2,"meta.indexing.errors_count":2}
```
- Now create the relation:

```sql
CREATE TABLE "merge_requests_0" (id BIGINT PRIMARY KEY, description TEXT);
```
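If you want to exercise the embeddings query plans above locally, a hypothetical variant of the table with a vector column could look like this (the column name and dimension are assumptions; the pgvector image already ships the extension):

```sql
-- Hypothetical embeddings variant; column name and dimension are illustrative.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE "merge_requests_0" (id BIGINT PRIMARY KEY, description TEXT, embedding vector(768));
```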
- Try again and note that no failed refs are returned. The log should also show that 0 refs failed:

```ruby
bulk_processor.process(Ai::Context::References::MergeRequest.new(1))
bulk_processor.process(Ai::Context::References::MergeRequest.new(2))
bulk_processor.flush
```
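Based on the log format shown earlier, the success entry should look something like this (the exact fields are assumed to match the failure entry):

```json
{"message":"bulk_submitted","meta.indexing.bulk_count":2,"meta.indexing.errors_count":0}
```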
- Verify that the docs exist in the table.
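For example, from `psql` (or any client connected to the instance):

```sql
SELECT * FROM "merge_requests_0";
```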
- Run it again and see that no duplicates were added:

```ruby
bulk_processor.process(Ai::Context::References::MergeRequest.new(1))
bulk_processor.process(Ai::Context::References::MergeRequest.new(2))
bulk_processor.flush
```
- Change the reference class's `operation` method to return `:delete` and reload. (`:delete` is assumed here as the delete counterpart of the `:upsert` shown above.)
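The change would look like:

```ruby
# In the MergeRequest reference class; :delete assumed to mirror :upsert.
def operation
  :delete
end
```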
Run the processor again and note that the records are deleted from the relation:

```ruby
bulk_processor.process(Ai::Context::References::MergeRequest.new(1))
bulk_processor.process(Ai::Context::References::MergeRequest.new(2))
bulk_processor.flush
```
- [Optional] Add another reference class without creating its relation. Process refs from both classes and note that only the refs belonging to the relation with an error are failed.
Related to #507975 (closed)