
ActiveContext postgres indexer

What does this MR do and why?

Adds bulk indexing (upsert and delete) handling to the postgres adapter.

Note: we have been holding off on adding tests because this work evolves constantly, and adding tests would slow us down at this stage. We will be sure to go back and add thorough testing!

Query plans

Bulk upsert

We will upsert at most 1000 documents at a time.

Query for 2 documents:

INSERT INTO "gitlab_active_context_merge_requests" ("id","issue_id","namespace_id","traversal_ids","partition_id") VALUES ('11', 11, 2, '24-p2-', 0), ('10', 10, 2, '24-p2-', 0) ON CONFLICT ("id","partition_id") DO UPDATE SET "issue_id"=excluded."issue_id","namespace_id"=excluded."namespace_id","traversal_ids"=excluded."traversal_ids"
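A statement like the one above could be generated roughly as follows. This is a hypothetical sketch, not the adapter's actual internals: the method and variable names (`build_upsert_sql`, `quote_value`, `BATCH_SIZE`) are illustrative, and real code would use the driver's quoting rather than the naive `quote_value` shown here. It mirrors the query above: multi-row `VALUES`, conflict target `("id","partition_id")`, and `excluded.*` assignments for the remaining columns, with documents processed in slices of at most 1000.

```ruby
# Hypothetical sketch of batched upsert SQL generation (illustrative names,
# not the adapter's actual code). Documents are column => value hashes.
BATCH_SIZE = 1000

# Naive quoting for illustration only; real code would use the pg driver's quoting
def quote_value(value)
  value.is_a?(Numeric) ? value.to_s : "'#{value.to_s.gsub("'", "''")}'"
end

def build_upsert_sql(table, docs, conflict_keys: %w[id partition_id])
  columns = docs.first.keys
  update_columns = columns - conflict_keys

  column_list   = columns.map { |c| "\"#{c}\"" }.join(',')
  conflict_list = conflict_keys.map { |c| "\"#{c}\"" }.join(',')
  value_rows    = docs.map { |doc| "(#{columns.map { |c| quote_value(doc[c]) }.join(',')})" }.join(', ')
  assignments   = update_columns.map { |c| "\"#{c}\"=excluded.\"#{c}\"" }.join(',')

  "INSERT INTO \"#{table}\" (#{column_list}) VALUES #{value_rows} " \
    "ON CONFLICT (#{conflict_list}) DO UPDATE SET #{assignments}"
end

docs = [
  { 'id' => '11', 'issue_id' => 11, 'partition_id' => 0 },
  { 'id' => '10', 'issue_id' => 10, 'partition_id' => 0 }
]

# Cap each statement at BATCH_SIZE documents, matching the 1000-document limit
docs.each_slice(BATCH_SIZE) do |batch|
  puts build_upsert_sql('gitlab_active_context_merge_requests', batch)
end
```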

Query plans for bulk upserting 2 documents:

Bulk delete

We will delete at most 1000 documents at a time.

Query for deleting 2 records:

DELETE FROM "gitlab_active_context_merge_requests" WHERE "gitlab_active_context_merge_requests"."id" IN ('10', '11')

Query plan for deleting 2 records: https://explain.depesz.com/s/vFhs
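For completeness, the delete side can be sketched the same way. Again this is illustrative only (`build_delete_sql` and `BATCH_SIZE` are assumed names, and quoting is simplified); it reproduces the `IN` list form of the query above and slices IDs into batches of at most 1000.

```ruby
# Hypothetical sketch of batched delete SQL generation (illustrative, not the
# adapter's actual code). IDs are deleted in slices of at most 1000.
BATCH_SIZE = 1000

def build_delete_sql(table, ids)
  id_list = ids.map { |id| "'#{id}'" }.join(', ')
  "DELETE FROM \"#{table}\" WHERE \"#{table}\".\"id\" IN (#{id_list})"
end

%w[10 11].each_slice(BATCH_SIZE) do |batch|
  puts build_delete_sql('gitlab_active_context_merge_requests', batch)
end
```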

References

Please include cross links to any resources that are relevant to this MR. This will give reviewers and future readers helpful context to give an efficient review of the changes introduced.

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

How to set up and validate locally

  • Run a postgres instance: docker run -p 5432:5432 --name pgvector17 -e POSTGRES_PASSWORD=password pgvector/pgvector:pg17
  • Add an initializer connecting to postgres
# frozen_string_literal: true

ActiveContext.configure do |config|
  config.enabled = true
  config.logger = ::Gitlab::Elasticsearch::Logger.build

  config.databases = {
    pg1: {
      adapter: 'ActiveContext::Databases::Postgresql::Adapter',
      options: { port: 5432, host: 'localhost', username: 'postgres', password: 'password' }
    }
  }
end
  • Create a reference class: ee/lib/ai/context/references/merge_request.rb
# frozen_string_literal: true

module Ai
  module Context
    module References
      class MergeRequest < ::ActiveContext::Reference
        def self.serialize(record)
          new(record.id).serialize
        end

        attr_reader :identifier

        def initialize(identifier)
          @identifier = identifier.to_i
        end

        def serialize
          self.class.join_delimited([identifier].compact)
        end

        def as_indexed_json
          {
            id: identifier,
            description: "description #{identifier} for merge request"
          }
        end

        def operation
          :upsert
        end

        def partition_name
          'merge_requests_0'
        end

        def partition_id
          0
        end
      end
    end
  end
end
  • Start a bulk processor
bulk_processor = ActiveContext::BulkProcessor.new
  • Add refs to the processor
bulk_processor.process(Ai::Context::References::MergeRequest.new(1))
bulk_processor.process(Ai::Context::References::MergeRequest.new(2))
  • Flush the processor and note that both refs are returned as failed
bulk_processor.flush
  • View the logs. They should say that the relation doesn't exist, and show that 2 refs were submitted and 2 failed.
{"message":"bulk_submitted","meta.indexing.bulk_count":2,"meta.indexing.errors_count":2}
  • Now create the relation
CREATE TABLE "merge_requests_0" (id BIGINT PRIMARY KEY, description TEXT);
  • Try again and note that no failed refs are returned. The log should also show that 0 refs failed.
bulk_processor.process(Ai::Context::References::MergeRequest.new(1))
bulk_processor.process(Ai::Context::References::MergeRequest.new(2))
bulk_processor.flush
  • Verify that the docs exist in the table.
  • Run it again and see that no duplicates were added
bulk_processor.process(Ai::Context::References::MergeRequest.new(1))
bulk_processor.process(Ai::Context::References::MergeRequest.new(2))
bulk_processor.flush
  • Change the operation to delete in the Reference class, reload and run it again. Note that the records are deleted from the relation.
bulk_processor.process(Ai::Context::References::MergeRequest.new(1))
bulk_processor.process(Ai::Context::References::MergeRequest.new(2))
bulk_processor.flush
  • [Optional]: add another ref class without creating its relation. Process refs from both ref classes and note that only the refs for the relation with an error are failed.
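The per-relation failure behavior in the optional step can be illustrated with a small stand-alone sketch. This is not the adapter's real implementation: `execute_grouped`, the `existing_relations` check, and the `issues_0` relation name are all hypothetical stand-ins. The point is only that refs are grouped by target relation and executed per group, so an error on one relation fails only that group's refs.

```ruby
# Illustrative sketch (not the adapter's real code) of per-relation failure
# isolation: each relation's statement is executed independently, and a
# failure collects only that relation's refs.
def execute_grouped(refs_by_relation, existing_relations)
  failed = []

  refs_by_relation.each do |relation, refs|
    begin
      # Stand-in for executing the bulk SQL against postgres
      raise "relation \"#{relation}\" does not exist" unless existing_relations.include?(relation)
    rescue StandardError
      failed.concat(refs)
    end
  end

  failed
end

# 'merge_requests_0' exists; 'issues_0' (a hypothetical second ref class) does not
failed = execute_grouped(
  { 'merge_requests_0' => [1, 2], 'issues_0' => [3, 4] },
  ['merge_requests_0']
)
puts failed.inspect # => [3, 4]
```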

Related to #507975 (closed)

Edited by Madelein van Niekerk
