feat: postgres indexer

What does this MR do and why?

Implements a PostgreSQL adapter that stores code chunks in a PostgreSQL database. This provides an alternative to the Elasticsearch and OpenSearch adapters with the same indexing functionality.

Changes

  • PostgreSQL Client (internal/mode/chunk/client/postgresql/postgresql.go): Manages database connections and operations
  • PostgreSQL Indexer (internal/mode/chunk/indexer/postgresql/indexer.go): Implements chunk indexing operations
  • Integration (internal/mode/chunk/chunk.go): Orchestrates the indexing workflow

Key Features

  • Batched Operations: Buffers up to 1,000 chunks (configurable) before flushing, to reduce transaction overhead
  • Partition Support: Every operation filters by partition_id, isolating data between partitions
  • Orphan Cleanup: When a file is modified, automatically removes the chunks it no longer contains
  • Incremental Reindexing: Supports the reindexing-flag workflow for efficient incremental updates
  • Transaction Safety: Uses transactions so that operations are applied atomically
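
A minimal sketch of the batching behaviour, written in Go to match the indexer. The `chunk` type, `batcher`, `newBatcher`, and the `flushFn` callback are hypothetical names for illustration, not the MR's actual code; `flushFn` stands in for executing the buffered upserts in a single transaction.

```go
package main

// chunk is a hypothetical stand-in for the indexer's chunk type;
// the real struct in the PostgreSQL indexer may differ.
type chunk struct {
	ID      string
	Path    string
	Content string
}

// batcher buffers chunks and calls flushFn once the buffer reaches maxSize.
// flushFn is hypothetical: it represents "run the buffered upserts in one transaction".
type batcher struct {
	maxSize int
	buf     []chunk
	flushFn func([]chunk) error
}

func newBatcher(maxSize int, flushFn func([]chunk) error) *batcher {
	if maxSize <= 0 {
		maxSize = 1000 // default batch size described in this MR
	}
	return &batcher{maxSize: maxSize, buf: make([]chunk, 0, maxSize), flushFn: flushFn}
}

// Add buffers a chunk and flushes automatically when the batch is full.
func (b *batcher) Add(c chunk) error {
	b.buf = append(b.buf, c)
	if len(b.buf) >= b.maxSize {
		return b.Flush()
	}
	return nil
}

// Flush hands a copy of the buffered chunks to flushFn and resets the buffer.
// It is a no-op when the buffer is empty.
func (b *batcher) Flush() error {
	if len(b.buf) == 0 {
		return nil
	}
	batch := make([]chunk, len(b.buf))
	copy(batch, b.buf)
	if err := b.flushFn(batch); err != nil {
		return err // keep the buffer so the caller can retry or abort
	}
	b.buf = b.buf[:0]
	return nil
}
```

Keeping the buffer on a failed flush avoids silently dropping chunks; a final explicit `Flush` is still needed to write a trailing partial batch.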

Operations Implemented

  • Index: Upserts chunks with batching and orphan cleanup
  • DeletePaths: Removes all chunks for specified file paths
  • Delete: Removes all chunks for a project in a partition
  • ResolveReindexing: Completes the incremental reindexing workflow
  • Flush: Executes buffered upsert operations in a transaction

The unit tests mock database calls because integration tests will live in Rails, where a database is already configured in CI.

How to set up and validate locally

  • Start a PostgreSQL 17 container with pgvector available
docker run -p 5432:5432 --name pgvector17 -e POSTGRES_PASSWORD=password pgvector/pgvector:pg17
  • Create the vector extension
psql -h localhost -p 5432 -U postgres
CREATE EXTENSION vector;
  • Create a PostgreSQL connection
connection = Ai::ActiveContext::Connection.create!(
  name: "postgres",
  options: { host: 'localhost', port: 5432, username: 'postgres', password: 'password' },
  adapter_class: "ActiveContext::Databases::Postgresql::Adapter"
)
connection.activate!
  • Run the migration worker repeatedly
::Ai::ActiveContext::MigrationWorker.new.perform
  • Create enabled namespaces
Ai::ActiveContext::Code::SchedulingWorker.new.perform("create_enabled_namespace")
  • Trigger indexing for a project
::Ai::ActiveContext::Code::AdHocIndexingWorker.new.perform(1000000)
  • Verify that the repository files were chunked and indexed
  • Update a file and verify that the stored chunks reflect the change (orphaned chunks are deleted)
  • Run the deleter and verify that the chunks were deleted
Ai::ActiveContext::Code::Deleter.run!(Ai::ActiveContext::Code::Repository.find_by(project_id: 1000000))
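
If the indexer receives the connection options from the setup steps (host, port, username, password) and turns them into a connection string, a Go sketch might look like the following. How the options actually reach the indexer, and the `connOptions` and `dsn` names, are assumptions for illustration.

```go
package main

import (
	"net"
	"net/url"
	"strconv"
)

// connOptions mirrors the options hash passed when creating the connection.
// This struct and its Database field are hypothetical.
type connOptions struct {
	Host     string
	Port     int
	Username string
	Password string
	Database string // optional
}

// dsn builds a postgres:// connection URL, percent-encoding credentials.
func dsn(o connOptions) string {
	u := url.URL{
		Scheme: "postgres",
		User:   url.UserPassword(o.Username, o.Password),
		Host:   net.JoinHostPort(o.Host, strconv.Itoa(o.Port)),
	}
	if o.Database != "" {
		u.Path = "/" + o.Database
	}
	return u.String()
}
```

Using `url.UserPassword` escapes characters such as `@` in passwords, which naive string concatenation would break.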
Edited by Madelein van Niekerk
