feat: postgres indexer
What does this MR do and why?
Implements a PostgreSQL adapter that stores code chunks in a PostgreSQL database. It provides an alternative to the Elasticsearch and OpenSearch adapters while maintaining the same indexing functionality.
Changes
- PostgreSQL Client (internal/mode/chunk/client/postgresql/postgresql.go): Manages database connections and operations
- PostgreSQL Indexer (internal/mode/chunk/indexer/postgresql/indexer.go): Implements chunk indexing operations
- Integration (internal/mode/chunk/chunk.go): Orchestrates the indexing workflow
Key Features
- Batched Operations: Buffers up to 1000 chunks (configurable) before flushing, to reduce transaction overhead
- Partition Support: All operations filter by partition_id for data isolation between partitions
- Orphan Cleanup: Automatically removes stored chunks that no longer exist in a modified file
- Incremental Reindexing: Supports the reindexing flag workflow for efficient incremental updates
- Transaction Safety: Uses transactions for atomic operations
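The batching behaviour described above can be illustrated with a minimal sketch. This is not the code in this MR: the Chunk type, BatchBuffer, and the flush callback are hypothetical names, and the real indexer flushes inside a database transaction rather than through a callback.

```go
package main

import "fmt"

// Chunk is a hypothetical stand-in for an indexed code chunk.
type Chunk struct {
	ID   string
	Path string
}

// BatchBuffer collects chunks and hands them to flush once limit is reached.
type BatchBuffer struct {
	limit  int
	chunks []Chunk
	flush  func([]Chunk) error
}

// NewBatchBuffer creates a buffer; limits below 1 are treated as 1.
func NewBatchBuffer(limit int, flush func([]Chunk) error) *BatchBuffer {
	if limit < 1 {
		limit = 1
	}
	return &BatchBuffer{limit: limit, chunks: make([]Chunk, 0, limit), flush: flush}
}

// Add buffers one chunk and flushes automatically when the buffer is full.
func (b *BatchBuffer) Add(c Chunk) error {
	b.chunks = append(b.chunks, c)
	if len(b.chunks) >= b.limit {
		return b.Flush()
	}
	return nil
}

// Flush passes any buffered chunks to the callback. The buffer is kept on
// failure so the caller can retry. The callback must not retain the slice,
// because its backing array is reused for the next batch.
func (b *BatchBuffer) Flush() error {
	if len(b.chunks) == 0 {
		return nil
	}
	if err := b.flush(b.chunks); err != nil {
		return err
	}
	b.chunks = b.chunks[:0]
	return nil
}

func main() {
	flushes := 0
	buf := NewBatchBuffer(1000, func(batch []Chunk) error {
		flushes++
		fmt.Printf("flushing %d chunks\n", len(batch))
		return nil
	})
	for i := 0; i < 2500; i++ {
		if err := buf.Add(Chunk{ID: fmt.Sprintf("c%d", i), Path: "main.go"}); err != nil {
			panic(err)
		}
	}
	// Flush the final partial batch explicitly.
	if err := buf.Flush(); err != nil {
		panic(err)
	}
	fmt.Println("total flushes:", flushes)
}
```

The explicit final Flush matters: without it, a trailing partial batch would never be written.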
Operations Implemented
- Index: Upserts chunks with batching and orphan cleanup
- DeletePaths: Removes all chunks for specified file paths
- Delete: Removes all chunks for a project in a partition
- ResolveReindexing: Completes incremental reindexing workflow
- Flush: Executes buffered upsert operations in a transaction
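To make the upsert and orphan cleanup steps concrete, here is a rough sketch of how the statements for one batch might be built. The table name code_chunks, its columns, the conflict target, and both helper functions are assumptions for illustration only; the actual schema and queries in this MR may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a hypothetical row shape; the real schema may differ.
type Chunk struct {
	ID      string
	Path    string
	Content string
}

// buildUpsert returns one multi-row INSERT ... ON CONFLICT statement and its
// arguments. It returns an empty query when there is nothing to write.
func buildUpsert(partitionID, projectID int64, chunks []Chunk) (string, []any) {
	if len(chunks) == 0 {
		return "", nil
	}
	const cols = 5
	rows := make([]string, 0, len(chunks))
	args := make([]any, 0, len(chunks)*cols)
	for i, c := range chunks {
		n := i * cols
		rows = append(rows, fmt.Sprintf("($%d, $%d, $%d, $%d, $%d)", n+1, n+2, n+3, n+4, n+5))
		args = append(args, partitionID, projectID, c.ID, c.Path, c.Content)
	}
	query := "INSERT INTO code_chunks (partition_id, project_id, id, path, content) VALUES " +
		strings.Join(rows, ", ") +
		" ON CONFLICT (partition_id, id) DO UPDATE SET path = EXCLUDED.path, content = EXCLUDED.content"
	return query, args
}

// buildOrphanDelete removes chunks for one path whose IDs are not in keepIDs.
// With no IDs to keep, every chunk for that path is removed.
func buildOrphanDelete(partitionID, projectID int64, path string, keepIDs []string) (string, []any) {
	args := []any{partitionID, projectID, path}
	query := "DELETE FROM code_chunks WHERE partition_id = $1 AND project_id = $2 AND path = $3"
	if len(keepIDs) == 0 {
		return query, args
	}
	marks := make([]string, len(keepIDs))
	for i, id := range keepIDs {
		marks[i] = fmt.Sprintf("$%d", i+4)
		args = append(args, id)
	}
	return query + " AND id NOT IN (" + strings.Join(marks, ", ") + ")", args
}

func main() {
	chunks := []Chunk{{ID: "a", Path: "x.go", Content: "func A() {}"}}
	q, args := buildUpsert(1, 1000000, chunks)
	fmt.Println(q, len(args))
	q, args = buildOrphanDelete(1, 1000000, "x.go", []string{"a"})
	fmt.Println(q, len(args))
}
```

Both statements carry partition_id in their predicates or conflict target, which mirrors the partition isolation described under Key Features. Running them in a single transaction would keep the upsert and the cleanup atomic.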
The tests mock database calls, since integration tests will live in rails, where a database is already configured in CI.
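One way to mock database calls in Go without a live database is to depend on a small interface that both *sql.DB and *sql.Tx satisfy, then substitute a recording fake in tests. The sketch below assumes that approach; the execer interface, deletePaths, fakeDB, and the code_chunks table are hypothetical and may not match how the tests in this MR are written.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// execer is the subset of *sql.DB and *sql.Tx used here; both satisfy it.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// deletePaths is a hypothetical operation that removes all chunks for the
// given file paths within one partition.
func deletePaths(ctx context.Context, db execer, partitionID int64, paths []string) error {
	const query = "DELETE FROM code_chunks WHERE partition_id = $1 AND path = $2"
	for _, p := range paths {
		if _, err := db.ExecContext(ctx, query, partitionID, p); err != nil {
			return fmt.Errorf("delete %q: %w", p, err)
		}
	}
	return nil
}

// fakeResult is a minimal sql.Result implementation for the fake.
type fakeResult struct{}

func (fakeResult) LastInsertId() (int64, error) { return 0, nil }
func (fakeResult) RowsAffected() (int64, error) { return 1, nil }

// fakeDB records every statement instead of talking to PostgreSQL.
type fakeDB struct {
	queries []string
	args    [][]any
	err     error // returned from every call when set
}

func (f *fakeDB) ExecContext(_ context.Context, query string, args ...any) (sql.Result, error) {
	f.queries = append(f.queries, query)
	f.args = append(f.args, args)
	if f.err != nil {
		return nil, f.err
	}
	return fakeResult{}, nil
}

// runDemo exercises deletePaths against the fake and returns the recording.
func runDemo() *fakeDB {
	db := &fakeDB{}
	if err := deletePaths(context.Background(), db, 1, []string{"a.go", "b.go"}); err != nil {
		panic(err)
	}
	return db
}

func main() {
	db := runDemo()
	fmt.Println("statements executed:", len(db.queries))
}
```

A fake like this verifies which statements and arguments are issued, but not that PostgreSQL accepts them, which is why integration coverage against a real database remains useful.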
How to set up and validate locally
- Check out the ActiveContext Postgres indexer support (gitlab!216987) branch on rails
- Check out this branch on the indexer and run make
- Run postgres
docker run -p 5432:5432 --name pgvector17 -e POSTGRES_PASSWORD=password pgvector/pgvector:pg17
- Create the vector extension
psql -h localhost -p 5432 -U postgres
CREATE EXTENSION vector;
- Create a postgres connection
connection = Ai::ActiveContext::Connection.create!(
name: "postgres",
options: { host: 'localhost', port: 5432, username: 'postgres', password: 'password' },
adapter_class: "ActiveContext::Databases::Postgresql::Adapter"
)
connection.activate!
- Run the migration worker repeatedly
::Ai::ActiveContext::MigrationWorker.new.perform
- Create enabled namespaces
Ai::ActiveContext::Code::SchedulingWorker.new.perform("create_enabled_namespace")
- Trigger indexing for a project
::Ai::ActiveContext::Code::AdHocIndexingWorker.new.perform(1000000)
- Note that the repo files were chunked and indexed
- Update a file and note that the stored chunks reflect the new content (orphaned chunks are deleted)
- Run the deleter and note that the chunks were deleted
Ai::ActiveContext::Code::Deleter.run!(Ai::ActiveContext::Code::Repository.find_by(project_id: 1000000))
Edited by Madelein van Niekerk