Vector store comparison


Purpose

Keep a record of known advantages and disadvantages of evaluated vector stores.

| Store | Project filter kNN duration* | Group filter kNN duration* | References |
| --- | --- | --- | --- |
| Elasticsearch and OpenSearch | 12.0 ms | 14.6 ms | Benchmark |
| PGVector | 517.1 ms | 521.3 ms | Benchmark, PGVector issues |

Elasticsearch and OpenSearch

Advantages:
- Supports filters (important for anything that is not public data).
- Has an existing code path that efficiently indexes and searches vectors at scale in production.
- Faster query time than PGVector.
- Supports hybrid search: kNN plus keyword search.

Disadvantages:
- Not all GitLab customers run Elasticsearch: it requires additional cost, maintenance and legal approval.

PGVector

Advantages:
- All GitLab customers already run PostgreSQL.
- Fast and accurate search for small datasets without using an index. The maximum number of documents before a search exceeds several seconds is:
  - 50k documents for the 3K reference architecture
  - 1M documents for the 50K reference architecture

Disadvantages:
- Does not support filters when using an index.
  - Recall and accuracy with filters using an index
  - Dead tuples
- Without an index, on the largest GitLab reference architecture (50K), it takes 4 minutes to search through 20 million records and 6 seconds to search through 5 million records. Smaller reference architectures take significantly longer. (Link)
- Write throughput is too slow for production use when using an index. (Link)
- Does not support hybrid search when an HNSW index is present.

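
To illustrate the filter and hybrid search support listed for Elasticsearch, here is a minimal sketch of a request body that combines a filtered kNN search with a keyword query. It assumes the Elasticsearch 8.x top-level `knn` search option (OpenSearch uses a different query shape). The field names `embedding`, `project_id` and `content`, and the helper `build_hybrid_query`, are hypothetical and are not taken from the benchmark.

```python
import json


def build_hybrid_query(query_vector, project_ids, keywords, k=10, num_candidates=100):
    """Build an Elasticsearch 8.x search body: filtered kNN plus keyword match.

    All field names here are illustrative assumptions.
    """
    if num_candidates < k:
        raise ValueError("num_candidates must be >= k")
    project_filter = {"terms": {"project_id": list(project_ids)}}
    return {
        # Approximate kNN, restricted to documents that pass the filter.
        "knn": {
            "field": "embedding",
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": num_candidates,
            "filter": project_filter,
        },
        # Keyword half of the hybrid search. The kNN filter does not apply
        # to this query, so the same project filter is repeated here.
        "query": {
            "bool": {
                "must": {"match": {"content": keywords}},
                "filter": project_filter,
            }
        },
        "size": k,
    }


body = build_hybrid_query([0.1, 0.2, 0.3], project_ids=[42], keywords="vector store")
print(json.dumps(body, indent=2))
```

The filter is repeated in both halves so that neither the vector matches nor the keyword matches can return documents from projects the user is not allowed to see.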

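
The trade-off of searching without an index can be sketched in plain Python: an exact scan is accurate and works with any filter, but it computes a distance for every row that passes the filter, so the cost grows with the number of rows. This is only an illustration of the technique, not PGVector's implementation. The row layout `(doc_id, project_id, embedding)` and the function names are hypothetical, and embeddings are assumed to be non-zero vectors.

```python
import heapq
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_knn(rows, query, project_id, k):
    """Exact filtered kNN by scanning every row.

    rows: iterable of (doc_id, project_id, embedding) tuples (hypothetical schema).
    Returns the k closest doc_ids within the given project, nearest first.
    """
    # Apply the filter, then compute a distance for every remaining row.
    candidates = (
        (cosine_distance(embedding, query), doc_id)
        for doc_id, row_project, embedding in rows
        if row_project == project_id
    )
    return [doc_id for _, doc_id in heapq.nsmallest(k, candidates)]


rows = [
    (1, 10, [1.0, 0.0]),
    (2, 10, [0.0, 1.0]),
    (3, 20, [1.0, 0.1]),  # very close to the query, but in another project
    (4, 10, [0.7, 0.7]),
]
print(exact_knn(rows, query=[1.0, 0.0], project_id=10, k=2))  # prints [1, 4]
```

In PGVector the equivalent exact search is an `ORDER BY` on a distance operator (for example `<=>` for cosine distance) with a `WHERE` clause for the filter and a `LIMIT` for `k`.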

*Durations are the mean query duration measured on 5K reference architecture machines with a dataset of 10 million vectors.
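
To put the two duration columns in proportion, the short calculation below divides the PGVector mean by the Elasticsearch mean for each filter type. The numbers are copied from the table. The resulting factors only describe this benchmark setup (5K reference architecture, 10 million vectors) and should not be read as a general rule.

```python
# Mean kNN durations in milliseconds, copied from the comparison table.
durations_ms = {
    "project filter": {"elasticsearch": 12.0, "pgvector": 517.1},
    "group filter": {"elasticsearch": 14.6, "pgvector": 521.3},
}

# How many times longer the PGVector query took, per filter type.
slowdown = {
    name: d["pgvector"] / d["elasticsearch"] for name, d in durations_ms.items()
}

for name, factor in slowdown.items():
    print(f"{name}: PGVector took about {factor:.0f}x as long")
```

In this benchmark PGVector took roughly 43 times as long as Elasticsearch with a project filter and roughly 36 times as long with a group filter.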

Edited Jul 09, 2025 by 🤖 GitLab Bot 🤖