SPIKE: vector store benchmarking
Benchmarking and results
The benchmark runs three queries:
- get 5 nearest vectors for a project that has the same number of records as gitlab-org/gitlab, simulating a project semantic search for gitlab
- get 5 nearest vectors for a group that has the same number of records as gitlab-org, simulating a group semantic search for gitlab-org
- get 5 nearest vectors without any filters
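The three queries above can be sketched as Elasticsearch-style kNN request bodies. This is a hedged sketch: the field name (`embedding`), filter fields (`project_id`, `namespace_id`), and the placeholder IDs are hypothetical, not taken from the spike.

```python
# Sketch of the three benchmark queries as Elasticsearch-style kNN request
# bodies. Field and filter names are hypothetical; the spike's real names
# may differ.

def knn_query(query_vector, filters=None):
    """Build a kNN search body returning the 5 nearest vectors."""
    knn = {
        "field": "embedding",
        "query_vector": query_vector,
        "k": 5,
        "num_candidates": 100,  # assumed candidate pool size
    }
    if filters:
        knn["filter"] = {"term": filters}
    return {"knn": knn}

vec = [0.1] * 768  # the same query embedding is reused across all runs

# placeholder IDs, standing in for a gitlab-org/gitlab-sized project
# and a gitlab-org-sized group
project_query = knn_query(vec, {"project_id": 123})
group_query = knn_query(vec, {"namespace_id": 456})
unfiltered_query = knn_query(vec)
```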
The same embeddings are used for the query vector across all benchmark runs.
In the benchmark we don't consider the cold cache, since a warmed cache is what will be used in the vast majority of cases, so we warm up before starting the benchmark.
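The warm-up-then-measure loop described above can be sketched as follows; the warm-up and run counts are assumptions, not values from the spike.

```python
import time

def benchmark(run_query, warmup=5, runs=50):
    """Warm the cache first, then time the query; warm-up runs are discarded."""
    for _ in range(warmup):
        run_query()  # untimed: populates caches so cold-cache noise is excluded
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations
```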
The benchmark calculates a couple of performance-related metrics. It does not measure accuracy, recall, etc.:
- Min, max, mean, median duration
- Standard deviation and outliers (±1 stddev from the mean; not shown in the table below)
- Operations per second, computed as 1 / mean duration
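The metrics above can be computed from a list of per-run durations with the standard library; a minimal sketch:

```python
import statistics

def summarize(durations):
    """Compute the spike's metrics from a list of query durations in seconds."""
    mean = statistics.mean(durations)
    stddev = statistics.stdev(durations)
    return {
        "min": min(durations),
        "max": max(durations),
        "mean": mean,
        "median": statistics.median(durations),
        "stddev": stddev,
        # outliers: runs more than 1 stddev away from the mean
        "outliers": [d for d in durations if abs(d - mean) > stddev],
        # operations per second, computed as 1 / mean
        "ops_per_sec": 1 / mean,
    }
```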
Full results are stored in this folder.
Interpretation
For a small GitLab instance with relatively few documents, Elasticsearch is orders of magnitude faster than postgres.
3K Reference Architecture results - 3M documents
5K Reference Architecture - 10M documents
Postgres index concerns
Write throughput
Apart from the big performance difference, we must acknowledge the write-throughput problems that come with a big HNSW index in postgres. Given that postgres could only process a few hundred write ops/s, updates could be backed up severely, especially for frequently updated data like merge requests or code.
Index build time
We could not create an HNSW postgres index for 5M documents on any of GitLab's existing reference architecture machines, even an n2-standard-32 (128GB). Using an n2-standard-64 (256GB) machine, we were able to create the index in less than a day, but we also had to bump maintenance_work_mem to 96GB, which would probably not be recommended for production use cases.
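The index build that eventually succeeded on the n2-standard-64 machine would look roughly like the statements below. This is a sketch: only the 96GB maintenance_work_mem value comes from the spike; the table and column names are hypothetical, and the graph parameters are the shared ones listed in the setup section.

```python
# Hedged sketch of the pgvector HNSW index build. Only the 96GB
# maintenance_work_mem figure is from the spike; documents/embedding
# are hypothetical names.
index_build = [
    "SET maintenance_work_mem = '96GB';",
    "CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops) "
    "WITH (m = 16, ef_construction = 100);",
]
```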
Setup
We took a dataset from Hugging Face with 35 million records pulled from Wikipedia, with text embeddings generated at 768 dimensions: https://huggingface.co/datasets/Cohere/wikipedia-22-12-en-embeddings.
We created large VMs with 32 CPU cores and 128GB memory to read this dataset, and assigned permissions to each record to produce a distribution of projects and groups. We then loaded the data into a postgres database and an Elasticsearch index. These were copied to VMs running GitLab reference architectures, and we ran the same benchmarks against both, starting with the 3K reference architecture.
The same resources are allocated and the same data is used for both.
- Elasticsearch: version 8.15.1. Using an HNSW graph with a cosine knn query containing filters for project/group.
- Postgres: version 16, pgvector version 0.7.4. Using an HNSW index with cosine distance.
Both use HNSW graphs to efficiently find similar vectors. We used the same graph parameters for both ES and pg:
- m = 16
- ef_construction = 100
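On the Elasticsearch side, these shared graph parameters can be expressed in the dense_vector field mapping; a sketch, with the field name (`embedding`) being a hypothetical choice:

```python
# Sketch of an Elasticsearch dense_vector mapping carrying the spike's
# shared HNSW parameters. The field name is hypothetical.
es_mapping = {
    "properties": {
        "embedding": {
            "type": "dense_vector",
            "dims": 768,
            "similarity": "cosine",
            "index_options": {
                "type": "hnsw",
                "m": 16,
                "ef_construction": 100,
            },
        }
    }
}
```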
Dataset
We ran into problems getting 35M records into postgres with an HNSW index. Doing writes into a table with an HNSW index is excruciatingly slow: even on a powerful machine (see specs above), throughput drops to about 100 ops/s.
To do updates on the table, we dropped the index, which increased the throughput. We inserted 27M documents and then tried to build the index after the data was ingested; it ran for longer than 4 hours, maxing out all 32 CPUs. We decided to kill it because there was no indication of how long the full index build would take.
Instead, we decided to start at 3M records and see what postgres can handle in a reasonable timeframe.


