Evaluate PGVector latest on PG14 vs PG16

We need to check whether the latest pgvector extension can run on PG14. The latest pgvector needs PG16 to benefit from its newest performance and quality improvements, but we have not tested whether it still runs on PG14.

Can the latest version of pgvector run on PG14, PG15 and PG16?

  • Docker: pgvector v0.7.4 is installed by default for PG14-PG16 when running Postgres in Docker

    Example docker-compose definition:
    services:
      postgres:
        image: pgvector/pgvector:pg14
        ports:
          - "5432:5432"
        volumes:
          - ./main:/var/lib/postgresql/data
        environment:
          POSTGRES_PASSWORD: your_password
    
  • Non-Docker: installing pgvector v0.7.4 for Postgres running as a service

    • Easy on postgres 16
    • We repeatedly ran into issues with PG14 and PG15 🔴
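
Whichever install route is used, it helps to confirm which pgvector version a database actually has. The sketch below is one way to do that from Python. It assumes a DB-API 2.0 connection to the target database (for example one opened with psycopg, which is an assumption and not something this issue prescribes), and the function name pgvector_version is hypothetical. The extension only appears in the pg_extension catalog after CREATE EXTENSION vector has been run in that database.

```python
def pgvector_version(conn):
    """Return the installed pgvector version string, or None if it is absent.

    `conn` is assumed to be any DB-API 2.0 connection to the target database.
    """
    cur = conn.cursor()
    # pg_extension is the Postgres catalog of installed extensions;
    # pgvector registers itself under the extension name 'vector'.
    cur.execute("SELECT extversion FROM pg_extension WHERE extname = 'vector'")
    row = cur.fetchone()
    return row[0] if row else None
```

Comparing the returned value against 0.7.4 on each of PG14, PG15 and PG16 confirms that all three are running the same pgvector version before any benchmark is compared.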

Are there real differences between PG14-PG16 when running on the same version of pgvector?

After substantial testing, we conclude that there is no real difference.

We benchmarked accuracy and execution time across:

  • pg14, pg15 and pg16 all running pgvector version 0.7.4
  • machines running on 3K, 5K, 10K, 25K, 50K GitLab reference architectures
  • datasets containing 50K, 1M, 5M, 10M, 20M embeddings
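
To get a feel for the size of this test matrix, the three dimensions above can be enumerated as below. This is a minimal sketch that assumes every combination was run, which the issue does not state explicitly. The names PG_VERSIONS, REFERENCE_ARCHITECTURES, DATASET_SIZES and benchmark_matrix are hypothetical.

```python
from itertools import product

# Dimensions taken from the list above; pgvector is fixed at 0.7.4 throughout.
PG_VERSIONS = ["pg14", "pg15", "pg16"]
REFERENCE_ARCHITECTURES = ["3K", "5K", "10K", "25K", "50K"]
DATASET_SIZES = [50_000, 1_000_000, 5_000_000, 10_000_000, 20_000_000]


def benchmark_matrix():
    """Return one dict per (Postgres version, architecture, dataset size) combination."""
    return [
        {"pg_version": pg, "architecture": arch, "embeddings": n}
        for pg, arch, n in product(PG_VERSIONS, REFERENCE_ARCHITECTURES, DATASET_SIZES)
    ]
```

Under the full cross-product assumption this gives 3 x 5 x 5 = 75 configurations, each measured with and without an HNSW index.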

With HNSW index

Accuracy with HNSW index is constant across pg14-pg16

This plot shows accuracy across all datasets and reference architectures.

image

Execution time with HNSW index is mostly constant for pg14-pg16

This plot shows the execution time on a 50K reference architecture with a dataset of 20 million documents.

image

Without HNSW index

Accuracy without HNSW index is constant for pg14-pg16

This plot shows accuracy across all datasets and reference architectures. PG15 hit the 5-minute timeout when searching through 20 million records, which was recorded as an accuracy of 0. That explains the drop at K=5000.

image
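
The issue does not say how accuracy was computed. One common definition is recall at K: the fraction of the true K nearest neighbours that the query returned. The sketch below assumes that definition and shows why a timed-out query scores 0. The function name accuracy_at_k and the convention of passing None for a timed-out query are hypothetical.

```python
def accuracy_at_k(exact_ids, returned_ids, k):
    """Recall at K: share of the true top-k neighbours present in the returned top-k.

    `exact_ids` is the ground-truth ranking and `returned_ids` is what the
    query produced. Pass `returned_ids=None` for a query that timed out.
    """
    if returned_ids is None:
        # A timed-out query returned nothing, so none of the true neighbours were found.
        return 0.0
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    found = set(returned_ids[:k])
    return len(truth & found) / len(truth)
```

With this definition a single timeout pulls the plotted value for that configuration down to 0, regardless of how accurate the query would have been had it finished.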

Execution time without HNSW index is constant for pg14-pg16

This plot shows the execution time on a 3K reference architecture across different datasets.

image

Edited by Madelein van Niekerk