Use quantization for Elasticsearch embeddings
Proposal
Use quantization on Elasticsearch for storing embeddings.
Available since 8.12.0:
When using the int8_hnsw index type, each dimension of the float vectors is quantized to a 1-byte integer. This can reduce the memory footprint by as much as 75% at the cost of some accuracy. However, disk usage can increase by about 25% due to the overhead of storing both the quantized and the raw vectors.
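The 75% and 25% figures follow directly from the byte sizes involved, assuming float32 source vectors (the dimension count below is illustrative and the small per-vector HNSW overhead is ignored):

```python
# Rough arithmetic behind the quoted figures, assuming 4-byte float32
# source vectors. dims is an illustrative dimension count.
dims = 768

float_bytes = dims * 4   # raw float32 vector
int8_bytes = dims * 1    # quantized vector kept in memory for search

memory_saving = 1 - int8_bytes / float_bytes
print(f"memory saving: {memory_saving:.0%}")   # 75%

# Disk keeps both the quantized and the raw vectors:
disk_bytes = int8_bytes + float_bytes
disk_increase = disk_bytes / float_bytes - 1
print(f"disk increase: {disk_increase:.0%}")   # 25%
```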
A significant optimisation was made in 8.14 (link).
I suggest upgrading to 8.14.3 to get the full vector search improvements. If not, we can upgrade to 8.12.2 instead, which gets us quantization alone.
We need to continue supporting earlier versions of Elasticsearch.
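On versions that support it (8.12+), the mapping change itself is small: int8_hnsw is selected via the dense_vector field's index_options. A minimal sketch; the field name, dimension count, and similarity are illustrative assumptions, not our actual schema:

```python
# Sketch of a dense_vector mapping using int8_hnsw quantization
# (Elasticsearch 8.12+). "embedding", dims=768, and "cosine" are
# illustrative placeholders, not our real schema.
mapping = {
    "properties": {
        "embedding": {
            "type": "dense_vector",
            "dims": 768,
            "index": True,
            "similarity": "cosine",
            # Quantizes each float dimension to a 1-byte integer at index time
            "index_options": {"type": "int8_hnsw"},
        }
    }
}

# Applied at index creation, e.g. with the Python client:
# es.indices.create(index="my-index", mappings=mapping)
```

On versions before 8.12 this index_options type is not available, so the mapping would have to stay conditional on the cluster version.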
Steps
- Upgrade the staging indexing, production indexing, and monitoring clusters to 8.14.3 via a CR. Remember to pause indexing during this upgrade.
- Move embeddings from issues index to workitems ... (#476537 - closed) and use quantization
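The move in the last step could go through the Reindex API. A hedged sketch of the request body only; the index names and the embedding field are placeholders for whatever the referenced issue settles on:

```python
# Sketch of moving embeddings via the Reindex API. "issues",
# "work_items", and "embedding" are illustrative placeholders.
reindex_body = {
    # Copy only the embedding field from the source documents
    "source": {"index": "issues", "_source": ["embedding"]},
    # Destination index would be created beforehand with the
    # int8_hnsw mapping so documents are quantized on ingest
    "dest": {"index": "work_items"},
}

# With the Python client this would be submitted as:
# es.reindex(source=reindex_body["source"], dest=reindex_body["dest"],
#            wait_for_completion=False)
```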