Switch from LZ4 to DEFLATE compression for Elasticsearch
I was re-investigating a couple of aspects of Elasticsearch index size, and I remembered the old "remove the `_source` field" trick: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html
This is strongly recommended against, but while I was reading the reasons why, the documentation pointed me at the Elasticsearch `index.codec` setting: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#index-codec
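As a rough sketch of what enabling this would involve: `index.codec` is a static index setting, so it has to be supplied when the index is created (or applied while the index is closed), and it only affects segments written after the change. The helper below just builds the JSON body for a `PUT /<index>` request. The index name and the `create_index_body` helper are hypothetical, and it deliberately accepts only the two codec values discussed in this issue.

```python
import json

# Hypothetical index name: the real GitLab index name isn't part of this issue.
INDEX_NAME = "gitlab-development"


def create_index_body(codec: str = "best_compression") -> dict:
    """Build a `PUT /<index>` request body that sets the static index.codec setting.

    Hypothetical helper: restricted to the two values compared in this issue.
    """
    if codec not in ("default", "best_compression"):
        raise ValueError(f"unsupported codec: {codec}")
    return {"settings": {"index": {"codec": codec}}}


if __name__ == "__main__":
    # Print the body you would send with: PUT /gitlab-development
    print(f"PUT /{INDEX_NAME}")
    print(json.dumps(create_index_body(), indent=2))
```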
Testing this out on a GDK setup with the default seed data, consisting of 294,682 documents, I observed the following:
- `index.codec = default`: 1,746,580,749 bytes
- `index.codec = best_compression`: 1,714,736,505 bytes
This saves us 31,844,244 bytes (about 1.8%, so roughly 2%) in this case. The documented trade-off is slower stored-fields performance. I haven't quantified the extent of that, but since our Elasticsearch nodes are unlikely to ever hit 100% CPU, I suspect it's not a huge issue.
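For anyone re-running the comparison on a different data set, the saving above is just the difference between the two measured sizes, expressed relative to the `default` codec size. The function name here is purely illustrative.

```python
def compression_savings(default_bytes: int, compressed_bytes: int) -> tuple[int, float]:
    """Return (bytes saved, percentage saved relative to the default codec size)."""
    saved = default_bytes - compressed_bytes
    return saved, 100.0 * saved / default_bytes


# Sizes measured on the GDK seed data (294,682 documents).
saved, pct = compression_savings(1_746_580_749, 1_714_736_505)
print(f"{saved:,} bytes saved ({pct:.1f}%)")  # prints "31,844,244 bytes saved (1.8%)"
```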
Not amazing, but better than 0%
WDYT @smcgivern @mdelaossa ?