Switch from LZ4 to deflate compression for elasticsearch

I was re-investigating a couple of aspects regarding elasticsearch index size, and I remembered the old "remove the _source field" trick: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html

Disabling _source is strongly recommended against, but while I was reading through the reasons, the documentation pointed me at the Elasticsearch index.codec setting: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#index-codec
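For illustration, here is a minimal sketch of the settings body that would select the codec. index.codec is a static setting, so it has to be supplied when the index is created (or applied while the index is closed). The helper name `index_creation_body` is hypothetical and not part of this change; the sketch only builds the JSON and does not talk to a cluster.

```python
import json


def index_creation_body(codec: str = "best_compression") -> str:
    """Build a JSON body for index creation that selects the given codec.

    Hypothetical helper for illustration only. "default" selects LZ4 and
    "best_compression" selects DEFLATE for stored fields.
    """
    body = {"settings": {"index": {"codec": codec}}}
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    # The printed body would be sent with the index creation request.
    print(index_creation_body())
```

Because the setting is static, existing indices would need a reindex (or a close, update, open cycle followed by segment merges) before the new codec applies to all of their data.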

Testing this out on a GDK setup with the default seed data, consisting of 294,682 documents, I observed the following index sizes:

  • index.codec = default: 1,746,580,749 bytes
  • index.codec = best_compression: 1,714,736,505 bytes

This saves us 31,844,244 bytes, or roughly 1.8%, in this case, with the trade-off being a reduction in raw performance. I haven't quantified the extent of that, but I suspect that since our Elasticsearch nodes are unlikely to ever hit 100% CPU, it's not a huge issue.
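The arithmetic behind the saving can be checked with a short sketch. The two byte counts are the measurements listed above; the variable names are just for illustration.

```python
# Index sizes measured on the GDK seed data (294,682 documents).
default_bytes = 1_746_580_749           # index.codec = default
best_compression_bytes = 1_714_736_505  # index.codec = best_compression

# Absolute and relative saving from switching codecs.
saved_bytes = default_bytes - best_compression_bytes
saved_percent = saved_bytes / default_bytes * 100

print(f"saved {saved_bytes:,} bytes ({saved_percent:.2f}%)")
```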

Not amazing, but better than 0%

WDYT @smcgivern @mdelaossa ?