"ZFS vs ext4 test": Document every step, methodology, results, draw graphs
This is a document. Please don't change this description.
ZFS vs. Ext4 tests on AWS i3.2xlarge instances (NVME drive)
Main links
- Permanent results link with graphs: https://docs.google.com/spreadsheets/d/1eDfxzxzIYoGfh6hi4_EDGIhVTVMaPtbjdwuZtpLzAsY/
- Provisioning/test scripts: https://github.com/Nastradamus/cookbooks/tree/master/test_zfs_on_i3
Goal
Understand whether we can use ZFS to speed up Nancy's series of experiments with snapshots.
Hardware/OS specs
- Virtualization: xen
- AWS EC2 spot instance (i3.2xlarge) with a locally attached NVME drive
- RAM: 61GB
- CPU: 8 cores, Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
- DISK: "ephemeral" local NVME drive
- NUMA: off
- OS: Ubuntu 16.04.5 LTS
- Linux Kernel: 4.4.0-1075-aws
- RDBMS: PostgreSQL 11.1
Methodology
1) Reset the machine's state to defaults and apply general tuning:
- General tuning of Ubuntu 16.04.5
# Set scaling_governor to performance mode
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# Disable swap
echo 0 > /proc/sys/vm/swappiness
- Kill Postgres
- Clear NVME partition
- Format the partition with Ext4 or ZFS and tune it (example commands are shown after the mount-option lists below)
We set the ZFS ARC cache size to 20 GB on the 61 GB RAM node (all memory is in NUMA node 0).
ZFS tuning options:
compression=on
atime=off
recordsize=8k/16k
logbias=throughput/latency
Ext4 tuning options (mount options):
noatime
data=writeback
barrier=0
nobh
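A minimal sketch of the per-run filesystem setup, assuming the NVME device is /dev/nvme0n1, the pool is named zp, and the mount point is /storage (these names are assumptions; the actual commands live in the provisioning scripts linked above):
# ZFS variant: create the pool, apply the tuning options, cap ARC at 20 GB
zpool create -f -m /storage zp /dev/nvme0n1
zfs set compression=on zp
zfs set atime=off zp
zfs set recordsize=8k zp
zfs set logbias=throughput zp
echo 21474836480 > /sys/module/zfs/parameters/zfs_arc_max   # 20 GB in bytes
# Ext4 variant: plain mkfs plus the mount options listed above
mkfs.ext4 -F /dev/nvme0n1
mount -o noatime,data=writeback,barrier=0,nobh /dev/nvme0n1 /storage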
- Re-setup Postgres
- Tune Postgres config for this hardware:
listen_addresses = '*'
logging_collector = off
log_destination = 'csvlog'
max_connections = 800
fsync = on
ssl = off
autovacuum = off
shared_buffers = 11GB
work_mem = 64MB
effective_cache_size = 64GB
wal_level = minimal
full_page_writes = off
wal_log_hints = off
checkpoint_timeout = 120min
max_wal_size = 40GB
min_wal_size = 10GB
checkpoint_completion_target = 1.0
hot_standby = off
max_wal_senders = 0
# vacuum tuning (to speed up dataset creation)
maintenance_work_mem = 2GB
vacuum_cost_delay = 1ms
vacuum_cost_page_hit = 1
vacuum_cost_page_miss = 10
vacuum_cost_page_dirty = 1
vacuum_cost_limit = 1700
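One way these settings could be applied non-interactively, assuming the stock Ubuntu/PGDG layout and a helper file tuning.conf containing the lines above (both are assumptions; the provisioning scripts may do this differently):
cat tuning.conf >> /etc/postgresql/11/main/postgresql.conf   # tuning.conf holds the settings above
systemctl restart postgresql@11-main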
- Re-init pgbench dataset
Each time we create a new ~120 GB dataset with scaling factor 8000, for example:
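A sketch of the re-initialization, assuming the database is named test (the name used in the pgbench calls below); the actual invocation in the repo scripts may differ:
dropdb --if-exists -Upostgres test
createdb -Upostgres test
pgbench -i -s 8000 -Upostgres test   # scale factor 8000 gives a ~120 GB dataset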
- Clear caches:
echo 3 > /proc/sys/vm/drop_caches
- Prewarm for 2 minutes with pgbench:
pgbench -j8 -c100 -T120 -Mprepared -Upostgres test
2) Run pgbench for 10 minutes with reporting every 10 seconds
pgbench -j8 -c100 -T600 -r -P 10 -Mprepared -Upostgres test
We use 100 clients in 8 threads to get maximum throughput; 100 clients is the "fastest" value for this machine type.
3) Save average latency and TPS results to a separate file (separately for ZFS and Ext4); see the extraction sketch after this list
4) Run all of the above several times for Ext4 and ZFS
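A sketch of how step 3 could pull the per-run averages out of the pgbench summary and append them to a CSV; the real parsing and column layout are defined by the scripts in the repo, so treat the field order here as an assumption:
# run the 10-minute benchmark, then extract the averages from the summary
OUT=$(pgbench -j8 -c100 -T600 -r -P 10 -Mprepared -Upostgres test)
TPS=$(echo "$OUT" | awk '/excluding connections establishing/ {print $3}')
LAT=$(echo "$OUT" | awk '/latency average/ {print $4}')
echo "$(date +%s),$TPS,$LAT" >> ./results/ext4_results_timed.csv   # timestamp,tps,avg_latency_ms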
Scripting examples
https://github.com/Nastradamus/cookbooks/tree/master/test_zfs_on_i3
Example of run
./run ext4 my.aws.host.compute.amazonaws.com
./run zfs my.aws.host.compute.amazonaws.com
Results are saved to ./results/(ext4|zfs)_results_timed.csv.
Examining results
Make graphs with the help of Google Sheets
Results
https://docs.google.com/spreadsheets/d/1eDfxzxzIYoGfh6hi4_EDGIhVTVMaPtbjdwuZtpLzAsY/
ZFS 8k with 2 minutes of pgbench prewarm (test 1):

ZFS 8k with 2 minutes of pgbench prewarm (test 2):

ZFS 16k with 2 minutes of pgbench prewarm:

ZFS 8k, logbias=throughput with 2 minutes prewarm:
Ext4, the final test (just to prove stability):
Performance comparison:
Observations and conclusions
- Ext4 is the absolute winner in this race (TPS and latency): it is almost 2x faster than ZFS for the PostgreSQL OLTP workload.
- ZFS showed good sustainability after the ARC prewarm.
- logbias=throughput showed perfect sustainability.
- recordsize=8k is the best recommendation for PostgreSQL.
Unfortunately, to start Nancy quickly with ZFS support we need a prepared AWS AMI (image) with pre-compiled ZFS modules: the apt-get installation of the ZFS modules takes about 10 minutes (it always compiles them from scratch). We would also need to maintain separate AMI images in every AWS region of interest. More bad news: the docker-machine EC2 driver supports changing the AMI ID, but AMI IDs differ across regions. So adding ZFS support to Nancy would require fairly complicated logic (and may take time).
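For illustration, pointing the docker-machine EC2 driver at a custom AMI would look roughly like this; the AMI ID and machine name are placeholders, and a different pre-built ZFS image would be needed in every region:
# placeholder AMI ID and machine name; a separate ZFS image is needed per region
docker-machine create --driver amazonec2 \
  --amazonec2-region us-east-1 \
  --amazonec2-instance-type i3.2xlarge \
  --amazonec2-ami ami-0123456789abcdef0 \
  nancy-zfs-node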