"ZFS vs ext4 test": Document every step, methodology, results, draw graphs
This is a document. Please don't change this description.
ZFS vs. Ext4 tests on AWS i3.2xlarge instances (NVME drive)
Main links
- Permanent results link with graphs: https://docs.google.com/spreadsheets/d/1eDfxzxzIYoGfh6hi4_EDGIhVTVMaPtbjdwuZtpLzAsY/
- Provisioning/test scripts: https://github.com/Nastradamus/cookbooks/tree/master/test_zfs_on_i3
Goal
Understand whether we can use ZFS to speed up Nancy's series of experiments with snapshots.
Hardware/OS specs
- Virtualization: xen
- AWS EC2 spot instance (i3.2xlarge) with a locally attached NVME drive
- RAM: 61GB
- CPU: 8 cores, Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
- DISK: "ephemeral" local NVME drive
- NUMA: off
- OS: Ubuntu 16.04.5 LTS
- Linux Kernel: 4.4.0-1075-aws
- RDBMS: PostgreSQL 11.1
Methodology
1) Reset the machine's state to defaults and apply general tuning:
- General tuning of Ubuntu 16.04.5
# Set scaling_governor to performance mode
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# Disable swap
echo 0 > /proc/sys/vm/swappiness
- Kill Postgres
- Clear NVME partition
- Format the partition with Ext4 or ZFS and tune it (example commands are shown after the mount-option lists below)
We set the ZFS ARC cache size to 20 GB on the 61 GB RAM node (all memory is in NUMA node 0).
ZFS tuning options:
compression=on
atime=off
recordsize=8k/16k
logbias=throughput/latency
Ext4 tuning options (mount options):
noatime
data=writeback
barrier=0
nobh
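A minimal sketch of the per-run filesystem setup, assuming the NVME device is /dev/nvme0n1, the pool is named zp, and the mount point is /storage (these names are assumptions; the actual commands live in the provisioning scripts linked above):
# ZFS variant: create the pool, apply the tuning options, cap ARC at 20 GB
zpool create -f -m /storage zp /dev/nvme0n1
zfs set compression=on zp
zfs set atime=off zp
zfs set recordsize=8k zp
zfs set logbias=throughput zp
echo 21474836480 > /sys/module/zfs/parameters/zfs_arc_max   # 20 GB in bytes
# Ext4 variant: plain mkfs plus the mount options listed above
mkfs.ext4 -F /dev/nvme0n1
mount -o noatime,data=writeback,barrier=0,nobh /dev/nvme0n1 /storage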
- Re-setup Postgres
- Tune Postgres config for this hardware:
listen_addresses = '*'
logging_collector = off
log_destination = 'csvlog'
max_connections = 800
fsync = on
ssl = off
autovacuum = off
shared_buffers = 11GB
work_mem = 64MB
effective_cache_size = 64GB
wal_level = minimal
full_page_writes = off
wal_log_hints = off
checkpoint_timeout = 120min
max_wal_size = 40GB
min_wal_size = 10GB
checkpoint_completion_target = 1.0
hot_standby = off
max_wal_senders = 0
# vacuum tuning (to speed up dataset creation)
maintenance_work_mem = 2GB
vacuum_cost_delay = 1ms
vacuum_cost_page_hit = 1
vacuum_cost_page_miss = 10
vacuum_cost_page_dirty = 1
vacuum_cost_limit = 1700
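One way these settings could be applied non-interactively, assuming the stock Ubuntu/PGDG layout and a helper file tuning.conf containing the lines above (both are assumptions; the provisioning scripts may do this differently):
cat tuning.conf >> /etc/postgresql/11/main/postgresql.conf   # tuning.conf holds the settings above
systemctl restart postgresql@11-main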
- Re-init pgbench dataset
Each time we create a new ~120 GB dataset with scaling factor 8000, for example:
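A sketch of the re-initialization, assuming the database is named test (the name used in the pgbench calls below); the actual invocation in the repo scripts may differ:
dropdb --if-exists -Upostgres test
createdb -Upostgres test
pgbench -i -s 8000 -Upostgres test   # scale factor 8000 gives a ~120 GB dataset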
- Clear caches:
echo 3 > /proc/sys/vm/drop_caches
- Prewarm for 2 minutes with pgbench:
pgbench -j8 -c100 -T120 -Mprepared -Upostgres test
2) Run pgbench for 10 minutes with reporting every 10 seconds
pgbench -j8 -c100 -T600 -r -P 10 -Mprepared -Upostgres test
We use 100 clients in 8 threads to get maximum throughput; 100 clients is the "fastest" value for this machine type.
3) Save average latency and TPS results to a separate file (separately for ZFS and Ext4); see the extraction sketch after this list
4) Run all of the above several times for Ext4 and ZFS
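A sketch of how step 3 could pull the per-run averages out of the pgbench summary and append them to a CSV; the real parsing and column layout are defined by the scripts in the repo, so treat the field order here as an assumption:
# run the 10-minute benchmark, then extract the averages from the summary
OUT=$(pgbench -j8 -c100 -T600 -r -P 10 -Mprepared -Upostgres test)
TPS=$(echo "$OUT" | awk '/excluding connections establishing/ {print $3}')
LAT=$(echo "$OUT" | awk '/latency average/ {print $4}')
echo "$(date +%s),$TPS,$LAT" >> ./results/ext4_results_timed.csv   # timestamp,tps,avg_latency_ms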
Scripting examples
https://github.com/Nastradamus/cookbooks/tree/master/test_zfs_on_i3
Example of run
./run ext4 my.aws.host.compute.amazonaws.com
./run zfs my.aws.host.compute.amazonaws.com
Results are saved to ./results/(ext4|zfs)_results_timed.csv.
Examining results
Make graphs with the help of Google Sheets
Results
https://docs.google.com/spreadsheets/d/1eDfxzxzIYoGfh6hi4_EDGIhVTVMaPtbjdwuZtpLzAsY/
ZFS 8k with 2 minutes of pgbench prewarm (test 1):

ZFS 8k with 2 minutes of pgbench prewarm (test 2):

ZFS 16k with 2 minutes of pgbench prewarm:

ZFS 8k, logbias=throughput with 2 minutes prewarm:
Ext4, the final test (just to prove stability):
Performance comparison:
Observations and conclusions
- Ext4 is the absolute winner in this race (TPS and latency): it is almost 2x faster than ZFS for the PostgreSQL OLTP workload.
- ZFS showed good sustainability after the ARC prewarm.
- logbias=throughput showed perfect sustainability.
- recordsize=8k is the best recommendation for PostgreSQL.
Unfortunately, to start Nancy quickly with ZFS support we need a prepared AWS AMI (image) with pre-compiled ZFS modules: the apt-get installation of the ZFS modules takes about 10 minutes (it always compiles them from scratch). We would also need to maintain separate AMI images in every AWS region of interest. More bad news: the docker-machine EC2 driver supports changing the AMI ID, but AMI IDs differ across regions. So adding ZFS support to Nancy would require fairly complicated logic (and may take time).
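For illustration, pointing the docker-machine EC2 driver at a custom AMI would look roughly like this; the AMI ID and machine name are placeholders, and a different pre-built ZFS image would be needed in every region:
# placeholder AMI ID and machine name; a separate ZFS image is needed per region
docker-machine create --driver amazonec2 \
  --amazonec2-region us-east-1 \
  --amazonec2-instance-type i3.2xlarge \
  --amazonec2-ami ami-0123456789abcdef0 \
  nancy-zfs-node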