# Incubation:APM ClickHouse evaluation
## Summary
So far we've tested a basic ClickHouse schema against TimescaleDB using apm/tsbs.
ClickHouse performs very well, beating TimescaleDB in almost every category - see the Results section for in-depth graphs.
## ClickHouse background
There are many databases to choose from in this domain, including plenty of time-series-specific ones - https://db-engines.com/en/ranking/time+series+dbms - but we ideally want something that can handle a variety of o11y data, not just time series.
ClickHouse is gaining traction in a number of domains. It is horizontally scalable, with replication, provides an SQL syntax and can be used to perform large analytical queries.
We want to evaluate ClickHouse as a horizontally scalable datastore for o11y data (metrics, logs, traces).
There is some precedent for using ClickHouse in this area:
- https://slides.com/abraithwaite/segment-clickhouse
- https://eng.uber.com/logging/
- https://questdb.io/time-series-benchmark-suite/
- https://altinity.com/blog/2019/7/new-encodings-to-improve-clickhouse
- https://blog.cloudflare.com/http-analytics-for-6m-requests-per-second-using-clickhouse/
- https://tech.ebayinc.com/engineering/ou-online-analytical-processing/
### Articles about benchmarks for ClickHouse
- ClickHouse vs Amazon Redshift for Time Series Data - http://brandonharris.io/redshift-clickhouse-time-series/
- ClickHouse vs Spark/MySQL - https://www.percona.com/blog/2017/02/13/clickhouse-new-opensource-columnar-database/
## Time Series Benchmarking Suite (TSBS)
We're using tsbs to benchmark ClickHouse schemas against other solutions.
Currently supported databases for tsbs are:
Database | Multi-Model | Replication | Query Flexibility | Maturity |
---|---|---|---|---|
Akumuli | No | No | Low | Low |
Cassandra | No | Yes | Medium | High |
ClickHouse | Yes | Yes | High | Medium |
CrateDB | Yes | Yes | High | Medium |
InfluxDB | No | Yes (Enterprise only) | High | High |
MongoDB | Yes | Yes | High | High |
QuestDB | No | No | Medium | Low |
SiriDB | No | Yes | Low | Low |
TimescaleDB | Yes | Yes | High | High |
Timestream | No | Yes | Medium | Medium |
VictoriaMetrics | No | Yes | High | Medium |
In this iteration we're limiting ourselves to multi-model databases that we could potentially use for all o11y data, rather than having to support multiple databases.
We will therefore benchmark various ClickHouse schemas against CrateDB, MongoDB and TimescaleDB.
Checklist:
- Experimental setup scripts and bug fixes in https://gitlab.com/gitlab-org/incubation-engineering/apm/tsbs
- Initial TimescaleDB vs ClickHouse comparison with varying DevOps workloads and data sizes.
- Comparison with CrateDB and MongoDB too - see #12 (closed)
- Capture and present CPU, memory and disk storage usage for each database.
## TSBS Outline
Key configuration items:
- `USE_CASE` - selects the time-series data generation and query set; we use "cpu-only" (i.e. only CPU data) and "devops" (i.e. a large ops sample with various tables).
- `SCALE` - sets the number of hosts in the data simulation. 100 hosts results in 259,200,000 individual metrics for "cpu-only".
- `FORMATS` - database formats to use for generation and loading.
- `NUM_WORKERS` - number of workers used for parallel loading/querying.
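The 259,200,000 figure quoted for `SCALE=100` can be sanity-checked with some quick arithmetic. The 30-day simulated window, 10s log interval and 10 CPU metrics per reading used below are assumptions consistent with that figure, not values stated above:

```python
# Back-of-envelope check of the metric count for SCALE=100 with "cpu-only".
# Assumed (not stated in this doc): 30-day simulated window, 10s log
# interval, and 10 CPU metrics per reading (usage_user .. usage_guest_nice).
hosts = 100
metrics_per_reading = 10
log_interval_s = 10
window_s = 30 * 24 * 60 * 60      # 30 simulated days in seconds

readings_per_host = window_s // log_interval_s
total_metrics = hosts * metrics_per_reading * readings_per_host
print(total_metrics)              # 259200000
```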
TSBS is used to script tests against target databases in the following flow:
- Start cadvisor/prometheus for monitoring DB containers.
- For each format:
  - Generate data based on `SCALE` and `USE_CASE`.
  - Generate queries based on `SCALE` and `USE_CASE`.
  - Start target database.
  - Load data into target DB (metrics files created).
  - Execute queries against target DB (metrics files created).
  - Stop DB and clean up.
  - Acquire container metrics from Prometheus (cpu, memory, storage).
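As a sketch, the per-format steps above can be driven by a small command builder. The tool names and flags here (`tsbs_generate_data`, `tsbs_generate_queries`, `tsbs_load_<format>`, `--use-case`, `--scale`, `--format`, `--workers`) follow upstream tsbs conventions but should be checked against our fork:

```python
# Sketch: build the tsbs command line for each step of the per-format loop.
# Flag and binary names are assumptions based on upstream tsbs, not taken
# from our benchmarking scripts.
def generate_commands(formats, use_case="cpu-only", scale=100, workers=4):
    commands = []
    for fmt in formats:
        # Generate data, then queries, then load into the target DB.
        commands.append(
            f"tsbs_generate_data --use-case={use_case} --scale={scale} --format={fmt}"
        )
        commands.append(
            f"tsbs_generate_queries --use-case={use_case} --scale={scale} --format={fmt}"
        )
        commands.append(f"tsbs_load_{fmt} --workers={workers}")
    return commands

for cmd in generate_commands(["clickhouse", "timescaledb"]):
    print(cmd)
```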
## Example Data for "devops"

For "cpu-only", only the "cpu" and "tags" schemas are in use.
ClickHouse schemas:

```
tags: hostname string,region string,datacenter string,rack string,os string,arch string,team string,service string,service_version string,service_environment string
cpu: usage_user,usage_system,usage_idle,usage_nice,usage_iowait,usage_irq,usage_softirq,usage_steal,usage_guest,usage_guest_nice
disk: total,free,used,used_percent,inodes_total,inodes_free,inodes_used,diskio,reads,writes,read_bytes,write_bytes,read_time,write_time,io_time
kernel: boot_time,interrupts,context_switches,processes_forked,disk_pages_in,disk_pages_out
mem: total,available,used,free,cached,buffered,used_percent,available_percent,buffered_percent
net: bytes_sent,bytes_recv,packets_sent,packets_recv,err_in,err_out,drop_in,drop_out
nginx: accepts,active,handled,reading,requests,waiting,writing
postgresql: numbackends,xact_commit,xact_rollback,blks_read,blks_hit,tup_returned,tup_fetched,tup_inserted,tup_updated,tup_deleted,conflicts,temp_files,temp_bytes,deadlocks,blk_read_time,blk_write_time
redis: uptime_in_seconds,total_connections_received,expired_keys,evicted_keys,keyspace_hits,keyspace_misses,instantaneous_ops_per_sec,instantaneous_input_kbps,instantaneous_output_kbps,connected_clients,used_memory,used_memory_rss,used_memory_peak,used_memory_lua,rdb_changes_since_last_save,sync_full,sync_partial_ok,sync_partial_err,pubsub_channels,pubsub_patterns,latest_fork_usec,connected_slaves,master_repl_offset,repl_backlog_active,repl_backlog_size,repl_backlog_histlen,mem_fragmentation_ratio,used_cpu_sys,used_cpu_user,used_cpu_sys_children,used_cpu_user_children
```
Example data snippet:

```
tags,hostname=host_0,region=eu-west-1,datacenter=eu-west-1c,rack=87,os=Ubuntu16.04LTS,arch=x64,team=NYC,service=18,service_version=1,service_environment=production
cpu,1451606400000000000,58,2,24,61,22,63,6,44,80,38
tags,hostname=host_1,region=ap-southeast-1,datacenter=ap-southeast-1b,rack=97,os=Ubuntu15.10,arch=x86,team=LON,service=12,service_version=0,service_environment=production
diskio,1451606400000000000,0,0,0,0,0,0,0
etc
```
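The structure of that snippet can be illustrated with a minimal parser. This is inferred purely from the example lines above, not from the tsbs source; real values may not all be integers:

```python
# Minimal parser for the snippet's line format (inferred from the example):
# "tags" lines carry key=value pairs; other lines carry a table name,
# a nanosecond timestamp, and a list of metric values.
def parse_line(line):
    parts = line.split(",")
    if parts[0] == "tags":
        kv = dict(p.split("=", 1) for p in parts[1:])
        return ("tags", kv)
    table, ts, *values = parts
    return (table, int(ts), [int(v) for v in values])

kind, tags = parse_line("tags,hostname=host_0,region=eu-west-1,rack=87")
table, ts, values = parse_line("cpu,1451606400000000000,58,2,24")
```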
## Example Queries for "devops"

Example queries generated by the benchmark, with descriptions.
Query Name | Description | Table |
---|---|---|
cpu-max-all-1 | max of all CPU metrics, random host, random 8h segment by 1h | cpu |
cpu-max-all-8 | max of all CPU metrics, random 8 hosts, random 8h segment by 1h | cpu |
double-groupby-1 | mean of 1 metric, all hosts, random 12h segment by 1h | cpu |
double-groupby-5 | mean of 5 metrics, all hosts, random 12h segment by 1h | cpu |
double-groupby-all | mean of 10 metrics, all hosts, random 12h by 1h | cpu |
groupby-orderby-limit | max cpu over last 5 min-intervals (random end) | cpu |
high-cpu-1 | CPU over threshold, 1 host | cpu |
high-cpu-all | CPU over threshold, all hosts | cpu |
lastpoint | Last row per host | cpu |
single-groupby-1-1-1 | 1 cpu metric, random host, random 1h group by 1m | cpu |
single-groupby-1-1-12 | 1 cpu metric, random host, random 12h group by 1m | cpu |
single-groupby-1-8-1 | 1 cpu metric, random 8 hosts, random 1h group by 1m | cpu |
Examples of each query for ClickHouse:
### cpu-max-all-1 example

```sql
SELECT
    toStartOfHour(created_at) AS hour,
    max(usage_user) AS max_usage_user, max(usage_system) AS max_usage_system, max(usage_idle) AS max_usage_idle, max(usage_nice) AS max_usage_nice, max(usage_iowait) AS max_usage_iowait, max(usage_irq) AS max_usage_irq, max(usage_softirq) AS max_usage_softirq, max(usage_steal) AS max_usage_steal, max(usage_guest) AS max_usage_guest, max(usage_guest_nice) AS max_usage_guest_nice
FROM cpu
WHERE tags_id IN (SELECT id FROM tags WHERE hostname IN ('host_49')) AND (created_at >= '2016-01-01 11:48:31') AND (created_at < '2016-01-01 19:48:31')
GROUP BY hour
ORDER BY hour
```
### cpu-max-all-8 example

```sql
SELECT
    toStartOfHour(created_at) AS hour,
    max(usage_user) AS max_usage_user, max(usage_system) AS max_usage_system, max(usage_idle) AS max_usage_idle, max(usage_nice) AS max_usage_nice, max(usage_iowait) AS max_usage_iowait, max(usage_irq) AS max_usage_irq, max(usage_softirq) AS max_usage_softirq, max(usage_steal) AS max_usage_steal, max(usage_guest) AS max_usage_guest, max(usage_guest_nice) AS max_usage_guest_nice
FROM cpu
WHERE tags_id IN (SELECT id FROM tags WHERE hostname IN ('host_49','host_3','host_35','host_39','host_75','host_15','host_21','host_11')) AND (created_at >= '2016-01-01 11:48:31') AND (created_at < '2016-01-01 19:48:31')
GROUP BY hour
ORDER BY hour
```
### double-groupby-1 example

```sql
SELECT
    hour,
    hostname,
    mean_usage_user
FROM
(
    SELECT
        toStartOfHour(created_at) AS hour,
        tags_id AS id,
        avg(usage_user) AS mean_usage_user
    FROM cpu
    WHERE (created_at >= '2016-01-02 11:22:40') AND (created_at < '2016-01-02 23:22:40')
    GROUP BY
        hour,
        id
) AS cpu_avg
ANY INNER JOIN tags USING (id)
ORDER BY
    hour ASC,
    hostname
```
### double-groupby-5 example

```sql
SELECT
    hour,
    hostname,
    mean_usage_user, mean_usage_system, mean_usage_idle, mean_usage_nice, mean_usage_iowait
FROM
(
    SELECT
        toStartOfHour(created_at) AS hour,
        tags_id AS id,
        avg(usage_user) AS mean_usage_user, avg(usage_system) AS mean_usage_system, avg(usage_idle) AS mean_usage_idle, avg(usage_nice) AS mean_usage_nice, avg(usage_iowait) AS mean_usage_iowait
    FROM cpu
    WHERE (created_at >= '2016-01-02 11:22:40') AND (created_at < '2016-01-02 23:22:40')
    GROUP BY
        hour,
        id
) AS cpu_avg
ANY INNER JOIN tags USING (id)
ORDER BY
    hour ASC,
    hostname
```
### double-groupby-all example

```sql
SELECT
    hour,
    hostname,
    mean_usage_user, mean_usage_system, mean_usage_idle, mean_usage_nice, mean_usage_iowait, mean_usage_irq, mean_usage_softirq, mean_usage_steal, mean_usage_guest, mean_usage_guest_nice
FROM
(
    SELECT
        toStartOfHour(created_at) AS hour,
        tags_id AS id,
        avg(usage_user) AS mean_usage_user, avg(usage_system) AS mean_usage_system, avg(usage_idle) AS mean_usage_idle, avg(usage_nice) AS mean_usage_nice, avg(usage_iowait) AS mean_usage_iowait, avg(usage_irq) AS mean_usage_irq, avg(usage_softirq) AS mean_usage_softirq, avg(usage_steal) AS mean_usage_steal, avg(usage_guest) AS mean_usage_guest, avg(usage_guest_nice) AS mean_usage_guest_nice
    FROM cpu
    WHERE (created_at >= '2016-01-02 11:22:40') AND (created_at < '2016-01-02 23:22:40')
    GROUP BY
        hour,
        id
) AS cpu_avg
ANY INNER JOIN tags USING (id)
ORDER BY
    hour ASC,
    hostname
```
### groupby-orderby-limit example

```sql
SELECT
    toStartOfMinute(created_at) AS minute,
    max(usage_user)
FROM cpu
WHERE created_at < '2016-01-01 22:38:24'
GROUP BY minute
ORDER BY minute DESC
LIMIT 5
```
### high-cpu-1 example

```sql
SELECT *
FROM cpu
PREWHERE (usage_user > 90.0) AND (created_at >= '2016-01-01 21:18:55') AND (created_at < '2016-01-02 09:18:55') AND (tags_id IN (SELECT id FROM tags WHERE hostname IN ('host_3')))
```
### high-cpu-all example

```sql
SELECT *
FROM cpu
PREWHERE (usage_user > 90.0) AND (created_at >= '2016-01-02 23:35:31') AND (created_at < '2016-01-03 11:35:31')
```
### lastpoint example

```sql
SELECT *
FROM
(
    SELECT *
    FROM cpu
    WHERE (tags_id, created_at) IN
    (
        SELECT
            tags_id,
            max(created_at)
        FROM cpu
        GROUP BY tags_id
    )
) AS c
ANY INNER JOIN tags AS t ON c.tags_id = t.id
ORDER BY
    t.hostname ASC,
    c.time DESC
```
### single-groupby-1-1-1 example

```sql
SELECT
    toStartOfMinute(created_at) AS minute,
    max(usage_user) AS max_usage_user
FROM cpu
WHERE tags_id IN (SELECT id FROM tags WHERE hostname IN ('host_49')) AND (created_at >= '2016-01-03 12:26:46') AND (created_at < '2016-01-03 13:26:46')
GROUP BY minute
ORDER BY minute ASC
```
## Results

Test machine details:
- Google Cloud VM - n2-standard-16 (16 vCPUs, 64 GB memory)
- Intel(R) Xeon(R) CPU, 2800 MHz
- ~~200GB~~ 500GB (200GB wasn't enough space!) persistent SSD

Output of `uname -a`:

```
Linux jshaw-clickhouse-benchmarking 5.11.0-1017-gcp #19~20.04.1-Ubuntu SMP Thu Aug 12 05:25:25 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
```
Methodology: Use a Google Cloud VM with Docker to best replicate a GitLab production environment while keeping the setup simple and repeatable. Benchmarking scripts clean the environment between runs (stop containers, delete volumes, etc.). 4 benchmark workers are used, as this is similar to other benchmarks we've seen and minimizes contention for CPU.
### TimescaleDB vs ClickHouse
Metric rate when loading databases with metrics:
ClickHouse significantly outperforms TimescaleDB in metric storage rate (and the total load time is therefore much shorter).
Loaded data final volume sizes:
ClickHouse has a much smaller data volume footprint in all cases, by almost a factor of 10.
Query p95 latencies (1,000 queries per use-case/query type):

ClickHouse generally performs better in the majority of query types. For `groupby-orderby-limit` ClickHouse performs consistently worse by a significant margin; this could likely be improved with better indexing.

Very few ClickHouse queries have >1s latency at p95. TimescaleDB has multiple >1s latencies, including a few in the 15-25s range.
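For reference, a p95 figure like those plotted is derived from the raw per-query latencies. The sample below is synthetic, purely to show the computation:

```python
# Compute a 95th-percentile latency from 1,000 raw query latencies.
# The latencies here are synthetic, for illustration only.
import statistics

latencies_ms = [50 + (i % 100) for i in range(1000)]   # made-up sample
# quantiles(n=100) returns the 99 percentile cut points; index 94 is p95.
p95 = statistics.quantiles(latencies_ms, n=100)[94]
print(p95)
```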
CPU & memory usage per test/database:

CPU utilization is fairly consistent between the databases. ClickHouse consistently uses less total memory than TimescaleDB across the tests. With the larger datasets TimescaleDB creeps upwards of 60GiB of memory utilization, which may indicate that the machine size was limiting performance.
## Limitations

- Workers and databases are on the same machine, which is not realistic. However, this removes the complexity of network performance from the benchmark.
- 4 workers are used for all benchmarks. It would be interesting to see how the databases perform with more or fewer workers.
- Database default settings are used. Better performance may be achievable by tuning various parameters.
- Resources were over-utilized during the larger runs. This definitely impacted individual DB performance.
- Queries only target the `cpu` table; this is probably for simplicity but isn't realistic.