Incubation:APM ClickHouse benchmarks vs CrateDB vs MongoDB
Following on from #4 (closed). We are using apm/tsbs to benchmark a time-series load/query workload, with the aim of comparing multi-model and scalable database solutions.
We've completed a comparison of ClickHouse against TimescaleDB in #4 (closed) - this is the main competitive choice for us. For due diligence, we also want to compare CrateDB and MongoDB as alternative solutions for the APM domain.
Results will be added to this issue.
CrateDB issues - unfortunately I couldn't get the CrateDB implementation in tsbs working in a reasonable amount of time. Parts of the query interface are not implemented and various scripts are missing. I'm not going to prioritize fixing and benchmarking this.
MongoDB vs ClickHouse
Fixes for tsbs MongoDB are implemented in tsbs!2 (merged)
See #4 (closed) for details of methodology, machine type, setup etc.
Based on previous results, we aren't running the entire devops use-case, as the cpu-only case takes less time and appears to be a reliable subset when varying "scale" (number of hosts generating metrics).
Results cpu-only case
Not all query types were completed for this run because the MongoDB queries were very slow. Given the large difference in performance, we will compare based on the results we have.
Metric rate when loading databases:
ClickHouse outperforms MongoDB here by a good amount, and even performs better overall when the number of metrics is larger.
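As a rough illustration of how a load rate of this kind is derived, the sketch below divides the number of metrics written by the wall-clock load time. The function name and the example figures are hypothetical and are not results from this run.

```python
def metrics_per_second(total_metrics, load_seconds):
    """Return the average ingest rate in metrics per second."""
    if load_seconds <= 0:
        raise ValueError("load_seconds must be positive")
    return total_metrics / load_seconds


# Hypothetical figures for illustration only (not measured values).
example_rate = metrics_per_second(total_metrics=1_000_000, load_seconds=20)
print(f"{example_rate:.0f} metrics/sec")
```

Comparing this rate at each scale, rather than the raw load time, keeps runs with different data volumes comparable.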
Loaded data final volume sizes:
Again, ClickHouse's disk space usage is far lower.
Query p95 latencies (1000 queries per use-case/query-type):
- cpu-only_1000 groupby-orderby-limit failed with timeouts for MongoDB
Overall, ClickHouse performs comparably to or better than MongoDB.
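For context on the p95 figures, here is a minimal sketch of a nearest-rank percentile over per-query latencies. This is an assumption about one common way to compute a p95, not necessarily the method tsbs uses; the function name and the synthetic sample values are illustrative only.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank of the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Synthetic latencies of 1..1000 ms, standing in for 1000 queries of one query type.
latencies_ms = list(range(1, 1001))
print(percentile_nearest_rank(latencies_ms, 95))  # 950 for this synthetic input
```

With 1000 queries per query type, the p95 is the 950th-fastest query under this method, so the slowest 50 queries do not affect the reported figure. Queries that time out, as in the MongoDB case above, produce no latency sample at all and have to be reported separately.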
CPU & Memory usage per test/database:
ClickHouse uses much more CPU, but it performs better and utilizes the server resources more fully as a result. As expected, ClickHouse uses far less memory.
Limitations
See also #4 (closed)
The MongoDB Golang driver used in the project is no longer maintained. There may be database compatibility issues affecting performance.
While running, MongoDB does not utilize all VM cores. This may be a connection pooling issue in the benchmark, or it may be due to the types of query being performed (lots of aggregation pipelines).