Log aggregation

Edit: We already selected Elastic for log aggregation and we’re working on dogfooding it in &1859

Right now we tail logs. The next step is aggregating them, but it isn't clear which technology we will use:

Elasticsearch is the most widely used, but it uses a lot of CPU and is hard to scale and maintain.
Loki seems promising since it lets you use object storage, but you need a second technology for complex queries (selecting on a property that isn't a Prometheus label).
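
To make the label limitation concrete, here is a minimal sketch of the difference between selecting on a label and selecting on a property inside the log line. It is not Loki's implementation or API. The in-memory `streams` store and the helpers `push`, `select_by_labels`, and `select_by_property` are hypothetical names made up for illustration, and it assumes log lines are JSON objects.

```python
import json
from collections import defaultdict

# Hypothetical in-memory stand-in for a label-indexed log store.
# Each distinct label set identifies one stream of raw log lines.
streams = defaultdict(list)


def push(labels, line):
    """Append a raw log line to the stream identified by its label set."""
    streams[frozenset(labels.items())].append(line)


def select_by_labels(**wanted):
    """Cheap path: only the label sets are inspected, never the line contents."""
    wanted_items = set(wanted.items())
    return [
        line
        for key, lines in streams.items()
        if wanted_items <= key
        for line in lines
    ]


def select_by_property(name, value, **wanted):
    """Expensive path: every candidate line has to be parsed and checked."""
    hits = []
    for line in select_by_labels(**wanted):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: lines that are not JSON are skipped
        if isinstance(record, dict) and record.get(name) == value:
            hits.append(line)
    return hits
```

In this sketch, `select_by_labels(app="web")` narrows the search without reading any log content, while `select_by_property("user_id", 7, app="web")` must parse every line in the matching streams. That second kind of query is the one that needs another technology on top of object storage.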

We've been able to add Prometheus to GitLab by default because it didn't use a lot of CPU and was simple (a Go binary, with no Java stack needed, unlike Elastic). Adding it by default increases adoption a lot, so it would be nice to use Loki + another technology for complex queries on object storage.

I asked on Twitter (https://twitter.com/sytses/status/1212125911069233152) what we can use for complex queries. Alternatives mentioned in that thread:

Amazon Athena which is more SQL oriented
Presto by Facebook is the only open source one but more SQL based
CHAOSSEARCH
Apex Logs https://apex.sh/logs/ which might be open sourced https://twitter.com/tjholowaychuk/status/1212319057204203520
SpectX https://www.spectx.com/ which is closed source
Humio which is closed source and doesn't have much detail about the technology
Late addition: Apache Drill, open source, seems similar to Presto but is better with unstructured data, using JSON instead of SQL https://qr.ae/TSz9oU

Both Drill and Presto use Java and commonly run on multiple nodes, so they seem about as heavyweight as Elastic.

In security incident situations you need quick results. Indexing on write makes sense there: it puts the burden on the computer instead of having the human wait.
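
The trade-off can be sketched with two toy stores. This is only an illustration under simple assumptions (whitespace tokenization, everything in memory), not how Elastic or any other product mentioned here works. The class names `ScanStore` and `IndexedStore` are hypothetical.

```python
from collections import defaultdict


class ScanStore:
    """Cheap writes. Every search reads every stored line."""

    def __init__(self):
        self.lines = []

    def write(self, line):
        self.lines.append(line)

    def search(self, term):
        term = term.lower()
        return [line for line in self.lines if term in line.lower().split()]


class IndexedStore:
    """Index on write: more work per line stored, fast lookups later."""

    def __init__(self):
        self.lines = []
        self.index = defaultdict(set)  # token -> positions of lines containing it

    def write(self, line):
        position = len(self.lines)
        self.lines.append(line)
        # The tokenizing cost is paid here, once, by the machine.
        for token in set(line.lower().split()):
            self.index[token].add(position)

    def search(self, term):
        # No scan: only the positions recorded for this token are touched.
        positions = self.index.get(term.lower(), ())
        return [self.lines[i] for i in sorted(positions)]
```

Both stores return the same results for a single-token search. The difference is where the cost lands: `ScanStore` makes the person running the query wait for a full scan, while `IndexedStore` spends CPU and memory on every write so that the lookup during an incident is quick.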

For GitLab and AutoDevOps the initial use case is more important to nail than a scaled application. Elastic works better at low log volume than at high volume, where you need to get selective.

Logging more makes sense when the application is getting started with many unknowns and few users.

cc @kencjohnston

Edited Jan 04, 2020 by Sid Sijbrandij