Test out Loki to see if we should use it for our logging solution
Problem to solve
We currently support viewing pod logs within k8s, but we don't have an aggregated logging solution. We were considering an Elasticsearch-based solution, but ES is costly in compute and requires a lot of attention to keep running.
We should consider alternative solutions that require less maintenance and lower compute costs.
Target audience
- Sasha, Software Developer, https://design.gitlab.com/research/personas#persona-sasha
- Devon, DevOps Engineer, https://design.gitlab.com/research/personas#persona-devon
- Sidney, Systems Administrator, https://design.gitlab.com/research/personas#persona-sidney
- Sam, Security Analyst, https://design.gitlab.com/research/personas#persona-sam
Further details
Proposal
Grafana has launched a new open source logging solution, Loki, pitched as "Prometheus but for logs". One nice feature is that it uses the Prometheus service discovery code base, so many of its labels will match those in Prometheus, which makes it easy to correlate logs with metrics.
We should evaluate Loki to see whether it makes more sense than ES as the basis for our solution. A few factors we should consider:
- Ease of setup and ongoing administration
- Required compute, storage, and other costs to run the service
- Scalability
- Ability to integrate with GitLab
- Alignment with our other observability tools, like Prometheus
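To illustrate the label correlation mentioned above, here is a minimal sketch of how one label set could drive both a Prometheus metrics query and a Loki log stream selector, since both use the same `{key="value"}` matcher syntax. The label names and values and the `http_requests_total` metric are hypothetical, chosen only for illustration, and `selector` is a helper written for this example, not part of either tool.

```python
def selector(labels):
    """Render a label set as a {key="value",...} matcher block.

    Backslashes and double quotes in values are escaped so the
    result stays a valid quoted string.
    """
    def escape(value):
        return value.replace("\\", "\\\\").replace('"', '\\"')

    pairs = ",".join(
        '{}="{}"'.format(key, escape(value))
        for key, value in sorted(labels.items())
    )
    return "{" + pairs + "}"


# Hypothetical labels, as they might be attached by service discovery.
labels = {"namespace": "production", "app": "web"}

# Metrics query for Prometheus (the metric name is an assumption).
promql = "rate(http_requests_total" + selector(labels) + "[5m])"

# Log stream selector for Loki, built from the same labels.
logql = selector(labels)

print(promql)
print(logql)
```

Because the matcher block is identical in both queries, a dashboard could switch from a metric to the logs of the same pods without any label translation. Whether that holds in our setup is one of the things the evaluation should confirm.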
What does success look like, and how can we measure that?
A decision on whether to base our aggregated logging solution on Loki or on something else, such as ES.