Test out Loki to see if we should use it for our logging solution

Problem to solve

We currently support viewing pod logs within k8s, but we don't have an aggregated logging solution. We were considering an Elasticsearch-based solution, but ES is costly in compute and requires a lot of attention to keep running.

We should consider alternative solutions that require less maintenance and lower compute costs.

Target audience

Further details

Proposal

Grafana has launched a new open source logging solution, Loki, pitched as "Prometheus but for logs". One of its nice features is that it uses the Prometheus service discovery code base, so many of the labels will be the same as those from Prometheus, which makes correlation easy.
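
To illustrate the correlation point, here is a minimal sketch in Python, assuming both systems attach the same `namespace` and `pod` labels to a workload. Loki's query language uses the same `{label="value"}` matcher syntax as Prometheus, so one label set can drive both a metrics query and a logs query. The label values, the metric name, and the `label_selector` helper are hypothetical and only for illustration; they are not part of Loki or GitLab.

```python
def label_selector(labels: dict) -> str:
    """Render a label set as a {key="value", ...} matcher.

    The same matcher syntax is accepted by PromQL and by Loki's LogQL.
    """
    parts = []
    for key in sorted(labels):
        # Escape backslashes and double quotes inside the label value.
        value = labels[key].replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{key}="{value}"')
    return "{" + ", ".join(parts) + "}"


# Hypothetical labels that service discovery would attach to one pod.
pod_labels = {"namespace": "gitlab-managed-apps", "pod": "web-5d9c7"}
selector = label_selector(pod_labels)

# Metrics query for Prometheus (metric name is an assumption).
metrics_query = f"rate(container_cpu_usage_seconds_total{selector}[5m])"

# Logs query for Loki: same selector, plus a line filter for "error".
logs_query = f'{selector} |= "error"'

print(metrics_query)
print(logs_query)
```

Because the selector is shared, a spike seen in a Prometheus graph could be followed straight to the matching log stream without translating between two labelling schemes.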

We should evaluate Loki to see whether it is a better foundation for our solution than ES. A few factors we should consider:

  • Ease of setup and on-going administration
  • Required compute, storage, and other costs to run the service
  • Scalability
  • Ability to be integrated with GitLab
  • Alignment with our other observability tools, like Prometheus

What does success look like, and how can we measure that?

A determination of whether we should base our aggregated logging solution on Loki or on something else, like ES.

Links / references

Edited by silv