Consolidate log shippers in Kubernetes
Problem
We currently run two log shipper daemonsets in Kubernetes:
- Fluent Bit in the kube-system namespace. This is part of GCP's default GKE configuration and sends all container logs to Stackdriver.
- fluentd in the logging namespace.
  - At the time of writing, this is configured by us in https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles/-/tree/master/releases/fluentd-elasticsearch. It sends logs to Elasticsearch.
  - After https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/11655, which blocks this issue, it will likely be configured elsewhere, and will send logs to Pub/Sub, which acts as a buffer.
  - Only some containers (the ones running our apps) have their logs read and shipped.
There are a few suboptimal consequences of this:
- Since the GKE Fluent Bit configuration reads all container log files, we're reading the highest-throughput log files (our own apps') twice. As we migrate more workloads to Kubernetes, this will become more painful.
- Our own apps' logs are stored in Stackdriver (as well as Elasticsearch), which incurs additional storage cost.
  - I have not quantified this cost yet.
- Confusion for SREs: some log shipping config lives in our own Kubernetes config, and some is not version controlled (by us) because it is part of the GKE platform.
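The double read could be avoided by having a single tail source feed all destinations, with any fan-out expressed inside the pipeline. A minimal Fluentd sketch of that shape (paths, tags, and plugin choices here are illustrative assumptions, not our actual configuration):

```
# Hypothetical sketch: one tail source reads container logs once...
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  <parse>
    @type json
  </parse>
</source>

# ...and the copy output plugin fans records out to every destination,
# so routing logic lives in one version-controlled place.
<match kubernetes.**>
  @type copy
  <store>
    @type elasticsearch   # or a Pub/Sub output once issue 11655 lands
  </store>
  <store>
    @type google_cloud    # only if we still want logs in Stackdriver
  </store>
</match>
```

The key point is the single `in_tail` source: each destination is a `<store>` inside one `copy` match rather than a separate daemonset re-reading the same files.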
Desired outcome
All log shipping/pipeline configuration is version controlled in one place, presumably our fluentd configuration that we currently store in https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles/-/tree/master/releases/fluentd-elasticsearch (unless it has been moved to Tanka by then).
Acceptance criteria
- [ ] All log shipping/pipeline configuration is version controlled in one place.
- [ ] Log files are only read once in Kubernetes, and any required fanning-out / routing logic is part of the log pipeline.
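If we conclude that Stackdriver ingestion is no longer needed at all, one possible way to remove the GKE-managed shipper would be to turn off the cluster's managed logging integration, leaving our own daemonset as the sole reader of container log files. A sketch (cluster name and zone are placeholders, and this assumes the legacy `--logging-service` behavior; it should be verified against the cluster's actual GKE version before use):

```
# Sketch only: disables GKE's managed Stackdriver logging for the cluster,
# which removes the default log-shipping daemonset from kube-system.
gcloud container clusters update CLUSTER_NAME \
  --zone ZONE \
  --logging-service none
```

This is a cluster-wide switch, so it only makes sense after we have confirmed nothing depends on the Stackdriver copy of our container logs.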