Consolidate log shippers in Kubernetes
## Problem
We currently run two log-shipper DaemonSets in Kubernetes:
- Fluent Bit in the `kube-system` namespace. This is part of GCP's default GKE configuration, and it sends all container logs to Stackdriver.
- fluentd in the `logging` namespace.
  - At the time of writing, this is configured by us in https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles/-/tree/master/releases/fluentd-elasticsearch. It sends logs to Elasticsearch.
  - After https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/11655, which blocks this issue, it will likely be configured elsewhere and send logs to Pub/Sub, which acts as a buffer.
  - Only some containers (the ones running our apps) have their logs read and shipped.
This has a few suboptimal consequences:
- Since the GKE Fluent Bit configuration reads all container log files, we're reading the highest-throughput log files (our own apps') twice. As we migrate more workloads to Kubernetes, this will become more painful.
- Our own apps' logs are stored in Stackdriver (as well as Elasticsearch), which incurs some cost.
  - I have not quantified this cost yet.
- Confusion for SREs: some log-shipping config lives in our own Kubernetes config, and some is not version controlled (by us) because it is part of the GKE platform.
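
If we decide the GKE-managed agent is the one to drop, one way to stop the double read would be to disable GKE's built-in Stackdriver logging agent on the cluster. A sketch only, not a decision: the cluster name and zone below are placeholders, the flag is from the (legacy) `gcloud` CLI, and disabling it removes *all* Stackdriver log ingestion for the cluster, including `kube-system` components:

```
# Sketch: turn off GKE's built-in logging agent so only our own
# shipper reads container log files. "my-cluster" / "us-east1-b"
# are placeholders. Note: this stops ALL Stackdriver log ingestion
# for the cluster, not just our apps' logs.
gcloud container clusters update my-cluster \
  --zone us-east1-b \
  --logging-service none
```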
## Desired outcome
All log shipping/pipeline configuration is version controlled in one place, presumably our fluentd configuration that we currently store in https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles/-/tree/master/releases/fluentd-elasticsearch (unless it has been moved to tanka).
## Acceptance criteria
- [ ] All log shipping/pipeline configuration is version controlled in one place.
- [ ] Log files are only read once in Kubernetes, and any required fanning out / routing logic is part of the log pipeline.
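
For the second criterion, fluentd's core `copy` output can duplicate a single tailed stream to multiple destinations, so each log file is read once and fan-out happens inside the pipeline. A minimal sketch, not our actual config: the Pub/Sub output plugin name (`gcloud_pubsub`, from the third-party `fluent-plugin-gcloud-pubsub-custom` gem) and all hosts, projects, and topics below are assumptions:

```
# Sketch only: tail each container log once, then fan out in the pipeline.
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  <parse>
    @type json
  </parse>
</source>

<match kubernetes.**>
  @type copy
  <store>
    # hypothetical Elasticsearch destination
    @type elasticsearch
    host elasticsearch.logging.svc
    port 9200
  </store>
  <store>
    # hypothetical Pub/Sub buffer (plugin name assumed:
    # fluent-plugin-gcloud-pubsub-custom)
    @type gcloud_pubsub
    project our-gcp-project
    topic logs-buffer
  </store>
</match>
```

Any routing logic (e.g. only shipping app containers to Elasticsearch) would live in filter/match rules in the same config, keeping it all version controlled in one place.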