Switch fluentd in Kubernetes to send logs through PubSub
For more background see:
- logging strategy issue: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/10095
- logging strategy gdoc: https://docs.google.com/document/d/1EK3QUuC0JrN5ndXdz1McEwKirD_jfA7bk0siTgHXwI4/edit#heading=h.ymuyhym25zqn
- logging questionnaire: https://docs.google.com/document/d/15N3g86CtEOYDbiQVZGU0aRhf3peSxba9oClk_Pgv_yM/edit
When migrating workloads to Kubernetes, we configured fluentd to send logs directly to Elasticsearch instead of through PubSub. See the documents linked above for why we want to use a message queue long-term.
Definition of Done / Acceptance criteria / Desired outcome
Here, "logs" means logs from the GitLab-managed Fluentd DaemonSet running in Kubernetes clusters:
- Logs are forwarded to GCP PubSub
- Logs are no longer sent directly to Elasticsearch
- fluentd buffers are not blocked on output failures
- fluentd timeout is configurable through version controlled config
- fluentd is not impacted by Elasticsearch performance issues
- Messages sent to Elasticsearch are uploaded by pubsubbeat processes only (there's consistency in all our log streams)
- Diagram in runbooks is up to date: https://gitlab.com/gitlab-com/runbooks/-/tree/master/docs/logging#logging-infrastructure-overview
- EOC is paged for fluentd error rate
- EOC is paged for pubsubbeat error rate
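
To make the "buffers are not blocked on output failures" criterion concrete, here is a toy Python sketch of a bounded buffer that drops its oldest chunk instead of blocking the producer when the output is down. It is not fluentd code and not our config. It is only loosely analogous to fluentd's `overflow_action drop_oldest_chunk` buffer option, and `BoundedBuffer`, `failing_sink`, and the chunk names are hypothetical, made up for illustration.

```python
from collections import deque


class BoundedBuffer:
    """Toy stand-in for an output buffer with a drop-oldest overflow policy."""

    def __init__(self, max_chunks):
        self.chunks = deque()
        self.max_chunks = max_chunks
        self.dropped = 0

    def enqueue(self, chunk):
        # Never block the producer: when full, discard the oldest chunk.
        if len(self.chunks) >= self.max_chunks:
            self.chunks.popleft()
            self.dropped += 1
        self.chunks.append(chunk)

    def flush(self, send):
        """Send buffered chunks in order; stop at the first failure."""
        sent = 0
        while self.chunks:
            try:
                send(self.chunks[0])
            except ConnectionError:
                break  # keep the chunk so a later flush can retry it
            self.chunks.popleft()
            sent += 1
        return sent


def failing_sink(chunk):
    # Hypothetical output that is unavailable for the whole outage.
    raise ConnectionError("output unavailable")


buf = BoundedBuffer(max_chunks=3)
for i in range(5):
    buf.enqueue(f"log-{i}")   # producer keeps going during the outage
    buf.flush(failing_sink)   # every flush attempt fails

delivered = []
recovered = buf.flush(delivered.append)  # output is healthy again
```

The trade-off in this sketch is that the oldest data is lost once the buffer fills, in exchange for the producer never stalling. Which overflow policy we actually want for fluentd is a decision for this issue, not something the sketch settles.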
Edited by Craig Furman