investigate broken log sources in our infrastructure (and combine multiple low rate log streams)
There appear to be no logs in either of the production clusters (log.gitlab.net, log.gprd.gitlab.net) for some of the indices.
Indices and infra that can be removed:
- api - indices for api are completely empty so they can be removed
  - indices
  - index patterns
  - aliases
  - index template
- application - can be deprecated
  - these logs are now sent to `pubsub-rails-inf-gprd` and can be found by filtering with `json.tag.keyword: rails.application`, e.g. https://log.gprd.gitlab.net/goto/1ff92ea8c9a437c6b80bc41e6de803ba
  - there used to be only a text version: https://docs.gitlab.com/ee/administration/logs.html#applicationlog
  - indices
  - index patterns
  - aliases
  - index template
- gitlab-shell - not configured in ES7, any remaining infra can be removed
  - indices
  - index patterns
  - aliases
  - index template
- haproxy (we no longer send haproxy logs to Elastic, we only send them to StackDriver)
  - indices
  - index patterns
  - aliases
  - index template
- nginx
  - indices
  - index patterns
  - aliases
  - index template
- production - was not configured in ES7 so it's no longer used
  - any remaining infra (VM, pubsub topic, subscription) can be removed
  - indices
  - index patterns
  - aliases
  - index template
- rc-rails
  - indices
  - index patterns
  - aliases
  - index template
- rspec - not configured in ES7, any remaining infra can be removed
  - indices
  - index patterns
  - aliases
  - index template
- unicorn
  - indices
  - index patterns
  - aliases
  - index template
- unstructured
  - indices
  - index patterns
  - aliases
  - index template
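
One way to double-check that an index family is really empty before removing it is to inspect the JSON output of Elasticsearch's `_cat/indices?format=json` API. The following is only a minimal sketch: the helper name `empty_indices` and the index names in the sample are hypothetical, and fetching the rows from the cluster is left out.

```python
def empty_indices(cat_rows, prefix):
    """Return the names of indices starting with `prefix` that hold zero documents.

    `cat_rows` is the parsed JSON list returned by `_cat/indices?format=json`.
    """
    result = []
    for row in cat_rows:
        name = row["index"]
        # The _cat API reports numbers as strings; closed indices may have no count.
        count = row.get("docs.count")
        if name.startswith(prefix) and count is not None and int(count) == 0:
            result.append(name)
    return sorted(result)


# Hypothetical sample rows, for illustration only (not real index names).
sample_rows = [
    {"index": "example-api-000002", "docs.count": "0"},
    {"index": "example-api-000001", "docs.count": "0"},
    {"index": "example-rails-000001", "docs.count": "1534"},
    {"index": "example-api-closed", "docs.count": None},
]

print(empty_indices(sample_rows, "example-api"))
```

Indices without a document count (for example closed ones) are deliberately skipped instead of being reported as empty, so they still need a manual look.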
Requires further investigation:
- consul - the indices are being used, extremely low rate of logs
  - some logs missing in ES (not all logs from consul machines are forwarded to ES)
- redis
  - confirmed (by looking at log files on machines and log msgs in Elastic) that logs are properly forwarded for `redis-*`, `redis-sidekiq-*` and `redis-cache-*`
- registry - confirmed it's still being used and it's operational
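
To narrow down which machines are not forwarding logs (as suspected for consul), the list of expected hosts could be compared against the buckets of a terms aggregation on a hostname field. This is a sketch under assumptions: the aggregation name `hosts`, the helper name `hosts_missing_from_es` and the hostnames are all hypothetical, and the hostname field to aggregate on has to be looked up in the actual index mapping.

```python
def hosts_missing_from_es(expected_hosts, agg_response, agg_name="hosts"):
    """Return expected hosts that have no documents in a terms aggregation response."""
    buckets = agg_response["aggregations"][agg_name]["buckets"]
    # Each terms bucket carries the term under "key" and its hit count under "doc_count".
    seen = {bucket["key"] for bucket in buckets if bucket["doc_count"] > 0}
    return sorted(set(expected_hosts) - seen)


# Hypothetical search response and host list, for illustration only.
sample_response = {
    "aggregations": {
        "hosts": {
            "buckets": [
                {"key": "consul-01.example", "doc_count": 42},
                {"key": "consul-03.example", "doc_count": 7},
            ]
        }
    }
}
expected = ["consul-01.example", "consul-02.example", "consul-03.example"]

print(hosts_missing_from_es(expected, sample_response))
```

A terms aggregation only returns the top buckets (10 by default), so the `size` of the aggregation would need to be at least the number of expected hosts for this comparison to be meaningful.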
There are also stacktraces missing from a number of log streams, e.g. Sidekiq, Postgres
- Consider sending haproxy/nginx logs to the new cluster now that we have much more capacity
- determine whether each item on the list was deprecated or whether something is broken, e.g. fluentd not parsing files
- fix what's broken
- get rid of any resources/config related to deprecated logs (fluentd config, pubsubbeat VMs, pubsub topics); a lot of this was done as part of &180 (closed) and the terraform cleanups that followed
- [ ] consider running beats for multiple log streams on a single VM if the log rate is very low (no longer relevant after &180 (closed))
Edited by Michal Wasilewski