investigate broken log sources in our infrastructure (and combine multiple low rate log streams)

There appear to be no logs in either of the production clusters (log.gitlab.net, log.gprd.gitlab.net) for some of the indices.

Indices and infra that can be removed:

api
- indices for api are completely empty so they can be removed
- indices
- index patterns
- aliases
- index template
application
- can be deprecated
- these logs are now sent to pubsub-rails-inf-gprd and can be found by filtering with `json.tag.keyword: rails.application`, e.g. https://log.gprd.gitlab.net/goto/1ff92ea8c9a437c6b80bc41e6de803ba
- there used to be only a text version: https://docs.gitlab.com/ee/administration/logs.html#applicationlog
- indices
- index patterns
- aliases
- index template
gitlab-shell
- not configured in ES7, any remaining infra can be removed
- indices
- index patterns
- aliases
- index template
haproxy (we no longer send haproxy logs to elastic, we only send them to StackDriver)
- indices
- index patterns
- aliases
- index template
nginx
- indices
- index patterns
- aliases
- index template
production
- was not configured in ES7 so it's no longer used
- any remaining infra (VM, pubsub topic, subscription) can be removed
- indices
- index patterns
- aliases
- index template
rc-rails
- indices
- index patterns
- aliases
- index template
rspec
- not configured in ES7, any remaining infra can be removed
- indices
- index patterns
- aliases
- index template
unicorn
- indices
- index patterns
- aliases
- index template
unstructured
- indices
- index patterns
- aliases
- index template
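The four resource types listed under each source map onto a handful of REST calls. Below is a minimal Python sketch of that mapping, not a tested cleanup script. It assumes the `pubsub-<source>-inf-gprd` naming seen above for rails also applies to the other sources, that aliases and templates share that base name, and that the templates are legacy ones under `_template`; all of these should be checked against the clusters first. `cleanup_plan` and `index_pattern_id` are hypothetical names. The function only builds a list of requests and sends nothing.

```python
def cleanup_plan(source, env="gprd", index_pattern_id=None):
    """Return (target, method, path) tuples for removing one log source.

    The naming scheme is an assumption based on "pubsub-rails-inf-gprd".
    """
    base = f"pubsub-{source}-inf-{env}"  # assumed base name, verify before use
    plan = [
        # Remove the alias from any matching index (redundant once the
        # indices are gone, but explicit about what is being dropped).
        ("elasticsearch", "DELETE", f"/{base}-*/_alias/{base}"),
        # Delete the indices themselves.
        ("elasticsearch", "DELETE", f"/{base}-*"),
        # Delete the legacy index template so nothing recreates the mapping.
        ("elasticsearch", "DELETE", f"/_template/{base}"),
    ]
    if index_pattern_id is not None:
        # Index patterns are Kibana saved objects, so this call goes to
        # Kibana rather than Elasticsearch; the id has to be looked up first.
        plan.append(
            ("kibana", "DELETE", f"/api/saved_objects/index-pattern/{index_pattern_id}")
        )
    return plan


if __name__ == "__main__":
    for target, method, path in cleanup_plan("api"):
        print(target, method, path)
```

Printing the plan for each source before running anything gives a reviewable list of what would be removed.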

Requires further investigation:

consul
- the indices are being used, extremely low rate of logs
- some logs missing in ES (not all logs from consul machines are forwarded to ES)
redis
- confirmed (by looking at log files on machines and log msgs in Elastic) that logs are properly forwarded for redis-*, redis-sidekiq-* and redis-cache-*
registry
- confirmed it's still being used and it's operational

Stack traces are also missing from a number of log streams, e.g. Sidekiq and Postgres.

- ~~Consider sending haproxy/nginx logs to the new cluster now that we have much more capacity~~
- determine whether each item on the list was deprecated or whether something is broken, e.g. fluentd not parsing files
- fix what's broken
- get rid of any resources/config related to deprecated logs (fluentd config, pubsubbeat VMs, pubsub topics); a lot of this was done as part of &180 (closed) and the terraform cleanups that followed
- ~~consider running beats for multiple log streams on a single VM if the log rate is very low~~ no longer relevant after &180 (closed)
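
As a starting point for the "deprecated or broken" check, document counts can be totalled per log source. The sketch below is illustrative only: it assumes its input is the parsed JSON from Elasticsearch's `_cat/indices?format=json` (rows with `index` and `docs.count` fields) and that index names follow a `pubsub-<source>-inf-<env>-...` scheme, which is an assumption extrapolated from `pubsub-rails-inf-gprd`. The function names are hypothetical.

```python
from collections import defaultdict


def docs_per_source(cat_rows, prefix="pubsub-", marker="-inf-"):
    """Sum docs.count per log source from _cat/indices JSON rows."""
    totals = defaultdict(int)
    for row in cat_rows:
        name = row["index"]
        if not name.startswith(prefix):
            continue  # system or unrelated index
        rest = name[len(prefix):]
        if marker not in rest:
            continue  # does not follow the assumed naming scheme
        source = rest.split(marker, 1)[0]
        # _cat returns counts as strings, and null for closed indices.
        totals[source] += int(row.get("docs.count") or 0)
    return dict(totals)


def empty_sources(cat_rows):
    """Sources whose indices hold no documents at all."""
    return sorted(s for s, n in docs_per_source(cat_rows).items() if n == 0)
```

An empty source only shows that nothing arrived. Whether the stream was deprecated or the pipeline is broken still has to be checked on the machines and in the fluentd config.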

Edited Feb 22, 2021 by Michal Wasilewski
