The problem today is that most of the Sidekiq and Rails logs are not stored in a format that makes it easy to extract the requisite data out of them. For example, a Sidekiq message might produce:
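The original example isn't reproduced here, but for illustration a default-format (non-JSON) Sidekiq line looks roughly like the following, with the worker class and JID made up:

```
2017-06-15T18:18:04.523Z 26161 TID-oveahmcxw ProjectCacheWorker JID-4304b44cfcd27beb9cd5c4a0 INFO: start
2017-06-15T18:18:04.788Z 26161 TID-oveahmcxw ProjectCacheWorker JID-4304b44cfcd27beb9cd5c4a0 INFO: done: 0.265 sec
```

None of the fields are labeled, so every consumer has to know the positional layout to pull out the timestamp, PID, thread, worker class, and job ID.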
@stanhu another problem I have, and I'm not sure if this would be covered by the above links, is that backtraces are thrown out as a bunch of individual lines - but those lines lose almost all of the identifying features (because logger.info("multiple\nlines\nhere") only adds the prefix to the first line?), so you basically need to limit your search in Kibana to cover a 1s time period to catch the backtrace. Or maybe we should just say those will all be in Sentry?
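To make the prefix problem concrete, here's a minimal sketch with Ruby's stock Logger (not our actual logging setup): the formatter runs once per call, so only the first line of a multi-line message gets the timestamp/severity prefix, and the continuation lines come out bare.

```ruby
require 'logger'

logger = Logger.new($stdout)

# The default formatter is applied once to the whole message, so embedded
# newlines produce unprefixed continuation lines that a line-oriented
# shipper cannot associate with the original event.
logger.info("first line\nsecond line\nthird line")
# I, [2017-08-01T10:00:00.000000 #1234]  INFO -- : first line
# second line
# third line
```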
@pcarranza I think the way to start is to see if we can get our Sidekiq logs generated in JSON format in a separate directory or file from the current logger.
I hear what you're saying, but it also sounds like NIH to some degree. Fluentd already solves this problem beautifully. However, I don't view this as a blocker: no, we're not passing JSON out today, but we ARE gathering all the logs up centrally and parsing them into a consumable format.
I think we're actually on the same page, John, and I don't think it's NIH - I don't want us to invent any logging technology! :)
I think we're all in agreement that we should be logging to disk and shipping logs to an Elasticsearch cluster from disk. This leaves two decisions:
Who writes the logs to disk?
I think that the process manager / init system is responsible for persisting stdout to disk. Upstart/systemd/foreman all do this.
The alternative is that the application manages its own log files, which I'm not a fan of (due to having to deal with SIGHUPs, log rotation configuration in app config, etc etc etc)
How do we ship logs to the logging system? The main choices I can think of here are Fluentd, Filebeat and Logstash. I've used Fluentd extensively and Logstash a fair bit too.
Logstash is a memory hog and needs a JVM on your workers; I think we all agree we don't want that.
Fluentd is very configurable and easy to extend. It's a Ruby project, and at Gitter we had quite a few issues setting it up due to Ruby versioning (but we were not a Ruby shop). Once it's running it's generally pretty good.
Filebeat is Go and compiles down to a single binary so is easy to deploy, but I'll defer to @stanhu's opinion on it as he's used it in anger and I haven't :)
All processes under omnibus-gitlab currently manage their log files via runit (e.g. /var/log/gitlab/gitaly/current) and logrotate. Outputting to stdout and stderr is already handled by that right now. I don't think we need to reinvent the wheel there.
I like Filebeat: it's a really lightweight agent on each machine, but it's only the log shipper; you still need Logstash or some other aggregator. You'll also want to put a queue (e.g. Redis) in front of it to provide some resiliency. @amulvany has experience with this and can probably chime in here.
Logstash is indeed a memory hog, but it seems to be more robust than Fluentd, which is written in Ruby and doesn't have a solid multi-core/process story to support a large inflow of logs. My team had to constantly restart Fluentd whenever its CPU was pegged at 100%, it got into a bad state of disconnections, or its internal buffers grew unbounded.
I think we already have a number of Logstash grok patterns, which are giving us structured querying for HAProxy logs, so I would vote to keep trying to make Logstash work as the aggregator.
As I read @stanhu's issue and the comments above, the main point was that we should be able to generate machine parsable logs from GitLab.
I personally love the idea of being able to structure the solution so that end users can either enable JSON logging output or disable JSON and get the current logging format.
On a different topic, I don't think that we need to solve the "how do we ship the logs" question. We have historically not done this in any release; we leave it up to the user to take log files from their machines and aggregate them however they wish. For us at GitLab we use rsyslog to catch all the output that is being written and stream it to a central logging server, however that's not a solution that we ship with GitLab, just something we use on GitLab.com. While I like Fluentd and am a champion of that product (I think they've matured over the last few years), ultimately I think that shipping and aggregation should be left in the hands of the engineers that are tasked with it, and we should stick to enabling them to have either machine parsable or human readable data.
Redis is able to handle large bursts of data while the Logstash server processes, and it also prevents data loss if ES goes down. Personally I would have to agree with @northrup and leave this to the engineers, as I think it would be unlikely to find a one-size-fits-all solution.
Regarding JSON, I think you could write a stable grok filter for the above Sidekiq event (albeit I'm not sure how much it varies), though JSON would offer more versatility as the data could then be indexed by anything. This is really handy for types of data that might need to be restored (e.g. an audit trail), though I can't see it being of much benefit for application logs if the grok filter works.
It's also worth pointing out that if you're logging in JSON format, Filebeat is also capable of delivering logs directly into Elasticsearch without the need for Logstash. Obviously you lose some of the ability to transform/customise the log entries before persisting them, but it also means that you have one less moving part.
If your Elasticsearch is running as a cluster, it's pretty resilient to downtime, and you can do rolling upgrades on the ES infrastructure to avoid maintenance outages.
Using this configuration could help keep things simple, even if just in the beginning.
@pcarranza I think there should be a directive that all the components we develop should have an option to generate a structured JSON log message, and I agree this should be pushed onto the respective teams. I would suggest that we start with Unicorn/Rails (e.g. ~Platform) and see how this works out. I think it will be an incredibly useful thing.
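As a rough sketch of what such an option could look like for a Ruby component (the field names and file path here are illustrative, not an agreed format), emitting one JSON object per line is enough for any shipper to index the events:

```ruby
require 'json'
require 'logger'
require 'time'

# Illustrative only: a Logger formatter that writes one JSON document per line.
json_formatter = proc do |severity, time, progname, msg|
  JSON.generate(
    time: time.utc.iso8601(3),
    severity: severity,
    progname: progname,
    message: msg
  ) + "\n"
end

logger = Logger.new('log/application_json.log')
logger.formatter = json_formatter

logger.info('user login')
# => {"time":"2017-08-01T10:00:00.000Z","severity":"INFO","progname":null,"message":"user login"}
```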
@ernstvn I think this is most likely a sub-task of increasing observability and #29403 may need to get converted into a Meta with additional links. I'll discuss.
The Geo team is in search of a good structured logging system. I evaluated a number of gems, and it seems there are a lot out there with different focuses. For logging Rails views, I think lograge has the right approach for summarizing a single endpoint in a line. !12928 (merged) introduces a separate log file that can easily be ingested by Logstash. Note I looked at semantic_logger and logstasher. I think logstasher may work, but lograge seems better maintained for Rails 5.
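For reference, a minimal lograge configuration along those lines might look like the sketch below; the file path and extra fields are illustrative, not necessarily what !12928 does.

```ruby
# config/initializers/lograge.rb -- minimal sketch
Rails.application.configure do
  config.lograge.enabled = true
  config.lograge.formatter = Lograge::Formatters::Json.new
  config.lograge.logger = ActiveSupport::Logger.new(Rails.root.join('log', 'production_json.log'))

  # Merge a few extra fields into each per-request event
  config.lograge.custom_options = lambda do |event|
    { params: event.payload[:params].except('controller', 'action') }
  end
end
```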
Logging Sidekiq events is a different beast. There is some discussion here, but as far as I can tell there are two main gems:
However, as far as I can tell, lograge and logstasher aren't meant to log arbitrary messages (e.g. Rails.logger.info "foobar"); they are meant to handle the Rails request logs. I think we'll want to use something like LogstashLogger instead.
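If we do go with that gem, logging arbitrary structured messages might look roughly like this (the path and field names are illustrative):

```ruby
require 'logstash-logger'

# Sketch only: a file-backed logger that serializes each event as a
# Logstash-style JSON document; hash arguments are merged into the event.
logger = LogStashLogger.new(type: :file, path: 'log/events_json.log', sync: true)

logger.info 'plain string message'
logger.info message: 'repository synced', project_id: 42, duration_s: 1.7
```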
@briann Oh, right. It looks like Fluentd recommends using Lograge in JSON mode: http://www.fluentd.org/datasources/rails. It sounds like we can standardize on pure JSON, since the Logstash format only adds a few fields that aren't really necessary.
That said, it always seems odd to me that Fluentd recommends JSON mode while also saying that if you are using Elasticsearch, you should use a plugin that generates Logstash-compatible data. Fluentd has a tendency to use a lot of CPU parsing and re-serializing JSON, so I think we're better off just sticking with the Logstash format.
This is an old issue, but I would like to re-animate it as we are now looking at logging for the GCP migration as well as deprecating the existing Logstash configuration in favor of Fluentd across all environments.
We can write Fluentd regex parsers, but it would be much better to have a more standardized approach to logging. I would like to see at least Sidekiq and the Go projects add support in the short term if it is doable.
@jarv for the items which currently don't have structured logging (and some which possibly never will), there's nothing blocking us from simply shipping the logs as text (as they are right now), right? For example, there's no urgent need for Redis logs to be parsed in a structured manner is there?
I did some investigation here for Sidekiq JSON logging, and the existing gems don't quite do what we need. I think the best approach is to create our own Sidekiq middleware and logger based on the work done on the existing gems out there.
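To make the shape of that concrete, here is a rough sketch of such a middleware (field names and structure are illustrative, not a settled format):

```ruby
require 'json'
require 'sidekiq'

# Sketch of a Sidekiq server middleware that emits one JSON line per job run.
class StructuredJobLogger
  def call(_worker, job, queue)
    started = Time.now
    yield
    log(job, queue, 'done', Time.now - started)
  rescue => e
    log(job, queue, 'fail', Time.now - started, error_class: e.class.name)
    raise
  end

  private

  def log(job, queue, status, duration, extra = {})
    Sidekiq.logger.info JSON.generate(
      { class: job['class'], jid: job['jid'], queue: queue,
        status: status, duration_s: duration.round(3) }.merge(extra)
    )
  end
end

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add StructuredJobLogger
  end
end
```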