UX: It is very hard to find the cause of sidekiq job failures
Description
It is very hard for an admin to identify the cause of Sidekiq job failures. We've had 10 failures in total on our instance, and it was only by pure luck that I managed to find the relevant log and the reason.
Proposal
-
On /admin/sidekiq/, the "10 failed" portion of the bar (next to the processed / busy / enqueued etc. counts) should link to a page which lists the failed jobs and the logging associated with them, or at a minimum the time of each failure, or at least the time period in which they occurred.
-
The graph is pretty much useless for determining when failures occurred, as it plots 'processed' and 'failed' on the same axis at the same scale. When 'processed' peaks at 4,000 (as on our instance) and 'failed' peaks at 2 (in the last week), 'failed' is just a flat line along the axis. You can't actually identify when a failure occurred from the graph, except by mousing over each point one at a time to see if it has a non-zero value.
-
On the sidekiq.log page of /admin/logs, it would be nice to be able to filter for 'WARN' level messages, or even for them to (also) appear in a separate sidekiq-warnings.log page or similar.
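To illustrate the kind of filtering meant in the last point, here is a minimal sketch in Python that keeps only lines at a chosen severity. It is not how GitLab or Sidekiq implement anything. The function name `filter_by_level`, the sample lines, and the assumption that the severity appears as a standalone upper-case token (such as `WARN:`) are all mine, so adjust the pattern to whatever your sidekiq.log actually contains.

```python
import re
from typing import Iterable, Iterator, Sequence

# Assumption: each log line carries its severity as a standalone upper-case
# token (e.g. "WARN:"). The first such token on the line is taken as the level.
LEVEL_RE = re.compile(r"\b(DEBUG|INFO|WARN|ERROR|FATAL)\b")


def filter_by_level(lines: Iterable[str],
                    levels: Sequence[str] = ("WARN",)) -> Iterator[str]:
    """Yield only the lines whose first severity token is in `levels`."""
    wanted = set(levels)
    for line in lines:
        match = LEVEL_RE.search(line)
        if match and match.group(1) in wanted:
            yield line.rstrip("\n")


# Hypothetical sample lines for illustration, not copied from a real instance.
SAMPLE_LOG = [
    "2018-01-01T10:00:00.000Z 1234 TID-abc ExampleWorker JID-1 INFO: start",
    "2018-01-01T10:00:01.000Z 1234 TID-abc ExampleWorker JID-1 WARN: job failed",
    "2018-01-01T10:00:02.000Z 1234 TID-abc ExampleWorker JID-1 INFO: done",
]

warnings = list(filter_by_level(SAMPLE_LOG))
for line in warnings:
    print(line)
```

The same function accepts an open file handle in place of `SAMPLE_LOG`, so it can be pointed at wherever sidekiq.log lives on a given install. Having this available as a filter on the /admin/logs page itself would save the admin from needing shell access at all.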