Improve investigative techniques using correlation ID, tracing, and APM
When an upstream error is received, debugging would be easier if the upstream response code were logged alongside the Rails 5XX response. In lieu of this, we can use the correlation ID, but we need to make it easier to visually group responses by their originating request.
In the incident review for production#1919 (closed), SREs were slow to identify the source of the upstream error. To identify the cause of an incident more quickly, we need faster workflows for correlating error codes throughout the stack.
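As a rough illustration of grouping responses by originating request, the sketch below collects JSON log lines from several services under their correlation ID and then lists the IDs that include a 5XX response. It assumes logs are newline-delimited JSON and that the field names are `correlation_id` and `status`; these names, the helper functions, and the sample lines are hypothetical and would need adjusting to the actual log schema.

```python
import json
from collections import defaultdict


def group_by_correlation_id(lines, key="correlation_id"):
    """Group JSON log entries by correlation ID, preserving input order.

    The field name `correlation_id` is an assumption about the log schema.
    Lines that are not JSON objects, or that lack the key, are skipped.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as plain-text log output
        if not isinstance(entry, dict):
            continue
        cid = entry.get(key)
        if cid is not None:
            groups[cid].append(entry)
    return dict(groups)


def requests_with_server_errors(groups, status_key="status"):
    """Return correlation IDs whose entries include at least one 5XX status.

    The field name `status` is an assumption about the log schema.
    """
    failing = []
    for cid, entries in groups.items():
        for entry in entries:
            status = entry.get(status_key)
            if isinstance(status, int) and 500 <= status <= 599:
                failing.append(cid)
                break
    return failing


# Hypothetical sample lines, not taken from any real incident.
SAMPLE_LINES = [
    '{"service": "rails", "correlation_id": "abc123", "status": 502}',
    '{"service": "workhorse", "correlation_id": "abc123", "status": 502}',
    '{"service": "rails", "correlation_id": "def456", "status": 200}',
    "not json",
]

if __name__ == "__main__":
    groups = group_by_correlation_id(SAMPLE_LINES)
    for cid in requests_with_server_errors(groups):
        print(cid, [entry.get("service") for entry in groups[cid]])
```

Grouping first and filtering second keeps every entry for a failing request together, so entries from each layer of the stack can be read side by side rather than searched for one service at a time.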
Definition of Done
- Improve runbook documentation on how to most effectively link logs using correlation IDs
- Open an Epic for distributed tracing: &210 (closed)
- Open an Epic for application monitoring
Edited by AnthonySandoval