Improve structured logging on DWS
Problem to solve
Logs are a crucial observability component for investigating production incidents. They can also be used to characterize server performance by visualizing the logged fields in ClickHouse.
Currently, structured logs on DWS are not well organized, so it's hard to build a log-based dashboard. In addition, adding more metadata to the structured logs would help us investigate day-to-day issues.
Proposal
Here are the known issues:
- We need more logs at the gRPC library layer (including the C executable) if possible. Related to #1482 (closed).
- The `logger` field should be the module name or path (e.g. `duo_workflow_service.server`) to indicate the source of the log. Sometimes it's hard to figure out where a log line is coming from.
- Make sure that a meaningful message is attached to the exception message. Currently, it's hard to grasp the reason for an error (ref).
- Consolidate the strategy for logging additional fields. Currently, we log them in the following ways:
  - Putting them at the same level as `event`.
  - Putting them under `additional_details`.
  - Or maybe more.
- Make sure that all exceptions are logged by the `log_exception` module.
- Make sure that exceptions are not silenced unintentionally. They should bubble up by default.
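
To make the target shape concrete, here is a minimal sketch of three of the points above: a `logger` field that carries the module path, a single place (`additional_details`) for extra fields, and an exception helper that logs with a meaningful message while the caller re-raises. It uses only the Python standard library so that it is self-contained, which may differ from the logging library DWS actually uses. `JsonFormatter`, `build_logger`, `run_workflow`, and the `log_exception` signature shown here are hypothetical names for illustration, not the existing DWS API.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object (illustrative only)."""

    def format(self, record):
        payload = {
            "event": record.getMessage(),
            # The logger name is the module path, e.g. duo_workflow_service.server.
            "logger": record.name,
            "level": record.levelname.lower(),
            # Assumed convention: every extra field lives under one key.
            "additional_details": getattr(record, "additional_details", {}),
        }
        if record.exc_info:
            exc_type, exc, _ = record.exc_info
            payload["exception_class"] = exc_type.__name__
            payload["exception_message"] = str(exc)
        return json.dumps(payload)


def build_logger(name, stream):
    """Create a logger that writes JSON lines to the given stream.

    In a real module, `name` would typically be `__name__`.
    """
    logger = logging.getLogger(name)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def log_exception(logger, exc, message, **details):
    """Hypothetical helper: log the exception with a meaningful message.

    It does not swallow the exception; the caller is expected to re-raise.
    """
    logger.error(message, exc_info=exc, extra={"additional_details": details})


def run_workflow(logger, workflow_id):
    """Simulate a failing step to show the log-then-re-raise pattern."""
    try:
        raise TimeoutError("upstream did not respond")
    except TimeoutError as exc:
        log_exception(logger, exc, "Workflow step failed", workflow_id=workflow_id)
        raise  # bubble up by default instead of silencing the error
```

With this layout, a dashboard query can group by `logger` to find the source module and read every extra field from `additional_details`, without having to check several locations.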
Further details
Links / references
Edited by Shinya Maeda