Improve structured logging on DWS

Problem to solve

Logs are a crucial observability component for investigating production incidents. They can also be used to characterize server performance by visualizing the logged fields in ClickHouse.

Currently, structured logs on DWS are not well organized, which makes it hard to build a log-based dashboard. In addition, adding more metadata to the structured logs would help us investigate day-to-day issues.

Proposal

Here are the known issues:

  • We need more logs at the gRPC library layer (including the C executable), if possible. Related to #1482 (closed).
  • The "logger" field should be the module name or path (e.g. duo_workflow_service.server) so that it indicates the source of the log. (It is sometimes hard to figure out where a log line is coming from.)
  • Make sure that a meaningful message is attached to each logged exception. Currently, it's hard to grasp the reason for an error (ref).
  • Consolidate the strategy for logging additional fields. Currently, we log them in several ways:
    • At the same level as event.
    • Under additional_details.
    • Possibly in other ways as well.
  • Make sure that all exceptions are logged by the log_exception module.
  • Make sure that exceptions are not silenced unintentionally. They should bubble up by default.
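
To make a few of the points above concrete, here is a minimal sketch of what a consolidated log entry could look like. It uses only Python's standard logging module, which is an assumption made for illustration and not necessarily what DWS uses today. The names JsonFormatter, build_logger, log_exception_sketch, run_workflow and the workflow_id field are hypothetical; in particular, log_exception_sketch is not the existing log_exception module, and placing extra fields under additional_details is just one of the two options listed above.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with a fixed top-level schema."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "event": record.getMessage(),
            "level": record.levelname.lower(),
            # record.name is the name passed to logging.getLogger(...).
            "logger": record.name,
            # Hypothetical convention: all call-site metadata under one key.
            "additional_details": getattr(record, "additional_details", {}),
        }
        if record.exc_info:
            exc_type, exc, _ = record.exc_info
            payload["exception_class"] = exc_type.__name__
            payload["exception_message"] = str(exc)
        return json.dumps(payload)


def build_logger(name: str, stream) -> logging.Logger:
    """Return a logger named after its module that writes JSON to `stream`."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def log_exception_sketch(logger, message, exc, **details):
    """Hypothetical single entry point for logging exceptions."""
    logger.error(message, exc_info=exc, extra={"additional_details": details})


def run_workflow(logger, workflow_id):
    """Simulate a failure: log it with context, then let it bubble up."""
    try:
        raise ValueError("checkpoint not found")
    except ValueError as exc:
        log_exception_sketch(
            logger, "Failed to resume workflow", exc, workflow_id=workflow_id
        )
        raise  # Do not silence the exception.


# Inside a real module this would be build_logger(__name__, sys.stderr).
stream = io.StringIO()
logger = build_logger("duo_workflow_service.server", stream)

bubbled_up = False
try:
    run_workflow(logger, "wf-123")
except ValueError:
    bubbled_up = True

entry = json.loads(stream.getvalue())
print(entry)
```

With a single schema like this, a dashboard query can rely on "logger" to locate the source module and on one known key for extra metadata, whichever key the team ends up choosing.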

Further details

Links / references

Edited by Shinya Maeda