
Log db host refresh thread interruption

Prabakaran Murugesan requested to merge 364370-log-dns-refresh-interruption into master

What does this MR do and why?

A thread runs indefinitely (sleeping between iterations) to refresh the DB load balancing hosts, but for some reason the hosts are not being refreshed as expected, which resulted in #364370 (closed).

As a first step, all unhandled exceptions were logged to see whether anything was going wrong, but that logging captured nothing.

This MR implements an alternative approach: log an error when the refresh is not taking place. More info can be found here.
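A minimal sketch, in plain Ruby, of how such a staleness check might work. The class name `RefreshWatchdog`, the `max_staleness` threshold, and the `logger` lambda are hypothetical and not taken from this MR; only the event name and the `refresh_thread_last_run` and `thread_status` fields come from the log output shown in this description.

```ruby
require 'json'
require 'time'

# Hypothetical watchdog: records when the refresh loop last ran and
# logs an error event if that timestamp is older than max_staleness.
class RefreshWatchdog
  attr_reader :refresh_thread_last_run

  # max_staleness: seconds after which the refresh is considered stuck (assumed value).
  # logger: any callable that accepts one log line as a String (assumption).
  def initialize(max_staleness:, logger:)
    @max_staleness = max_staleness
    @logger = logger
    @refresh_thread_last_run = Time.now
    @refresh_thread = nil
  end

  # Remember the thread so its status can be reported in the log line.
  def attach(thread)
    @refresh_thread = thread
  end

  # Called by the refresh loop after each successful iteration.
  def record_refresh(now = Time.now)
    @refresh_thread_last_run = now
  end

  def stale?(now = Time.now)
    now - @refresh_thread_last_run > @max_staleness
  end

  # Emits one JSON log line when the refresh is stale; returns true if it logged.
  def log_if_stale(now = Time.now)
    return false unless stale?(now)

    @logger.call(JSON.generate(
      severity: 'ERROR',
      time: now.getutc.iso8601(3),
      event: 'service_discovery_refresh_thread_interrupt',
      refresh_thread_last_run: @refresh_thread_last_run.getutc.iso8601(3),
      # Thread#status is "run", "sleep", "aborting", false, or nil.
      thread_status: @refresh_thread&.status
    ))
    true
  end
end
```

In this sketch the refresh loop would call `record_refresh` after each iteration, while some other code path (for example, a periodic check) would call `log_if_stale`. Where that check actually lives in the real code is described in the linked discussion, not here.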

cc: @stomlinson @DylanGriffith

How to set up and validate locally

Instructions for setting up DB load balancing with service discovery in a local environment can be found here.

Unhappy flow:

  1. To simulate a refresh_thread_last_run in the past, change refresh_thread_last_run to Time.current - 1.hour here.
  2. Watch for the error event in the logs: tail -f log/database_load_balancing.log | grep service_discovery_refresh_thread_interrupt.
  3. Run gdk restart (or restart only rails-web).
  4. Within a few seconds, we should see the logs coming in:
{"severity":"ERROR","time":"2023-06-09T15:36:02.097Z","correlation_id":null,"event":"service_discovery_refresh_thread_interrupt","refresh_thread_last_run":"2023-06-09T14:36:01.038Z","thread_status":"sleep"}
{"severity":"ERROR","time":"2023-06-09T15:36:02.295Z","correlation_id":null,"event":"service_discovery_refresh_thread_interrupt","refresh_thread_last_run":"2023-06-09T14:36:01.929Z","thread_status":"sleep"}
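To confirm that the logged lines reflect the one-hour offset introduced in step 1, the gap between `time` and `refresh_thread_last_run` can be computed from each line. This is a small sketch for local verification; the helper name `refresh_lag_seconds` is hypothetical and not part of the MR.

```ruby
require 'json'
require 'time'

# Returns the seconds between the log time and the last refresh run,
# or nil if the line is not a refresh-interrupt event.
def refresh_lag_seconds(line)
  entry = JSON.parse(line)
  return nil unless entry['event'] == 'service_discovery_refresh_thread_interrupt'

  Time.iso8601(entry['time']) - Time.iso8601(entry['refresh_thread_last_run'])
end
```

For the first sample line above, the gap is roughly 3601 seconds, which matches the `Time.current - 1.hour` change plus about a second of startup time.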

Happy flow:

  1. Revert the refresh_thread_last_run change
  2. Run gdk restart; we should not see any error logs with the event 'service_discovery_refresh_thread_interrupt'.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #364370 (closed)

Edited by Prabakaran Murugesan
