Increases LB SD refresh thread interruption delta time

Prabakaran Murugesan requested to merge 364370_increase_sd_refresh_delta into master

What does this MR do and why?

The logging was introduced in !121643 (merged), which also has the backstory and reasoning for the new log.

While monitoring the logs, there were ~5500 occurrences of the service_discovery_refresh_thread_interrupt event with the backtrace below:

/srv/gitlab/lib/gitlab/database/load_balancing/host.rb:85:in `sleep'
/srv/gitlab/lib/gitlab/database/load_balancing/host.rb:85:in `disconnect!'
/srv/gitlab/lib/gitlab/database/load_balancing/service_discovery.rb:155:in `block in replace_hosts'
/srv/gitlab/lib/gitlab/database/load_balancing/service_discovery.rb:154:in `each'
/srv/gitlab/lib/gitlab/database/load_balancing/service_discovery.rb:154:in `replace_hosts'
/srv/gitlab/lib/gitlab/database/load_balancing/service_discovery.rb:132:in `refresh_if_necessary'
/srv/gitlab/lib/gitlab/database/load_balancing/service_discovery.rb:95:in `block in perform_service_discovery'
/srv/gitlab/lib/gitlab/database/load_balancing/service_discovery.rb:94:in `times'
/srv/gitlab/lib/gitlab/database/load_balancing/service_discovery.rb:94:in `perform_service_discovery'
/srv/gitlab/lib/gitlab/database/load_balancing/service_discovery.rb:83:in `block (2 levels) in start'
/srv/gitlab/lib/gitlab/database/load_balancing/service_discovery.rb:80:in `loop'
/srv/gitlab/lib/gitlab/database/load_balancing/service_discovery.rb:80:in `block in start'
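
For readers unfamiliar with how a trace like this can end in `sleep`, here is a minimal sketch using Ruby's standard `Thread#backtrace`. It is illustrative only: the helper name `sleeping_thread_backtrace` is hypothetical, and the MR description does not say how GitLab captures the refresh thread's backtrace, so treating it as something like `Thread#backtrace` is an assumption.

```ruby
# Illustrative only: this is a throwaway thread, not the GitLab refresh thread.
# Hypothetical helper: returns the backtrace of a thread that is blocked in sleep.
def sleeping_thread_backtrace
  worker = Thread.new { sleep }              # blocks until the thread is killed
  sleep 0.01 until worker.status == "sleep"  # wait until the thread is actually blocked
  trace = worker.backtrace                   # Array of "file:line:in ..." strings
  worker.kill
  worker.join
  trace
end

puts sleeping_thread_backtrace
```

The innermost frame of the returned array points at the `sleep` call, in the same way the trace above points at the `sleep` in `host.rb:85`.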

In https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/load_balancing/host.rb#L82, the loop condition allows a maximum of 4s of sleep, i.e. 2 iterations, with 120 seconds as the timeout.

This MR therefore increases the existing DISCOVERY_THREAD_REFRESH_DELTA to 5 (> 4s).
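
The arithmetic behind that choice can be sketched as follows. This is a simplified illustration under stated assumptions, not the GitLab implementation: the 2-second step is inferred from "4s over 2 loops", the delta is treated as seconds because the description compares it to 4s, and all constant and method names are hypothetical.

```ruby
# Values taken from the description above, not read from the GitLab source.
SLEEP_STEP_SECONDS   = 2  # assumed length of one sleep in the disconnect loop
MAX_SLEEP_ITERATIONS = 2  # "2 loops" per the description
NEW_DELTA_SECONDS    = 5  # new delta value; unit assumed from the "> 4s" comparison

# Worst-case time the refresh thread spends sleeping in one disconnect call.
def max_disconnect_sleep(step: SLEEP_STEP_SECONDS, iterations: MAX_SLEEP_ITERATIONS)
  step * iterations
end

# Hypothetical check: the thread is reported as interrupted when it has been
# quiet for longer than the allowed delta.
def interruption_reported?(quiet_seconds, delta_seconds)
  quiet_seconds > delta_seconds
end

worst_case = max_disconnect_sleep
puts worst_case                                             # prints 4
puts interruption_reported?(worst_case, NEW_DELTA_SECONDS)  # prints false
```

Under these assumptions, any delta below the 4-second worst case would report an interruption while the thread is only waiting in the disconnect loop, whereas a delta of 5 leaves a one-second margin.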

Screenshots or screen recordings

Source: Kibana logs

Screenshot 2023-07-18 at 11.55.07.png

Screenshot 2023-07-18 at 11.55.58.png

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Prabakaran Murugesan
