Improve logging for database load balancer host_offline events

What

The database load balancer logs a host_offline event when a replica status check marks a host offline. The event currently does not record which health check failed or what the failure parameter was. We should add this information to the log event.
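As an illustration of the kind of payload this could produce, here is a minimal sketch in Python. It is not the load balancer's actual code: the `CheckResult` type, the check names, and every field name other than the `host_offline` event name are hypothetical and only show the shape of the extra detail.

```python
import json
import logging
from dataclasses import dataclass
from typing import List, Optional

logger = logging.getLogger("load_balancer")


@dataclass
class CheckResult:
    """Outcome of one health check (hypothetical structure)."""
    name: str                          # hypothetical check name
    passed: bool
    observed: Optional[float] = None   # value measured by the check
    threshold: Optional[float] = None  # limit the value was compared against


def host_offline_payload(host: str, port: int, results: List[CheckResult]) -> dict:
    """Build a host_offline event that names each failed check."""
    failed = [r for r in results if not r.passed]
    return {
        "event": "host_offline",
        "db_host": host,  # hypothetical field name
        "db_port": port,  # hypothetical field name
        "failed_checks": [
            {"check": r.name, "observed": r.observed, "threshold": r.threshold}
            for r in failed
        ],
    }


def log_host_offline(host: str, port: int, results: List[CheckResult]) -> dict:
    """Emit the event as one JSON line and return the payload."""
    payload = host_offline_payload(host, port, results)
    logger.warning(json.dumps(payload))
    return payload
```

With a payload like this, a single log line would show both the failing check and the value that crossed its threshold, instead of requiring correlation with other log sources.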

Why

In multiple incidents, we have looked to these log events for information about the timeline and topology of database failures. Knowing which check failed and why would help us understand the sequence of events that led to degraded performance.

Related to gitlab-com/gl-infra/production#18565 (closed)