Skip to content

Improve healtchecks 500 response when DB/Redis are not available

What does this MR do?

  • Improve 500 – Internal Server Error response from the probe endpoint when DB is down – include an exception info into the message instead of failing with generic 500
  • This as well applies to any unhandled exception which happens during the probing, e.g. Redis connection error
  • Expand docs describing incorrect probe endpoint responses when using the deprecated access token
  • Adds additional specs for DB down scenario

When DB is not available we don't want return generic 500 on our Health Checks, e.g. on /-/readiness?all=1.
Instead, we want to provide more context.

For example:

{
   "status":"failed",
   "message":"PG::ConnectionBad : could not connect to server: No such file or directory\n\tIs the server running locally and accepting\n\tconnections on Unix domain socket \"/Users/lipton/dev/gitlab/gitlab-development-kit/postgresql/.s.PGSQL.5432\"?\n"
}

There are many places we may potentially fail with PG::ConnectionBad when DB is down, e.g. because of GitalyClient.timeout: #219046 (comment 363634780)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

To test locally:

  • in config/environments/development.rb, set config.active_record.migration_error = false (as it in prod)
  • Start GL (GDK/dev is fine)
  • Visit /-/readiness?all=1 endpoint: it should be status 200.
  • Shut down the DB, e.g. gdk stop postgresql
  • Visit /-/readiness?all=1 endpoint: you should see an expanded 500 error message.

To additionally imitate multi-host env, you could:

diff --git a/app/controllers/concerns/requires_whitelisted_monitoring_client.rb b/app/controllers/concerns/requires_whitelisted_monitoring_client.rb
index c92b1cecaaa..282994fd577 100644
--- a/app/controllers/concerns/requires_whitelisted_monitoring_client.rb
+++ b/app/controllers/concerns/requires_whitelisted_monitoring_client.rb
@@ -16,9 +16,10 @@ module RequiresWhitelistedMonitoringClient
   def client_ip_whitelisted?
     # Always allow developers to access http://localhost:3000/-/metrics for
     # debugging purposes
-    return true if Rails.env.development? && request.local?
+    # return true if Rails.env.development? && request.local?

     ip_whitelist.any? { |e| e.include?(Gitlab::RequestContext.instance.client_ip) }
+    return true
   end

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Related to #219046 (closed)

Edited by 🤖 GitLab Bot 🤖

Merge request reports