Add an API call that reaches the backend to the health check
Problem to solve
Nodes that are under severe load still report as healthy because the services are up. As described by our customer:
This health check was implemented, but still returned positive while the node was in trouble. Load average over 25. I'd still like to see a health check that can detect when this app can't handle further requests. This health check may be looking for nginx to respond or something too simple.
Target audience
Further details
This feature will improve the robustness of HA deployments.
Proposal
We could add a backend call to the health check that fails if a timeout is reached. Alternatively, we could compare system load against server size and return a failure if the load is higher than X (e.g. 1/5/15 minute load - # of CPU cores < 1 = success, or maybe not quite that strict).
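As a rough illustration of the load comparison suggested above, here is a minimal Python sketch. The function names and the `margin` parameter are hypothetical and not part of any existing health check; `margin=1.0` only mirrors the "load - # of cpu cores < 1" example, and the right threshold is still open.

```python
import os


def load_is_healthy(load_averages, cpu_cores, margin=1.0):
    """Return True if every load average minus the core count is below margin.

    load_averages: the 1, 5 and 15 minute load averages.
    cpu_cores: number of CPU cores on the node.
    margin: hypothetical threshold taken from the example in the proposal.
    """
    return all(load - cpu_cores < margin for load in load_averages)


def current_load_is_healthy(margin=1.0):
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
    # os.cpu_count() can return None, so fall back to a single core.
    return load_is_healthy(os.getloadavg(), os.cpu_count() or 1, margin)
```

With this rule, a node with 8 cores and a load average over 25, as in the customer report, would be reported as unhealthy.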
What does success look like, and how can we measure that?
The health check returns an HTTP failure code if the API does not respond within the timeout.
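The success criterion above could be sketched roughly as follows in Python. Everything here is an assumption for illustration: `health_status` and `slow_backend_call` are hypothetical names, the probe stands in for whatever backend call is eventually chosen, and 503 is only one possible failure code.

```python
import concurrent.futures
import time


def health_status(backend_probe, timeout_seconds):
    """Return 200 if backend_probe finishes within the timeout, else 503."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(backend_probe)
    try:
        future.result(timeout=timeout_seconds)
        return 200
    except concurrent.futures.TimeoutError:
        # The backend did not answer in time, so report the node as unhealthy.
        return 503
    except Exception:
        # The backend call itself failed.
        return 503
    finally:
        # Do not block the health check response on a probe that is still running.
        executor.shutdown(wait=False)


def slow_backend_call():
    # Stand-in for a real backend request on an overloaded node.
    time.sleep(0.3)
```

For example, `health_status(slow_backend_call, timeout_seconds=0.05)` reports a failure, while a probe that returns immediately reports success.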
Links / references
https://gitlab.zendesk.com/agent/tickets/113157 (Internal use only)