ELLA healthchecks
Background
API endpoints that can be queried with a simple curl call to get the health status of ELLA make effective monitoring much easier.
Implementation
Two new endpoints:
- /heartbeat (#1928)
  - always returns 200 with the current timestamp
  - basic, quick, "are workers alive and responsive" check
- /healthcheck (#1929)
  - detailed app status
  - uses a separate worker pool to ensure it is available even if ELLA is not
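The two endpoints can be pictured with a minimal, framework-free WSGI sketch. It is an illustration only, not ELLA's actual implementation: the `check_database` function, the `CHECKS` registry, the JSON response shape, and the choice to return 503 when a check fails are all assumptions made for this example.

```python
import json
import time


def check_database():
    """Hypothetical check; a real one would run a trivial query."""
    return True


# Hypothetical registry of named checks used by /healthcheck.
CHECKS = {"database": check_database}


def run_checks(checks):
    """Run each check; a falsy result or an exception counts as 'failed'."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failed"
        except Exception:
            results[name] = "failed"
    return results


def app(environ, start_response):
    """Minimal WSGI app exposing /heartbeat and /healthcheck."""
    path = environ.get("PATH_INFO", "")
    if path == "/heartbeat":
        # Always 200 with the current timestamp: "is a worker responding?"
        status, body = "200 OK", {"timestamp": time.time()}
    elif path == "/healthcheck":
        results = run_checks(CHECKS)
        healthy = all(state == "ok" for state in results.values())
        # Assumption: report 503 when any check fails.
        status = "200 OK" if healthy else "503 Service Unavailable"
        body = {"healthy": healthy, "checks": results}
    else:
        status, body = "404 Not Found", {"error": "not found"}
    payload = json.dumps(body).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(payload))),
    ]
    start_response(status, headers)
    return [payload]
```

One way to get a separate worker pool would be to serve an app like this from its own Gunicorn process, for example `gunicorn --workers 1 --bind 127.0.0.1:8001 health:app`, where the module name `health` and the port are placeholders.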
Further steps
A more robust monitoring structure could be achieved by integrating statsd into the backend code; Gunicorn already has statsd support built in. This would allow using statsd and Graphite for performance monitoring. collectd can be used to gather nginx and system stats and send them to Graphite as well.
The statsd/collectd/Graphite configuration is definitely outside the scope of the app, but sending metrics to statsd is a very lightweight operation and can be disabled easily (as Gunicorn does).
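To show how lightweight this is, here is a rough sketch of a fire-and-forget statsd client built on the standard library. The `StatsdClient` class, its `ella` prefix, and the metric names are hypothetical. Only the wire format (`name:value|type` sent over UDP, conventionally to port 8125) comes from statsd itself. Leaving `host` unset turns every call into a no-op, which is one simple way to make metrics easy to disable.

```python
import socket


class StatsdClient:
    """Tiny fire-and-forget statsd client; a no-op when host is None."""

    def __init__(self, host=None, port=8125, prefix="ella"):
        self.prefix = prefix
        self.addr = (host, port) if host else None
        # Only open a socket when metrics are enabled.
        self.sock = (
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) if host else None
        )

    def _format(self, name, value, kind):
        # statsd line format: <metric name>:<value>|<type>
        return f"{self.prefix}.{name}:{value}|{kind}"

    def _send(self, name, value, kind):
        if self.sock is None:
            return  # metrics disabled
        try:
            line = self._format(name, value, kind)
            self.sock.sendto(line.encode("ascii"), self.addr)
        except OSError:
            pass  # monitoring must never break a request

    def incr(self, name, count=1):
        """Increment a counter."""
        self._send(name, count, "c")

    def timing(self, name, millis):
        """Record a duration in milliseconds."""
        self._send(name, millis, "ms")
```

A backend handler could then call something like `stats.incr("requests")` or `stats.timing("db_query", 12)`. Gunicorn's own metrics are enabled separately through its `--statsd-host` setting.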
Edited by Tor Solli-Nowlan