
Observability API (django-prometheus)

TL;DR: Propose an Observability API by installing django-prometheus

Overview

These changes implement an Observability API. Its purpose is to make the internal state of the GT system observable by monitoring tools such as Prometheus, which in turn enables alerting (e.g. Alertmanager, PagerDuty) and charting (e.g. Grafana).

This API makes GT compatible with environments that require specific reliability levels from their services. For example, a company with a Site Reliability Engineering (SRE) team will likely require a Service Level Objective (SLO) for running GT, so that the service does not breach a Service Level Agreement (SLA), which is part of a legal contract. The SLO in turn requires a Service Level Indicator (SLI): essentially a query (e.g. a Prometheus query) that measures some internal state of the service and reflects the customer's experience. This Observability API makes that state measurable.

For more information about observability, see https://www.ibm.com/cloud/learn/observability

Additionally, this Observability API makes GT more cloud-native. In cloud-native environments such as Kubernetes, a service must be observable enough for monitoring, alerting, and troubleshooting, ultimately to support good availability.

Implementation

The most important internal states of an API server such as GT are its HTTP statuses, namely the statuses of HTTP requests and responses. These statuses and their related metrics (e.g. latency) are usually enough to characterize the experience of a user or HTTP client.
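To illustrate how HTTP statuses can feed an indicator, here is a minimal sketch that turns per-status response counts into an availability ratio. The function name, the sample counts, and the choice to treat only 5xx responses as failures are assumptions made for illustration. They are not part of this merge request or of django-prometheus.

```python
def availability(responses_by_status: dict[int, int]) -> float:
    """Return the fraction of responses that were not server errors (5xx).

    `responses_by_status` maps an HTTP status code to the number of
    responses observed with that code (hypothetical input shape).
    """
    total = sum(responses_by_status.values())
    if total == 0:
        # No traffic observed: treat the service as fully available.
        return 1.0
    server_errors = sum(
        count
        for status, count in responses_by_status.items()
        if 500 <= status <= 599
    )
    return (total - server_errors) / total


# Made-up counts for illustration only.
sample = {200: 990, 404: 5, 500: 5}
ratio = availability(sample)
```

In practice the same ratio would likely be expressed as a query in the monitoring tool rather than computed in application code.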

These internal states are exposed by installing django-prometheus and mapping it to an API endpoint at /api/0/observability. Additionally, a small view wraps django-prometheus so that the endpoint is protected by the default authentication method.
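The wrapping idea can be sketched independently of Django: an authentication check runs first, and only authenticated requests are delegated to the metrics exporter. Everything below (`Request`, `Response`, `require_authentication`, `export_metrics`, `dispatch`) is a hypothetical stand-in written for this illustration. In the actual change, the delegate would be the export view shipped with django-prometheus and the check would be GT's default authentication method.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Callable


@dataclass
class Request:
    """Hypothetical stand-in for an HTTP request (not a Django class)."""
    path: str
    authenticated: bool = False


@dataclass
class Response:
    """Hypothetical stand-in for an HTTP response."""
    status: int
    body: str = ""


View = Callable[[Request], Response]


def require_authentication(view: View) -> View:
    """Reject unauthenticated requests before they reach the wrapped view."""
    @wraps(view)
    def wrapper(request: Request) -> Response:
        if not request.authenticated:
            return Response(status=401, body="authentication required")
        return view(request)
    return wrapper


def export_metrics(request: Request) -> Response:
    """Stand-in for the exporter view provided by django-prometheus."""
    return Response(status=200, body="# metrics in Prometheus text format\n")


@require_authentication
def observability(request: Request) -> Response:
    """Tiny wrapper view: adds authentication, then delegates."""
    return export_metrics(request)


# The endpoint path comes from the description above.
ROUTES = {"/api/0/observability": observability}


def dispatch(request: Request) -> Response:
    """Route a request to its view, or return 404 for unknown paths."""
    view = ROUTES.get(request.path)
    if view is None:
        return Response(status=404)
    return view(request)
```

Keeping the wrapper this thin means the exporter itself stays untouched, and the authentication policy can change without affecting how metrics are produced.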

django-prometheus exposes Prometheus-compatible metrics that are collected through Django middleware, hence the changes in the settings.py file. Installing django-prometheus also enables the following metrics:

https://github.com/korfuri/django-prometheus/blob/master/django_prometheus/middleware.py#L29
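For context, the settings.py changes mentioned above typically look like the following excerpt, based on the setup described in the django-prometheus README. The surrounding entries are placeholders and are not taken from GT's actual settings.

```python
# settings.py (excerpt; other entries are illustrative placeholders)

INSTALLED_APPS = [
    "django.contrib.auth",          # placeholder for the project's existing apps
    "django.contrib.contenttypes",  # placeholder
    "django_prometheus",            # registers the django-prometheus app
]

MIDDLEWARE = [
    # Must come first so it observes every incoming request.
    "django_prometheus.middleware.PrometheusBeforeMiddleware",
    # Placeholder for the project's existing middleware.
    "django.middleware.common.CommonMiddleware",
    # Must come last so it observes the final response.
    "django_prometheus.middleware.PrometheusAfterMiddleware",
]
```

The two middleware classes bracket the rest of the stack, which is what lets the library measure requests and responses around everything in between.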

Edited by Jose Gavine Cueto
