Increase likelihood of Prometheus proxy API hitting the cache
Problem to solve
The Prometheus proxy API uses reactive caching. The cache key is the full set of request parameters. Since end (one of the time boundaries in a Prometheus range query) is a parameter, and its value is the current time with millisecond precision, the cache is almost never hit. Even if the same user who has just made a request refreshes the page, the new request carries a different end time, so the cache is missed again.
An example request (URL has been decoded):
https://gitlab.com/jivanvl/test-node-project/environments/711946/prometheus/api/v1/query_range?query=sum(rate(nginx_upstream_responses_total{upstream=~"%{kube_namespace}-%{ci_environment_slug}-.*"}[2m])) by (status_code)&start=1563862907.323&end=1563891707.323&step=60
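To illustrate the mismatch, here is a minimal sketch (not the actual ReactiveCaching key computation) of why two requests made seconds apart never share a cache entry when the key is derived from every parameter, including end:

```ruby
require 'digest'
require 'json'

# Hypothetical key derivation: hash the full, sorted parameter set.
def cache_key(params)
  Digest::SHA256.hexdigest(params.sort.to_h.to_json)
end

first  = { 'query' => 'up', 'start' => '1563862907.323', 'end' => '1563891707.323', 'step' => '60' }
second = first.merge('end' => '1563891717.323') # same user, ten seconds later

cache_key(first) == cache_key(second) # => false, so the second request misses the cache
```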
After this issue has been solved, remember to update the reactive_cache_refresh_interval of Prometheus::ProxyService. It was adjusted in !20006 (merged) so that the cache is not refreshed within a single cache lifetime.
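For reference, a hedged sketch of what those reactive caching settings look like on the service class, assuming GitLab's ReactiveCaching concern; the interval and lifetime values below are placeholders, not the values chosen in !20006:

```ruby
class Prometheus::ProxyService < BaseService
  include ReactiveCaching

  # With cache keys that are effectively never reused, refreshing an entry
  # inside its lifetime is wasted work, hence the !20006 adjustment. Once the
  # keys become stable, this interval likely needs to come back down so warm
  # entries are refreshed instead of expiring.
  self.reactive_cache_lifetime = 1.minute           # placeholder value
  self.reactive_cache_refresh_interval = 90.seconds # placeholder value
end
```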
Intended users
Further details
Proposal
Some options to solve this:
- The infrastructure team uses Trickster in front of the public dashboard (https://dashboards.gitlab.com/). We could explore installing the Trickster Helm chart as an option (https://github.com/Comcast/trickster/tree/master/deploy/helm/trickster).
- Implement something similar to Trickster's "Step Boundary Normalization" and "Fast Forward" features (https://github.com/Comcast/trickster); see the sketch after this list.
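For the second option, a rough sketch of how step boundary normalization and fast forward could be combined in the proxy: snap the range to step boundaries so it becomes cacheable, and cover the newest, incomplete step with a separate uncached instant query. The function and key names are illustrative, not an existing GitLab or Trickster API:

```ruby
# Split a range query into a cache-friendly, step-aligned part and a tiny
# real-time "fast forward" part for the current (incomplete) step.
def split_for_fast_forward(start_s, end_s, step_s)
  {
    cacheable_range: {
      start: start_s - (start_s % step_s), # snap down to a step boundary
      end:   end_s   - (end_s   % step_s),
      step:  step_s
    },
    fast_forward: { at: end_s } # served by an uncached instant query
  }
end

p split_for_fast_forward(1563862907, 1563891707, 60)
# => {:cacheable_range=>{:start=>1563862860, :end=>1563891660, :step=>60},
#     :fast_forward=>{:at=>1563891707}}
```

Every request issued during the same 60-second step then produces an identical cacheable_range, so the reactive cache (or a Trickster-style proxy in front of Prometheus) can serve it, while the small fast-forward query keeps the rendered chart current.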