Increase likelihood of Prometheus proxy API hitting the cache
Problem to solve
The Prometheus proxy API uses reactive caching. The cache key is the full set of request parameters. Since end (one of the time boundaries in a Prometheus range query) is a parameter, and its value is the current time with millisecond precision, the cache is almost never hit. Even if the same user who has just made a request refreshes the page, the new request carries a different end time, so the cache is missed again.
An example request (URL has been decoded):
https://gitlab.com/jivanvl/test-node-project/environments/711946/prometheus/api/v1/query_range?query=sum(rate(nginx_upstream_responses_total{upstream=~"%{kube_namespace}-%{ci_environment_slug}-.*"}[2m])) by (status_code)&start=1563862907.323&end=1563891707.323&step=60
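To illustrate the mismatch, here is a minimal sketch (not the actual ReactiveCaching key computation) of why two requests made seconds apart never share a cache entry when the key is derived from every parameter, including end:

```ruby
require 'digest'
require 'json'

# Hypothetical key derivation: hash the full, sorted parameter set.
def cache_key(params)
  Digest::SHA256.hexdigest(params.sort.to_h.to_json)
end

first  = { 'query' => 'up', 'start' => '1563862907.323', 'end' => '1563891707.323', 'step' => '60' }
second = first.merge('end' => '1563891717.323') # same user, ten seconds later

cache_key(first) == cache_key(second) # => false, so the second request misses the cache
```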
After this issue has been solved, remember to update the reactive_cache_refresh_interval of Prometheus::ProxyService. It was adjusted in !20006 (merged) so that the cache is not refreshed within a single cache lifetime.
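For reference, a hedged sketch of what those reactive caching settings look like on the service class, assuming GitLab's ReactiveCaching concern; the interval and lifetime values below are placeholders, not the values chosen in !20006:

```ruby
class Prometheus::ProxyService < BaseService
  include ReactiveCaching

  # With cache keys that are effectively never reused, refreshing an entry
  # inside its lifetime is wasted work, hence the !20006 adjustment. Once the
  # keys become stable, this interval likely needs to come back down so warm
  # entries are refreshed instead of expiring.
  self.reactive_cache_lifetime = 1.minute           # placeholder value
  self.reactive_cache_refresh_interval = 90.seconds # placeholder value
end
```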
Intended users
Further details
Proposal
Some options to solve this:
- The infrastructure team uses Trickster in front of the public dashboard (https://dashboards.gitlab.com/). We could explore installing the Trickster Helm chart as an option (https://github.com/Comcast/trickster/tree/master/deploy/helm/trickster).
- Implement something similar to Trickster's "Step Boundary Normalization" and "Fast Forward" features (https://github.com/Comcast/trickster); see the sketch after this list.
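For the second option, a rough sketch of how step boundary normalization and fast forward could be combined in the proxy: snap the range to step boundaries so it becomes cacheable, and cover the newest, incomplete step with a separate uncached instant query. The function and key names are illustrative, not an existing GitLab or Trickster API:

```ruby
# Split a range query into a cache-friendly, step-aligned part and a tiny
# real-time "fast forward" part for the current (incomplete) step.
def split_for_fast_forward(start_s, end_s, step_s)
  {
    cacheable_range: {
      start: start_s - (start_s % step_s), # snap down to a step boundary
      end:   end_s   - (end_s   % step_s),
      step:  step_s
    },
    fast_forward: { at: end_s } # served by an uncached instant query
  }
end

p split_for_fast_forward(1563862907, 1563891707, 60)
# => {:cacheable_range=>{:start=>1563862860, :end=>1563891660, :step=>60},
#     :fast_forward=>{:at=>1563891707}}
```

Every request issued during the same 60-second step then produces an identical cacheable_range, so the reactive cache (or a Trickster-style proxy in front of Prometheus) can serve it, while the small fast-forward query keeps the rendered chart current.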