Evaluate shared rate limiting and make pages rate limits deterministic
Summary
From https://gitlab.com/gitlab-com/gl-infra/production/-/issues/5789#note_715520786
would we eventually consider having a shared cache (using Redis, for example) in the future?
From #755 (closed) (similar duplicate issue)
Currently, Pages rate limits are enforced at the process/pod level. Thus, it is not possible to deterministically tell whether a request will be rate limited, since the outcome depends on which process/pod serves the request. There is also little real-time visibility into how many requests remain for an IP/domain before it gets blocked.
This also makes it harder to analyze production incidents involving rate limits.
We could probably use Redis to store the rate limit counters that are currently kept in memory at a per-process/pod level, thus making the limits deterministic.
We should also consider adding more visibility into rate limits. As a user or administrator, I should be able to tell, at a given point in time, how many more requests are allowed before this IP or this domain gets rate limited, e.g. by adding rate limit information to response headers. (Of course, this is only an option if we are able to calculate rate limits deterministically.)