Rate limit RawController#show in Rails with Gitlab::ActionRateLimiter
We have a recurring type of outage on gitlab.com where a user hosts a 'raw' file out of a Git repository. We call this abuse internally, but it's easy to understand how this goes wrong without assuming malice. Somebody shares a link to a file that happens to be in a Git repo, the link becomes popular, "boom". Also known as the Slashdot effect.
I wonder if we can get a handle on this problem by rate-limiting each /raw/ request by its path, for instance in the Rails controller that handles these requests: RawController#show. What is nice about this is that it would be automatic.
I'm not sure if we can re-use the existing rate limiter that throttles authentication attempts per IP. But even if we can't, I'm sure we can find a gem that uses Redis as its backing store to do rate limiting.
We should be able to use Gitlab::ActionRateLimiter for this.
So what we'd need is:
- an application config setting, e.g. "Raw blob request rate limit (per unique blob link per minute)", default 300
- code in RawController#show that uses Gitlab::ActionRateLimiter to implement this limit
- some tests that assert that the rate limiting happens
- metrics so that we can see when rate limiting is active, and logging so we can see which repos are being rate-limited
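To make the idea of a limit "per unique blob link per minute" concrete, here is a minimal in-memory sketch. It is not the Gitlab::ActionRateLimiter implementation: the class name SketchRateLimiter, the fixed 60-second window, and the in-process Hash (standing in for Redis) are all assumptions made for illustration. It also assumes that throttled? counts the current request before comparing against the threshold.

```ruby
# Hypothetical stand-in for a Redis-backed limiter (illustration only;
# this is NOT the Gitlab::ActionRateLimiter source).
class SketchRateLimiter
  def initialize(action:, period: 60, clock: -> { Time.now.to_f })
    @action = action
    @period = period   # window length in seconds (assumed: one minute)
    @clock = clock     # injectable clock so the window can be tested
    @counters = {}     # cache key => [count, window expiry time]
  end

  # Record one request for the key; returns the count in the current window.
  def increment(key)
    now = @clock.call
    id = cache_key(key)
    count, expires_at = @counters[id]
    if expires_at.nil? || now >= expires_at
      # No window yet, or the previous one has expired: start a fresh one.
      count = 0
      expires_at = now + @period
    end
    @counters[id] = [count + 1, expires_at]
    count + 1
  end

  # Count this request, then report whether the key is over the threshold.
  def throttled?(key, threshold)
    increment(key) > threshold
  end

  private

  # One counter per action plus key parts, e.g. project, commit and path.
  def cache_key(key)
    ([@action] + Array(key)).map(&:to_s).join(':')
  end
end

limiter = SketchRateLimiter.new(action: :show_raw_controller)
key = ['some-group/some-project', 'c7d', 'my_file.md'] # placeholder values
300.times { limiter.increment(key) }
puts limiter.throttled?(key, 300)
```

Because the counter is keyed on the project, commit, and path together, one popular file does not affect requests for other files in the same repository.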
Development log
- https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/30635 and https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/14830 were merged.
- 5/August - 6dcde68b is on staging and canary, but not on production.
- Currently waiting for canary to be promoted to production so I can do a final test.
- Commit is already on GitLab.com
- Documentation to be added on https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/31723
- Another MR was created to also include the corresponding logs.
- It was suggested that the rate limiter should not return a 302 redirect but a 429 instead, as the redirect can lead to more load on our servers instead of less - https://gitlab.com/gitlab-org/gitlab-ce/issues/65974 - Fix on https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/31777/
- Landed on production on August 13th. Manually verified on staging https://gitlab.com/gitlab-org/gitlab-ce/issues/65974#note_203672097
- Currently waiting for canary to be promoted to production so I can test https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/31777 there.
- Verified on production https://gitlab.com/gitlab-org/gitlab-ce/issues/65974#note_204060316 and it's working as expected.
- Announced on Slack channels: #infrastructure-lounge and support_gitlab-com
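The 302-versus-429 point from the log above can be illustrated with a small Rack-style sketch. The method name raw_response, the response bodies, and the default limit are hypothetical and are not taken from the merged change. The sketch only shows why a 429 ends the exchange, whereas a 302 would make the client issue a second request to the redirect target.

```ruby
# Hypothetical Rack-style responder; names and bodies are illustrative only.
def raw_response(request_count, limit: 300)
  if request_count > limit
    # 429 Too Many Requests: the client gets a final answer in one round trip.
    # A 302 here would trigger a follow-up request and add load.
    [429, { 'Content-Type' => 'text/plain' }, ["Too many requests, please try again later.\n"]]
  else
    # Under the limit: serve the blob (placeholder body).
    [200, { 'Content-Type' => 'text/plain' }, ["raw file contents\n"]]
  end
end

status, _headers, _body = raw_response(301)
puts status
```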
Manual QA
- A test project was created on staging and production
- The threshold key was manually incremented for each project using the following script:
[ gstg ] production> project = Project.find(project_id)
=> #<Project id:xxxxx mayra-cabrera/rate_limit_project>
[ gstg ] production> commit = project.commit
=> #<Commit id:c7d mayra-cabrera/rate_limit_project@c7d...>
[ gstg ] production> path = 'my_file.md'
=> "my_file.md"
[ gstg ] production> key = [project, commit, path]
[ gstg ] production> limiter = ::Gitlab::ActionRateLimiter.new(action: :show_raw_controller)
[ gstg ] production> (1..300).each do |variable|
[ gstg ] production>   limiter.increment(key)
[ gstg ] production> end
[ gstg ] production> limiter.throttled?([project, commit, path], 300)
=> true
Staging
- Project: https://staging.gitlab.com/mayra-cabrera/rate_limit_project
- After executing the script, accessing the raw page shows that it's blocked.
Production
- Project: https://gitlab.com/mayra-cabrera/raw-limit-test-project
- After executing the script, accessing the raw page shows that it's blocked.
- The rate limit was successfully recorded on Kibana - https://log.gitlab.net/goto/0181456d2f911eee19c5d195a1a1c012