Gitaly quota
Introduction
- This is a proposal to implement quotas in Gitaly.
- It is a generalized version of our per-user quota #429 (closed) (pretty cool that the issue number matches the HTTP status code for rate limiting).
- It is inspired by Gitmon https://www.youtube.com/watch?v=f7ecUqHxD7o&feature=youtu.be&t=8m37s
- We need quotas for the file servers because they are stateful. Application servers are stateless: we can autoscale them when there is more demand and spread the load among them. File servers contain specific repositories and have a finite capacity. Note that we already have Rack Attack to protect the application servers a bit.
- The file servers run Git operations, which are very CPU-, memory-, and network-intensive. A user consuming a lot of resources degrades service for everyone else.
- If this were about request rate limiting we could probably use something like Envoy https://github.com/envoyproxy/envoy or Istio https://istio.io/, but here we need to measure resource consumption: one request can require 1000x the CPU time of another.
- GitHub eventually moved to Spokes https://githubengineering.com/building-resilience-in-spokes/, which keeps multiple file servers for the same repository. We will probably not need that: we run networked storage in the cloud, where the public cloud provider is responsible for the redundancy of the files. For cross-availability-zone failover we'll use GitLab Geo.
Features
- On every request Gitaly checks whether the account is over quota for a resource; if so, Gitaly does no work, returns a 429 https://httpstatuses.com/429, and logs the request.
- Resources: CPU, memory, and network usage
- Accounts: repo, user, and client IP address
- If the account is over quota, the Rails application server retries later (possibly with exponential backoff) or fails the request.
- The resource usage of every request is added to the account's total usage over the last minute.
- Every Gitaly server has a current maximum limit per resource per account (e.g. 100 seconds of CPU time per repo per minute).
- Every second the Gitaly server adjusts the limit per resource.
- If there is low resource usage (<50% CPU) the limit is doubled (up to a maximum).
- If there is high resource usage (>80% CPU) the limit is halved (down to 1).
- It is possible to see the current limits for each resource in Prometheus.
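The per-request check and the once-a-second adjustment above can be sketched in Go. Everything here is hypothetical (the names, the in-memory usage map, the starting numbers): it is a sketch of the scheme, not Gitaly's actual implementation, and the once-a-minute reset of the usage window is omitted.

```go
package main

import (
	"fmt"
	"sync"
)

// quotaLimiter sketches the adaptive per-account limiter described above.
type quotaLimiter struct {
	mu    sync.Mutex
	limit float64            // current limit, e.g. CPU seconds per account per minute
	max   float64            // upper bound the limit can grow to
	usage map[string]float64 // resource usage per account in the current window
}

func newQuotaLimiter(start, max float64) *quotaLimiter {
	return &quotaLimiter{limit: start, max: max, usage: make(map[string]float64)}
}

// Allow is called on every request: if the account would go over quota the
// request is declined (the caller then returns 429 and logs the request),
// otherwise the request's cost is charged to the account.
func (q *quotaLimiter) Allow(account string, cost float64) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.usage[account]+cost > q.limit {
		return false
	}
	q.usage[account] += cost
	return true
}

// Adjust runs once a second with the server's CPU utilization (0.0-1.0):
// double the limit under low load, halve it under high load.
func (q *quotaLimiter) Adjust(cpuUtilization float64) {
	q.mu.Lock()
	defer q.mu.Unlock()
	switch {
	case cpuUtilization < 0.5:
		q.limit *= 2
		if q.limit > q.max {
			q.limit = q.max
		}
	case cpuUtilization > 0.8:
		q.limit /= 2
		if q.limit < 1 {
			q.limit = 1
		}
	}
}

func main() {
	l := newQuotaLimiter(100, 1600) // start at 100 CPU-seconds/min, cap at 1600
	fmt.Println(l.Allow("repo-1", 5)) // true: well under the limit
	l.Adjust(0.9)                     // high load: limit halves to 50
	l.Adjust(0.3)                     // low load: limit doubles back to 100
}
```

Doubling and halving gives the same fast-recover/fast-back-off behavior as TCP's additive-increase-style congestion control, without tracking any history beyond the current limit.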
First Iteration
- Only calculate the most important resource (CPU).
- Have fixed prices per action instead of calculating real usage.
- A fixed quota instead of a dynamically adjusted one.
- A per-repo limit only.
Storage options for the user resource consumption
- In the Go program (simple and fast)
- In a local Redis (large binary and fast)
- In a local SQLite (small binary and slow)
- In a central Redis (slow due to a network round trip, but it enables real per-user/per-IP-address limits across all file servers instead of per-file-server limits)
Difference from #429 (closed) in the first iteration
- Don't limit the number of requests but limit the resource usage.
- When over limit, don't queue requests but decline them; this prevents memory from blowing up due to a big queue and lets the Rails application make better decisions (back off or show an error).
Edited by Sid Sijbrandij