Gitaly quota
Introduction
- This is a proposal to implement quotas in Gitaly.
- It is a generalized version of our per-user quota #429 (closed) (pretty cool that the issue number matches the HTTP status code for rate limiting).
- It is inspired by Gitmon https://www.youtube.com/watch?v=f7ecUqHxD7o&feature=youtu.be&t=8m37s
- We need quotas for the file servers because they are stateful. Application servers are stateless: we can autoscale them when there is more demand and spread the load among them. File servers contain specific repositories and have a finite capacity. Note that we already have Rack Attack to protect the application servers a bit.
- The file servers run Git operations, which are very CPU-, memory-, and network-intensive. A user consuming a lot of resources degrades service for everyone else.
- If this were about request rate limiting we could probably use something like Envoy https://github.com/envoyproxy/envoy or Istio https://istio.io/, but here we need to measure resource consumption: one request can require 1000x the CPU time of another.
- GitHub eventually moved to Spokes https://githubengineering.com/building-resilience-in-spokes/, which keeps multiple file servers for the same repository. We will probably not need that: we run networked storage in the cloud, where the public cloud provider is responsible for the redundancy of the files. For cross-availability-zone failover we'll use GitLab Geo.
Features
- On every request Gitaly checks whether the account is over quota for a resource; if so, Gitaly does no work, returns a 429 https://httpstatuses.com/429, and logs the request.
- Resources: CPU, memory, and network usage
- Accounts: repo, user, and client IP address
- If the account is over quota, the Rails application server retries later (possibly with exponential backoff) or fails the request.
- The resource usage of every request is added to the account's total usage over the last minute.
- Every Gitaly server has a current maximum limit per resource per account (e.g. 100 seconds of CPU time per repo per minute).
- Every second the Gitaly server adjusts the limit per resource.
- If there is low resource usage (<50% CPU) the limit is doubled (up to a maximum).
- If there is high resource usage (>80% CPU) the limit is halved (down to 1).
- It is possible to see the current limits for each resource in Prometheus.
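The per-request check and the once-a-second adjustment above can be sketched in Go. Everything here is hypothetical (the names, the in-memory usage map, the starting numbers): it is a sketch of the scheme, not Gitaly's actual implementation, and the once-a-minute reset of the usage window is omitted.

```go
package main

import (
	"fmt"
	"sync"
)

// quotaLimiter sketches the adaptive per-account limiter described above.
type quotaLimiter struct {
	mu    sync.Mutex
	limit float64            // current limit, e.g. CPU seconds per account per minute
	max   float64            // upper bound the limit can grow to
	usage map[string]float64 // resource usage per account in the current window
}

func newQuotaLimiter(start, max float64) *quotaLimiter {
	return &quotaLimiter{limit: start, max: max, usage: make(map[string]float64)}
}

// Allow is called on every request: if the account would go over quota the
// request is declined (the caller then returns 429 and logs the request),
// otherwise the request's cost is charged to the account.
func (q *quotaLimiter) Allow(account string, cost float64) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.usage[account]+cost > q.limit {
		return false
	}
	q.usage[account] += cost
	return true
}

// Adjust runs once a second with the server's CPU utilization (0.0-1.0):
// double the limit under low load, halve it under high load.
func (q *quotaLimiter) Adjust(cpuUtilization float64) {
	q.mu.Lock()
	defer q.mu.Unlock()
	switch {
	case cpuUtilization < 0.5:
		q.limit *= 2
		if q.limit > q.max {
			q.limit = q.max
		}
	case cpuUtilization > 0.8:
		q.limit /= 2
		if q.limit < 1 {
			q.limit = 1
		}
	}
}

func main() {
	l := newQuotaLimiter(100, 1600) // start at 100 CPU-seconds/min, cap at 1600
	fmt.Println(l.Allow("repo-1", 5)) // true: well under the limit
	l.Adjust(0.9)                     // high load: limit halves to 50
	l.Adjust(0.3)                     // low load: limit doubles back to 100
}
```

Doubling and halving gives the same fast-recover/fast-back-off behavior as TCP's additive-increase-style congestion control, without tracking any history beyond the current limit.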
First Iteration
- Only calculate the most important resource (CPU).
- Have fixed prices per action instead of calculating real usage.
- A fixed quota instead of a dynamically adjusted one.
- A per-repo limit only.
Storage options for the user resource consumption
- In the Go program (simple and fast)
- In a local Redis (large binary and fast)
- In a local SQLite (small binary and slow)
- In a central Redis (slow due to a network round trip, but it enables real per-user/per-IP-address limits across all file servers instead of per-file-server limits)
Difference from #429 (closed) in the first iteration
- Don't limit the number of requests but limit the resource usage.
- When over limit, don't queue requests but decline them; this prevents memory from blowing up due to a big queue and lets the Rails application make better decisions (back off or show an error).
Edited by Sid Sijbrandij