Use separate Unicorns to handle API load
Right now we have general purpose servers: servers designed to process all kinds of load. Our general purpose Unicorns:
- Process Rails requests,
- Process Shell API calls,
- Process GitLab API calls,
- Process CI API calls.
Sometimes we have outages where we see increased DB or NFS load. During that time all requests take longer to finish, and this has catastrophic consequences. Right now we get about 60,000,000 CI API requests daily. These requests are really fast to execute, but under increased load they start to be queued and use all the resources of the general purpose servers. This makes all other types of requests slow to process or fail with a 503 error. There's no easy way to reduce the number of CI API requests, and in case of an outage they will always be queued on the general purpose servers. Currently we usually resolve that situation by bringing up the Deploy page to reduce the load and the number of queued requests.
There's also a second problem with general purpose servers: they are not designed to support different quality of service (time of service) requirements. For example, we expect Rails requests to be executed right away, but we know it's completely OK for GitLab and CI API calls to be queued and executed with a delay if the load increases. Since we use general purpose servers, all our requests have the same quality of service. This seems OK for a mid-size installation, but it doesn't work well for GitLab.com, which handles millions of requests daily and where any load increase and slowness in request processing has catastrophic consequences.
My idea is to split the processing of Rails requests and API requests onto separate servers. This allows us to handle increased load more easily and to better monitor the response time of each of the services:
- Rails and Shell API requests would be handled by general purpose servers as they are now, but optimised for minimal response delay,
- GitLab and CI API requests would be handled by dedicated servers, optimised to limit the number of requests being processed at a single time. We would be able to say that all API requests are processed by at most 100 Unicorns at a time. If we saw an increase in API response time we could easily scale this up, and thus more easily handle the load and the consequences of scaling up the servers.
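The "at most 100 Unicorns at a time" cap could be expressed directly in the Unicorn config of the dedicated API fleet. A minimal sketch, assuming standard Unicorn directives (`worker_processes`, `listen`, `timeout`) — the file name and concrete numbers are illustrative, not decided:

```ruby
# unicorn.api.rb -- hypothetical config for the dedicated API fleet.
# The worker count is the hard cap on concurrent API requests; anything
# beyond it waits in the listen backlog instead of starving the Rails fleet.
worker_processes 100

# Queue excess API requests in the socket backlog rather than rejecting them.
listen "/var/run/unicorn-api.sock", backlog: 1024

# Hard-kill workers stuck on a slow DB/NFS call so they free up capacity.
timeout 60
```

Scaling up during an API slowdown then becomes a matter of raising `worker_processes` (or adding servers to this fleet) without touching the Rails-facing configuration.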
The above approach reduces the chance of an outage. API calls will be handled by dedicated servers, and they will back off if we happen to have slowness in the DB or NFS. The requests will still be processed, just a little slower. It's important that these requests have timeouts that effectively rate limit them, delivering 503 errors to clients so they can gracefully handle the outage. This makes it much easier to self-heal from an outage by limiting the API impact and not affecting the Rails side.
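The timeout-plus-503 behaviour could live in a small Rack middleware on the API fleet. A sketch under stated assumptions: `ApiTimeout` and the limit value are hypothetical names, and real request processing would be the downstream Rack app:

```ruby
require "timeout"

# Hypothetical middleware for the dedicated API servers: cut off requests
# that exceed a hard time budget and answer 503 so clients back off,
# instead of letting slow requests pile up and exhaust every worker.
class ApiTimeout
  def initialize(app, seconds: 30)
    @app = app
    @seconds = seconds
  end

  def call(env)
    Timeout.timeout(@seconds) { @app.call(env) }
  rescue Timeout::Error
    # Retry-After hints the client when to gracefully retry.
    [503, { "Content-Type" => "text/plain", "Retry-After" => "30" },
     ["Service Unavailable"]]
  end
end

# Usage sketch with plain lambdas standing in for the Rails app:
fast_app = ApiTimeout.new(->(env) { [200, {}, ["ok"]] }, seconds: 1)
slow_app = ApiTimeout.new(->(env) { sleep 2; [200, {}, ["ok"]] }, seconds: 1)
```

Because the middleware only wraps the API fleet, a DB or NFS slowdown turns into bounded 503s on the API side while the Rails fleet keeps its normal latency.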
cc @pcarranza