Shared Runner Queues - 2017-01-31
**NOTE** This is a recreation of an issue in its entirety from before the data loss event.
Our shared runners have experienced major slowdowns due to DB issues. @tmaczukin and I
worked all day on identifying and fixing this. Here is what we know so far about the shared runner outage.
At around 2:20 AM, a project started over 3,000 builds within a 10-minute period (147 commits with 24 jobs per
commit). This began our major slowdown. Over 1,000 of these builds will remain pending because there are no runners with
the proper tags to pick them up.
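A quick sanity check of the numbers above, sketched in Python: 147 commits × 24 jobs is 3,528 builds, or roughly 350 builds queued per minute over the 10-minute window. The `can_pick_up` helper is a hypothetical, simplified model of why tagged builds stay pending (a runner only takes a job if it carries every tag the job asks for); the tag names are made up for illustration, and this is not GitLab's actual scheduling code.

```python
# Back-of-the-envelope numbers for the 2:20 AM burst.
JOBS_PER_COMMIT = 24
COMMITS = 147
WINDOW_MINUTES = 10

total_builds = JOBS_PER_COMMIT * COMMITS           # 3,528 builds ("over 3,000")
builds_per_minute = total_builds / WINDOW_MINUTES  # ~353 builds queued per minute


def can_pick_up(job_tags: set[str], runner_tags: set[str]) -> bool:
    """Simplified model (assumption): a runner takes a job only if it has
    every tag the job requires, i.e. job tags are a subset of runner tags."""
    return job_tags <= runner_tags


# Hypothetical tags, for illustration only.
shared_runner_tags = {"docker", "shared"}

print(total_builds, round(builds_per_minute, 1))
print(can_pick_up({"docker"}, shared_runner_tags))         # a shared runner can take this job
print(can_pick_up({"docker", "gpu"}, shared_runner_tags))  # no matching runner: job stays pending
```

Under this model, every build whose tags no runner fully covers sits in the queue indefinitely, no matter how much runner capacity is free.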

Compounding this, we also began to have DB issues as per. This caused major throttling on the shared runners, which could
then no longer pick up builds. Unfortunately, there isn't much we can do about this right now: we cannot raise the
throttle, because that would just cause more DB problems. We need to get the DB issues under control before the
runners can process builds at full speed again.
As we can see, yesterday the runners were able to pick up many builds at once and work through the queue, yet so far today
the average has been only around 15 builds per runner.
Yesterday:

Today so far:

Is there anything else I've forgotten that is important, @tmaczukin?
cc @yorickpeterse @northrup @pcarranza