Isolating Rails traffic by endpoints
Current state: All traffic is served by 3 services: Web, API and Git. Traffic is divided among those services by HAProxy based on regexes.
Also brought up in mstaff#101 (comment 816404028)
Problem to solve: A single bad endpoint getting hit more often than expected could saturate one of these services, affecting all of the traffic handled by the service.
We could come up with a better way of dividing traffic based on properties we've defined on endpoints (feature_category, urgency, or anything else we come up with), similar to what we do for Sidekiq. In practice this could mean generating config in CI based on these properties.
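As a rough sketch of what "generating config in CI" could look like: the snippet below groups endpoints by a property and emits HAProxy acl / use_backend lines per shard. The Endpoint structure, the paths, the category and urgency values, and the shard naming are all made up for illustration. Only the acl ... path_reg and use_backend ... if directives are real HAProxy syntax, and repeated acl lines with the same name are ORed together.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    # Hypothetical shape of the per-endpoint metadata; not the real Rails definitions.
    path_regex: str        # regex HAProxy would match against the request path
    feature_category: str
    urgency: str


def render_haproxy_rules(endpoints, shard_for):
    """Emit one acl line per endpoint and one use_backend line per shard."""
    by_shard = defaultdict(list)
    for ep in endpoints:
        by_shard[shard_for(ep)].append(ep.path_regex)

    lines = []
    for shard in sorted(by_shard):
        for regex in sorted(by_shard[shard]):
            # ACLs sharing a name are ORed by HAProxy.
            lines.append(f"acl is_{shard} path_reg {regex}")
        lines.append(f"use_backend {shard} if is_{shard}")
    return "\n".join(lines)


# Made-up example data.
ENDPOINTS = [
    Endpoint(r"^/api/v4/projects/[^/]+/issues", "team_planning", "default"),
    Endpoint(r"^/api/v4/jobs/request", "continuous_integration", "low"),
    Endpoint(r"^/api/v4/projects/[^/]+/merge_requests", "code_review", "high"),
]

# Shard by urgency here; swapping the lambda shards by feature_category instead.
config = render_haproxy_rules(ENDPOINTS, shard_for=lambda ep: f"urgency_{ep.urgency}")
print(config)
```

The shard_for callback is the only thing that changes between the ideas listed here, so the same generator could be reused whichever property we end up splitting on.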
Idea 1: Separate traffic by Stage Group
(original idea)
Using the feature categories on endpoints, we could perhaps separate traffic by stage group.
- 👍 People can be more closely involved with the resources they use, and we can correlate the cost of running a feature with its usage.
- 👍 Potentially, we could work towards putting deploys in the hands of the developers building the features.
- 👎 Not all groups have the same traffic patterns and volume. Groups with less traffic need relatively more over-provisioning to handle surges, so we would likely run less efficiently this way.
Idea 2: Separate traffic by traffic share and stage group
This way, groups with a lot of traffic could end up on their own fleet, while groups with less traffic could be combined into a shared fleet.
This partially solves the efficiency problem in Idea 1, but at the cost of less isolation.
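A minimal sketch of how that split could be computed, assuming we have request counts per stage group. The 10% threshold, the "shared" fleet name, and the counts are all invented for the example.

```python
def assign_fleets(requests_by_group, dedicated_share=0.10, shared_fleet="shared"):
    """Give a group its own fleet when its share of traffic reaches the threshold."""
    total = sum(requests_by_group.values())
    if total == 0:
        # No traffic data: everything goes to the shared fleet.
        return {group: shared_fleet for group in requests_by_group}
    return {
        group: group if count / total >= dedicated_share else shared_fleet
        for group, count in requests_by_group.items()
    }


# Made-up request counts per group.
traffic = {"source_code": 600, "code_review": 250, "package_registry": 90, "wiki": 60}
fleets = assign_fleets(traffic)
print(fleets)
```

With these numbers the two largest groups get dedicated fleets and the other two are combined, which is where the trade-off between efficiency and isolation shows up: lowering the threshold buys isolation, raising it buys efficiency.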
Idea 3: Separate traffic by urgency
This is similar to what we do for Sidekiq. We could direct high-urgency traffic to a bigger fleet, while low-urgency traffic, which is likely more resource intensive, is handled elsewhere. This makes the less urgent traffic less of a noisy neighbour for the urgent traffic.
Idea 4: Separate traffic by revenue
We could figure out which endpoints are backed by the most revenue: the amount of traffic multiplied by how much the users generating that traffic pay for GitLab.com. (We could call it the marquee shard, why not.)
This would mean that we isolate the most important endpoints on a single shard, so we only need to keep more headroom on that shard, while the others could have less.
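To make the "traffic * what users pay" idea concrete, here is a rough sketch that scores endpoints and picks the top ones for the marquee shard. It assumes we can break request counts down by plan and attach a weight to each plan. The plan names, weights, paths, and counts are placeholders, not real GitLab.com data.

```python
def revenue_scores(requests_by_endpoint_and_plan, weight_by_plan):
    """Sum requests * plan weight for each endpoint."""
    scores = {}
    for (endpoint, plan), requests in requests_by_endpoint_and_plan.items():
        scores[endpoint] = scores.get(endpoint, 0) + requests * weight_by_plan[plan]
    return scores


def marquee_endpoints(scores, top_n):
    """Return the top_n endpoints by score, ties broken by path for stable output."""
    return sorted(scores, key=lambda endpoint: (-scores[endpoint], endpoint))[:top_n]


# Placeholder data: (endpoint, plan) -> request count.
REQUESTS = {
    ("/api/v4/projects", "free"): 1000,
    ("/api/v4/projects", "paid"): 100,
    ("/api/v4/jobs/request", "paid"): 300,
    ("/explore", "free"): 5000,
}
# Placeholder weights standing in for "how much these users pay".
WEIGHTS = {"free": 0, "paid": 1}

scores = revenue_scores(REQUESTS, WEIGHTS)
marquee = marquee_endpoints(scores, top_n=2)
print(marquee)
```

Note that with a weight of zero for unpaid traffic, a very busy endpoint like the hypothetical /explore above never makes it onto the marquee shard, so the choice of weights would need some thought.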