Increase use of canary.gitlab.com
In a discussion about the usefulness of canary, @jarv had a suggestion on how to increase the amount of traffic canary sees.
I suggested that we find a way to force all GitLab Inc employees to go directly to canary.
@jarv suggested that we add a rule to our LBs that will direct all gitlab.com/gitlab-org and gitlab.com/gitlab-com traffic automatically to canary. In theory, this should not be difficult to achieve.
HAProxy allows us to set the server backend states {weights, drain, maint} without having to reload or update the configuration. This issue proposes a minimal change to direct more internal traffic to the canary backends by using the request path.
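As a rough illustration of changing server state at runtime, the sketch below builds `set server ... state` and `set server ... weight` commands for the HAProxy Runtime API and sends them over the admin socket. The socket path and the backend and server names are hypothetical; the real values depend on how the `stats socket` is configured on our LBs.

```python
import socket

# Hypothetical admin socket path; the real one depends on the "stats socket" setting.
ADMIN_SOCKET = "/run/haproxy/admin.sock"

VALID_STATES = ("ready", "drain", "maint")


def state_command(backend: str, server: str, state: str) -> str:
    """Build a Runtime API command that changes one server's state."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown state: {state}")
    return f"set server {backend}/{server} state {state}"


def weight_command(backend: str, server: str, weight: int) -> str:
    """Build a Runtime API command that changes one server's weight (0-256)."""
    if not 0 <= weight <= 256:
        raise ValueError(f"weight out of range: {weight}")
    return f"set server {backend}/{server} weight {weight}"


def send_command(command: str, socket_path: str = ADMIN_SOCKET) -> str:
    """Send one command to the admin socket and return HAProxy's reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(command.encode() + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode()
```

For example, `send_command(state_command("canary_web", "web-cny-01", "drain"))` would drain one canary server without a reload, assuming a backend and server with those (made-up) names exist.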
New rules for directing canary traffic
- HTTPS web when the cookie gitlab_canary=true is set: Currently supported
- Optionally a percentage of web/api/git traffic using weights: Currently supported
- HTTPS web to gitlab-com/: New
- HTTPS api to gitlab-com/: New
- GIT https to gitlab-com/: New
- Registry https to gitlab-com/: New
Traffic that will not use the canary
Like before, the following traffic will be unable to take advantage of canary.
- GIT ssh
- Sidekiq - there is no canary sidekiq until we solve namespacing
- Gitaly - there is no canary gitaly until we can run multiple gitalys on a single file server
- pages - pages uses TCP load balancing, so we cannot inspect the request path or direct traffic with a cookie.
- mailroom - mailroom polls a shared mailbox for reply notifications. We could add a single node as a canary, but this is probably not necessary for the first iteration.
Routing rules
- By default, direct all traffic with gitlab_canary=true to the canary backends if they are healthy
- By default, direct all traffic with gitlab-com/ in the group path to the canary backends if they are healthy
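The two routing rules above can be sketched as a small decision function. This is only a model of the intended behavior, not HAProxy configuration; the function names and the "canary"/"main" labels are made up for illustration.

```python
CANARY_COOKIE = "gitlab_canary"
# Group path prefixes that should default to canary (from the rules above).
CANARY_GROUP_PREFIXES = ("/gitlab-com/",)


def wants_canary(path: str, cookies: dict) -> bool:
    """True if the request matches either canary rule."""
    if cookies.get(CANARY_COOKIE) == "true":
        return True
    return path.startswith(CANARY_GROUP_PREFIXES)


def choose_backend(path: str, cookies: dict, canary_healthy: bool) -> str:
    """Route to canary only when a rule matches and canary is healthy."""
    if wants_canary(path, cookies) and canary_healthy:
        return "canary"
    return "main"
```

In this model, a request for `/gitlab-com/infrastructure` goes to canary while it is healthy and to the main backends otherwise; requests that match neither rule always go to main.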
Redirecting canary traffic
Before we direct gitlab-com traffic to canary, we need a fast way to ensure that no traffic is sent there.
- Set all servers in the canary backends to drain -> maint
- This should mark the backend as having no healthy servers for processing requests
- All canary traffic that either has the canary cookie or a gitlab-com request path will fall back to the main backends
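The drain -> maint procedure above can be modeled roughly as follows. The sketch assumes that only servers in the "ready" state count as able to take new requests; the class, the server names, and the "main" label are hypothetical and only illustrate the expected fallback.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    name: str
    # server name -> "ready" | "drain" | "maint"
    states: dict = field(default_factory=dict)

    def set_all(self, state: str) -> None:
        """Put every server in this backend into the given state."""
        for server in self.states:
            self.states[server] = state

    def accepts_new_traffic(self) -> bool:
        """Assumption: only 'ready' servers can take new requests."""
        return any(s == "ready" for s in self.states.values())


def disable_canary(canary: Backend) -> None:
    """Drain first so in-flight requests can finish, then move to maint."""
    canary.set_all("drain")
    canary.set_all("maint")


def backend_for_canary_request(canary: Backend) -> str:
    """Canary-eligible requests fall back to main when canary has no usable server."""
    return canary.name if canary.accepts_new_traffic() else "main"
```

After `disable_canary(...)` runs, `backend_for_canary_request(...)` returns "main" for every canary-eligible request in this model, which is the behavior the steps above are meant to guarantee.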
graph LR;
h1[gitlab.com haproxy]
h1 --> a1
h1 -- gitlab_canary=true --> c1
subgraph web/api/git main backends;
a1[web,api,git ... WEIGHT=100];
a2[web-cny,api-cny,git-cny ... WEIGHT=0];
end
subgraph web/api/git canary backends;
c1[web-cny,api-cny,git-cny ... WEIGHT=100];
end
Current HAProxy configuration
In the current configuration the following routing rules are in place:
- Requests with gitlab_canary=true are routed to the canary backend
- Server weights are used on the main backends to allow a small amount of traffic to optionally be sent to the canary backend
- There are no health checks for the canary backends so if there are no servers in canary that can process requests and the cookie is set, requests will fail (this is why staging doesn't work when the cookie is set)
Proposed HAProxy configuration
- Requests to /gitlab-com that do not have a gitlab_canary cookie will have gitlab_canary=true set
- Requests with gitlab_canary=true are routed to the canary backend
- Additional logic is added so that if the canary backends are unhealthy we fall back to the main backends
- Server weights are used on the main backends to allow a small amount of traffic to optionally be sent to the canary backend
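Two parts of the proposal can be illustrated with a short sketch: the cookie tagging step, and the share of main-backend traffic that weights would send to canary servers. The function names are hypothetical, and the weight calculation assumes traffic is spread in proportion to server weight and that canary servers in the main backends carry a "-cny" suffix, as in the diagram.

```python
def tag_request(path: str, cookies: dict) -> dict:
    """Return the cookies after the proposed tagging step.

    A request to /gitlab-com with no gitlab_canary cookie gets
    gitlab_canary=true; an existing cookie value is left alone.
    """
    tagged = dict(cookies)
    is_gitlab_com = path == "/gitlab-com" or path.startswith("/gitlab-com/")
    if is_gitlab_com and "gitlab_canary" not in tagged:
        tagged["gitlab_canary"] = "true"
    return tagged


def canary_share_on_main(weights: dict) -> float:
    """Expected fraction of main-backend traffic that lands on -cny servers."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    cny = sum(w for name, w in weights.items() if name.endswith("-cny"))
    return cny / total
```

One consequence of tagging only requests that lack the cookie is that a request carrying a different value, for example gitlab_canary=false, is not overridden. With the weights shown in the diagram (100 for main servers, 0 for canary servers), `canary_share_on_main` gives 0.0; raising the canary weight to 5 against 95 would give 0.05 in this model.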