Increase use of canary.gitlab.com

In a discussion with @jarv about the usefulness of canary, @jarv had a suggestion on how to increase the amount of traffic canary sees.

I suggested that we find a way to force all GitLab Inc employees to go directly to canary.

@jarv suggested that we add a rule to our LBs that will direct all gitlab.com/gitlab-org and gitlab.com/gitlab-com traffic automatically to canary. In theory, this should not be difficult to achieve.

HAProxy allows us to set the server backend states {weights, drain, maint} without having to reload or update the configuration. This issue proposes a minimal change that directs more internal traffic to the canary backends by using the request path.
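As a rough illustration of changing server state without a reload, the sketch below builds HAProxy Runtime API commands (`set server ... state` and `set weight`) and sends them over the stats socket. The backend name, server name, and socket path are hypothetical, and it assumes the stats socket is configured with admin level; it is not taken from our actual configuration.

```python
import socket

VALID_STATES = ("ready", "drain", "maint")


def state_command(backend: str, server: str, state: str) -> str:
    """Build a Runtime API command that changes one server's state."""
    if state not in VALID_STATES:
        raise ValueError(f"unsupported state: {state}")
    return f"set server {backend}/{server} state {state}"


def weight_command(backend: str, server: str, weight: int) -> str:
    """Build a Runtime API command that sets an absolute server weight."""
    if not 0 <= weight <= 256:
        raise ValueError("absolute weights must be between 0 and 256")
    return f"set weight {backend}/{server} {weight}"


def send_command(socket_path: str, command: str) -> str:
    """Send one command to the HAProxy stats socket and return the reply.

    In non-interactive mode HAProxy closes the connection after one
    command, so reading until EOF collects the whole response.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(command.encode() + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode()


# Hypothetical names: backend "canary_web", server "web-cny-01".
drain = state_command("canary_web", "web-cny-01", "drain")
maint = state_command("canary_web", "web-cny-01", "maint")
```

Applying `drain` first and `maint` second to every canary server, for example with `send_command("/run/haproxy/admin.sock", drain)` (path assumed), would take the canary backends out of rotation without touching the configuration file.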

New rules for directing canary traffic

  • HTTPS web when the cookie gitlab_canary=true is set: Currently supported
  • Optionally a percentage of web/api/git traffic using weights: Currently supported
  • HTTPS web to gitlab-com/: New
  • HTTPS API to gitlab-com/: New
  • Git HTTPS to gitlab-com/: New
  • Registry HTTPS to gitlab-com/: New

Traffic that will not use the canary

As before, the following traffic will be unable to take advantage of canary.

  • Git SSH
  • Sidekiq - there is no canary Sidekiq until we solve namespacing
  • Gitaly - there is no canary Gitaly until we can run multiple Gitaly instances on a single file server
  • Pages - Pages uses TCP load balancing, so we cannot inspect the request path or direct traffic with a cookie.
  • Mailroom - Mailroom polls a shared mailbox for reply notifications. We could add a single node as a canary, but this is probably not necessary for the first iteration.

Routing rules

  • By default, direct all traffic with gitlab_canary=true to the canary backends if they are healthy
  • By default, direct all traffic with gitlab-com/ in the group path to the canary backends if they are healthy

Redirecting canary traffic

If gitlab-com traffic is directed to canary by default, there will need to be a fast way to ensure that no traffic is directed there.

  • Set all servers in the canary backends to drain -> maint
  • This should mark the backend as having no healthy servers for processing requests
  • All canary traffic that either has the canary cookie or a gitlab-com request path will fall back to the main backends

graph LR;
    h1[gitlab.com haproxy]
    h1 --> a1
    h1 -- gitlab_canary=true --> c1

    subgraph web/api/git main backends;
        a1[web,api,git ... WEIGHT=100];
        a2[web-cny,api-cny,git-cny ... WEIGHT=0];
    end
    subgraph web/api/git canary backends;
        c1[web-cny,api-cny,git-cny ... WEIGHT=100];
    end

Current HAProxy configuration

In the current configuration the following routing rules are in place:

  • Requests with gitlab_canary=true are routed to the canary backend
  • Server weights are used on the main backends to allow a small amount of traffic to optionally be sent to the canary backend
  • There are no health checks for the canary backends, so if the cookie is set and no canary server can process requests, the requests will fail (this is why staging doesn't work when the cookie is set)
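To make the weight-based rule concrete, here is a small sketch of how server weights translate into a share of traffic. It assumes a weight-proportional balancing algorithm such as roundrobin, where each server receives load in proportion to its weight relative to the sum of all weights. The server names and the 95/5 split are made up for illustration.

```python
def traffic_share(weights: dict[str, int]) -> dict[str, float]:
    """Return each server's expected fraction of requests.

    Assumes load is distributed in proportion to weight, as with a
    weight-aware algorithm like roundrobin.
    """
    total = sum(weights.values())
    if total == 0:
        # No server is eligible for traffic when every weight is zero.
        return {name: 0.0 for name in weights}
    return {name: weight / total for name, weight in weights.items()}


# Canary servers at weight 0 in the main backend receive no weighted traffic.
no_canary = traffic_share({"web": 100, "web-cny": 0})

# Hypothetical split: raising the canary weight sends it a small share.
small_canary = traffic_share({"web": 95, "web-cny": 5})
```

With weight 0 on the canary servers in the main backends, the only traffic that reaches canary is the traffic routed explicitly by the cookie.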

Proposed HAProxy configuration

  • Requests to /gitlab-com that do not have a gitlab_canary cookie will have gitlab_canary=true set
  • Requests with gitlab_canary=true are routed to the canary backend
  • Additional logic is added so that if the canary backends are unhealthy we fall back to the main backends
  • Server weights are used on the main backends to allow a small amount of traffic to optionally be sent to the canary backend
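The proposed rules can be modeled as a small decision function. This is only a sketch of the logic in the list above, not HAProxy configuration: the `Decision` type and function names are hypothetical, matching /gitlab-com as a leading path segment is an assumption about how the path rule would be written, and the optional weight-based share is left out.

```python
from dataclasses import dataclass

CANARY_COOKIE = "gitlab_canary"
CANARY_PATH_PREFIX = "/gitlab-com"


@dataclass
class Decision:
    backend: str      # "canary" or "main"
    set_cookie: bool  # whether gitlab_canary=true is set on this request


def path_targets_canary(path: str) -> bool:
    """True when the first path segment is exactly gitlab-com."""
    return path == CANARY_PATH_PREFIX or path.startswith(CANARY_PATH_PREFIX + "/")


def route(path: str, cookies: dict[str, str], canary_healthy: bool) -> Decision:
    """Pick a backend for one request under the proposed rules."""
    has_cookie = CANARY_COOKIE in cookies
    # Rule 1: /gitlab-com requests without any canary cookie get it set.
    set_cookie = path_targets_canary(path) and not has_cookie
    # Rule 2: requests carrying gitlab_canary=true want the canary backend.
    wants_canary = set_cookie or cookies.get(CANARY_COOKIE) == "true"
    # Rule 3: fall back to main when the canary backends are unhealthy.
    backend = "canary" if wants_canary and canary_healthy else "main"
    return Decision(backend, set_cookie)
```

One consequence of reading the first rule literally is that a request carrying an explicit `gitlab_canary=false` cookie stays on the main backends even under /gitlab-com, which would give users a way to opt out.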

/cc @gitlab-org/delivery

Edited Jan 21, 2019 by John Jarvis