Consider putting some or all production gitaly servers in front of api/git/web canary

Because rails changes may rely on new gitaly features, we deploy to gitaly before rails when we deploy to production.

One exception to this is canary (git/api/web), which uses the production gitaly at the older, previous version. This is because we don't have a way to run two different versions of gitaly for the same storage shard.
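To make the ordering problem concrete, here is a minimal sketch that checks whether any rails stage is deployed before the gitaly stage it talks to. The stage names and the "current" pipeline order are assumptions inferred from the description above, not taken from the actual deploy tooling.

```python
# Assumed current deploy order (hypothetical stage names): canary is deployed
# before production gitaly.
CURRENT_PIPELINE = ["stg gitaly", "stg fleet", "cny servers", "prd gitaly", "prd fleet"]

# Which gitaly stage each rails stage depends on. Canary uses production gitaly.
GITALY_FOR = {
    "stg fleet": "stg gitaly",
    "cny servers": "prd gitaly",
    "prd fleet": "prd gitaly",
}


def stages_ahead_of_gitaly(pipeline, gitaly_for):
    """Return the rails stages that are deployed before the gitaly stage they use."""
    position = {stage: index for index, stage in enumerate(pipeline)}
    return [
        rails
        for rails, gitaly in gitaly_for.items()
        if rails in position and gitaly in position and position[rails] < position[gitaly]
    ]


print(stages_ahead_of_gitaly(CURRENT_PIPELINE, GITALY_FOR))  # ['cny servers']
```

Under these assumptions, only the canary stage violates the "gitaly before rails" rule, which is the exception this issue is about.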

This issue is to discuss how to address this potential problem as we start adding more traffic to canary. Below are possible approaches for dealing with this issue:

1. Run a canary gitaly. As part of https://gitlab.com/gitlab-org/release/framework/issues/140, select the storage shards that have the repositories that are canary and move them in front of the canary deployment. This doesn't really scale: as we add new shards, they would need to be added to the beginning of the pipeline. It would look like this:

   [ stg gitaly ] -> [ stg fleet ] -> [ gitlab-com/gitlab-org gitaly ] -> [ cny servers ] -> [ prd gitaly ] -> [ prd fleet ]

2. Keep the pipeline as-is but stress that rails changes that take advantage of new gitaly features should be behind a feature flag.
3. Put all gitaly servers in front of canary:

   [ stg gitaly ] -> [ stg fleet ] -> [ prd gitaly ] -> [ cny servers ] -> [ prd fleet ]

4. Split staging into left/right brain deployments, gitaly first, with a long enough delay between them that we can potentially trap errors on staging. This should be done together with (2), but I'm not sure whether we have enough test coverage to realistically catch problems before production.
5. Do nothing.

Doing nothing is a risk but it only impacts a subset of repositories when we add more traffic on gitlab-canary.
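For the feature-flag approach listed above, a rough sketch of the idea: rails code calls a new gitaly feature only when a flag is on, so it keeps working against a gitaly that is still on the previous version. Everything here is hypothetical and for illustration only: the class names, the `count_commits` and `count_commits_v2` methods, and the `use_new_gitaly_rpc` flag are not real gitaly or GitLab APIs.

```python
class OldGitaly:
    """Stand-in for a gitaly server still on the previous version."""

    def count_commits(self, repo):
        return 42


class NewGitaly(OldGitaly):
    """Stand-in for an upgraded gitaly server that also offers a new call."""

    def count_commits_v2(self, repo):
        return 42


def commit_count(repo, gitaly, flags):
    """Use the new call only when the (hypothetical) feature flag is enabled."""
    if flags.get("use_new_gitaly_rpc", False):
        return gitaly.count_commits_v2(repo)
    # Flag off: fall back to the call that every gitaly version supports.
    return gitaly.count_commits(repo)


# Flag off: safe against the older gitaly that canary would be talking to.
print(commit_count("gitlab-org/gitlab-ce", OldGitaly(), {}))
# Flag on: only safe once every gitaly the caller can reach has been upgraded.
print(commit_count("gitlab-org/gitlab-ce", NewGitaly(), {"use_new_gitaly_rpc": True}))
```

With the flag left off until production gitaly is upgraded, the deploy order between canary and gitaly matters less, at the cost of relying on developers to add the flag.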

/cc @gitlab-org/delivery

Edited Jan 28, 2019 by John Jarvis