Check deployment strategies to determine if anything prevents us from upgrading in some areas

Summary

We run mixed production deployments: our canary fleet runs the latest production code before it is rolled out to the rest of the production fleet. Backward-incompatible changes can block deployments and cause serious production issues.

In general, in a multi-node setup two versions of the application can be running at any given time, which means the code needs to be forward compatible: the older version must tolerate data and requests produced by the newer one.
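
As a minimal sketch of what forward compatibility means here, assume the newer (canary) version writes a job payload with an extra field that the older (main) version has never seen. All names below (`parse_job_strict`, `parse_job_tolerant`, `notify_channel`, and the payload shape) are hypothetical and only illustrate the idea; they are not taken from our codebase.

```python
import json

# Fields the older (main fleet) version knows about. Hypothetical example.
KNOWN_FIELDS_V1 = {"project_id", "action"}


def parse_job_strict(raw: str) -> dict:
    """Brittle reader: rejects any field it does not recognize."""
    payload = json.loads(raw)
    unknown = set(payload) - KNOWN_FIELDS_V1
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return payload


def parse_job_tolerant(raw: str) -> dict:
    """Forward-compatible reader: keeps known fields, ignores the rest."""
    payload = json.loads(raw)
    return {key: payload[key] for key in KNOWN_FIELDS_V1 if key in payload}


# Payload written by the newer (canary) version, which added a field.
canary_payload = json.dumps(
    {"project_id": 42, "action": "reindex", "notify_channel": "ops"}
)

# The tolerant reader on the older version still processes the job.
tolerant_result = parse_job_tolerant(canary_payload)

# The strict reader on the older version fails on the same payload.
try:
    parse_job_strict(canary_payload)
    strict_failed = False
except ValueError:
    strict_failed = True
```

In this sketch the strict reader breaks as soon as canary starts producing the new field, while the tolerant reader keeps working on the main fleet until that fleet is upgraded.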

This incident and corrective action pointed to a case where we can't upgrade:

Other examples of incidents related to mixed deployment compatibility issues:

Proposal

We need to audit each team to determine whether any other work falls into this category. This is an action item that we are tracking for the next engineering-wide retrospective.

Outcome

Responses were summarized in #11010 (comment 545261471) and a follow-up issue was opened to create a training issue template using the information gathered here.

Instructions

  1. Please see the summary and example incidents for more context about this issue. If you have other examples to add, it would be much appreciated!
  2. If your team is working on (or planning to work on) anything that may have backward compatibility issues with our mixed deployment strategy, please comment in the issue with more detail and answer the following questions as well:
    • What strategies does your team have to identify, prevent, and mitigate potential canary/main compatibility issues in production?
    • How can we better identify, prevent, and mitigate this type of problem as an organization?
  3. Mark the Audit Complete? column for your team in the table below.
  4. Ping @nhxnguyen if you have any questions or feedback about this audit. Ideally, we will complete this before the 13.10 live retrospective. Thank you in advance!

| Team | Eng Manager | Audit Complete? |
| --- | --- | --- |
| Configure | @nicholasklick | |
| Create:Editor | @rkuba | |
| Create:Source Code BE | @sean_carroll | |
| Create:Source Code, Code Review FE | @andr3 | |
| Create:Code Review BE | @m_gill | |
| Database | @craig-gomes | |
| Distribution | @mendeni | |
| Ecosystem BE | @mnohr | |
| Ecosystem FE | @leipert | |
| Global Search | @changzhengliu | |
| Growth:Activation, Adoption, Conversion, Expansion | @pcalder | |
| Fulfillment:License | @jameslopez | |
| Fulfillment:Purchase | @chris_baus | |
| Fulfillment:Purchase FE | @rhardarson | |
| Fulfillment:Utilization | @csouthard | |
| Geo | @nhxnguyen | |
| Gitaly | @zj-gitlab | |
| Manage:Access, Import BE | @lmcandrew | |
| Manage:Access, Compliance, Import FE | @dennis | |
| Manage:Optimize, Compliance BE | @djensen | |
| Manage:Optimize FE | @wortschi | |
| Memory | @craig-gomes | |
| Monitor | @crystalpoole | |
| Package | @dcroft | |
| Plan | @johnhope | |
| Product Intelligence | @jeromezng | |
| Release | @nicolewilliams | |
| Secure:Composition Analysis BE | @gonzoyumo | |
| Secure:Dynamic Analysis, Fuzz Testing BE | @sethgitlab | |
| Secure:Static Analysis BE | @twoodham | |
| Secure FE | @nmccorrison | |
| Threat Management BE | @thiagocsf | |
| Threat Management FE | @lkerr | |
| Verify:CI, Pipeline Authoring BE | @cheryl.li | |
| Verify:Pipeline Authoring, CI FE | @samdbeckham | |
| Verify:Runner | @erushton | |
| Verify:Testing | @rickywiens | |

Edited by Crystal Poole