Proposal: Deploy at most one change per deployment

Summary

During production#7284 (closed), it took a long time to find the actual root cause of the issue. Part of the problem was that the deployment pipeline had upgraded multiple parts of the application stack at the same time: the upgrade included both changes to Gitaly itself and changes to the Gitaly configuration via Omnibus. Because changes in Gitaly itself are the most frequent cause of incidents in production, SREs naturally went down this path first and tried to revert the upgraded Gitaly version.

Speaking from my own experience, it is always hard to debug something when multiple parts of a system have changed at the same time. We might want to investigate whether it makes sense to adjust the deployment strategy to roll out at most one change per deployment, in order to keep the number of changes minimal and test each change in isolation. Chances are high that this would have significantly sped up finding the actual root cause in this incident.
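To make the idea concrete, here is a minimal sketch of what "at most one change per deployment" could mean as a pipeline check. It is not how the deployment pipeline works today. The `Change` type, the function names, the component names, and the versions are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    # Hypothetical model of a single change to one part of the stack.
    component: str
    old: str
    new: str


def split_into_deployments(pending):
    """Return one single-change deployment per pending change, in order."""
    return [[change] for change in pending]


def check_atomic(deployment):
    """Raise ValueError if a deployment bundles more than one change."""
    if len(deployment) > 1:
        names = ", ".join(c.component for c in deployment)
        raise ValueError(f"deployment bundles multiple changes: {names}")


# Illustrative data only: a bundled upgrade resembling the one in the
# incident (a Gitaly version bump plus a Gitaly configuration change).
bundled = [
    Change("gitaly", "v1", "v2"),
    Change("gitaly-config", "a", "b"),
]

plan = split_into_deployments(bundled)
for deployment in plan:
    check_atomic(deployment)  # each planned deployment carries one change
```

With a check like this, the bundled upgrade would be rejected as a single deployment and rolled out as two consecutive ones, so a regression could be attributed to whichever change went out last.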

Related Incident(s)

Originating issue(s): production#7284 (closed)

Desired Outcome/Acceptance Criteria

It becomes easier to find the root cause of an incident when changes are introduced "atomically", so that only one thing changes at a time. Ultimately, this can help resolve incidents more quickly in a subset of cases.

Associated Services

Service::Gitaly

Corrective Action Issue Checklist

Link the incident(s) this corrective action arose out of
Give context for what problem this corrective action is trying to prevent from re-occurring
Assign a severity label (this is the highest sev of related incidents, defaults to 'severity::4')
Assign a priority (this will default to 'priority::4')