Document hot patching practice
Context
As we introduce the hot patching practice into our monthly release process, we should document the practice itself.
We currently document the steps for a real hot patch, but we should supplement them with steps for the practice run, because a few of the steps change.
Key differences between actual hot patching run vs practice run
The following are some points I noticed while performing the practice run for 16.4.
- All the steps would be run by a delivery member, not a backend engineer or the sre-oncall.
- After the initial notice to sre-oncall, we shouldn't involve them in any further steps of the practice.
- There should be a step in the practice run to ensure that no gstg or gprd deployments are happening. In a real scenario, there would be a severity1 incident, which would block deployments.
- For the sake of having complete documentation, we should add a step on how to create a test severity4 incident (without pinging sre-oncall).
- Steps 7-10 of the steps for backend engineers can be skipped.
- Instead, for step 11, we should copy another practice patch file like this one into the folder that was created by the new MR.
- There should also be a step to delete changes to the canary stages (gstg|gprd-cny), so we only run the practice in higher environments (gstg and gprd).
- Since this practice blocks deployments, if a patch fails before being applied, we should be able to revert the MR merge and cancel the practice, so we can investigate without blocking deployments any further.
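
The canary-exclusion point above can be illustrated with a minimal sketch. It assumes that canary stages are identified by a "-cny" suffix (reading gstg|gprd-cny as gstg-cny and gprd-cny) and that the environments touched by the MR are available as a plain list of names. The function name and the input list are hypothetical and not part of any existing tooling.

```python
# Hypothetical helper: the "-cny" suffix convention and the function name
# are assumptions made for illustration, not existing release tooling.
CANARY_SUFFIX = "-cny"


def split_practice_targets(environments):
    """Split environment names into (kept, dropped).

    kept:    non-canary environments, where the practice patch should run
    dropped: canary environments, whose changes should be deleted from the MR
    """
    kept = [env for env in environments if not env.endswith(CANARY_SUFFIX)]
    dropped = [env for env in environments if env.endswith(CANARY_SUFFIX)]
    return kept, dropped


# Example input: the four environments named in the notes above.
kept, dropped = split_practice_targets(["gstg-cny", "gstg", "gprd-cny", "gprd"])
print("run practice patch in:", kept)
print("delete changes for:", dropped)
```

A check like this could help the delivery member confirm that only gstg and gprd remain in the MR before merging.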