Executable Runbooks for Releases MVC
Problem to solve
Managing releases in GitLab from a release checklist point of view is difficult. For an example of how this is done in GitLab today, you can see how we managed the 11.4 release at gitlab-org/release/tasks#462 (closed) and gitlab-org/release/tasks#460 (closed). There are manual tasks defined in the issue description in markdown, and these are discussed and checked off as things go. This is a pain for a few reasons:
- Context is minimal. Detailed instructions could be linked to, but they aren't naturally in place in the checklist (it would get too long)
- Clicking on checkboxes is error prone, and there's no separate way to validate that something actually happened
- It's not possible to see how the plan is changing over time
- It's not possible to measure the performance/efficiency of the plan
Target audience
Release Managers and teams involved in executing releases. The main differences between this audience and a pipeline implementer working in a .gitlab-ci.yml are that:
- The authors of these kinds of pipelines are non-technical, and even editing YAML would be a challenge for them
- The pipeline consists of manual and automated steps that may require additional approvals
Our internal customers are the #production team (for runbooks in general) and the #delivery team (for release plans).
Proposal
Build a way in GitLab to create an operational runbook that mixes documentation (in Markdown) with executable actions embedded in the same document. We have implemented a version of this on the ~Configure team (&380), but it has difficult setup requirements (for example, it requires k8s, which limits availability of the feature): https://docs.gitlab.com/ee/user/project/clusters/runbooks/. We can leverage this feature, but it would need to be more generally available.
For an external example, see https://blog.amirathi.com/2018/03/27/codify-infra-runbooks-with-jupyter-style-notebook/, which describes a Markdown-based runbook where executable code can be embedded and run.
Note that ChatOps already supplies a mechanism for executing a script securely. Perhaps we can reuse/expand this capability for this feature.
Metrics on runbooks are not included in this iteration, but it should eventually be possible to, for example, generate a value stream map for a runbook, or show the percentage of steps that are automated versus not automated and how that changes over time. Embedded approval tasks (requiring approval from specific people) are also possible in the future.
Further details
You can look at a release as a kind of state machine for deploying traditional applications. This could be tracked in something like Excel (which more people are still using than you might expect), or in a workflow tool. You'd be able to see what the overall status is, which items are on track, and which are failing. Typically, releases will have an overall due date and a work-back plan for delivering. Some items may depend on or block other items.
Each transition has:
- POINTER (previous transition, creating a DAG)
- HITL (wait for human to authorize)
- PRE (command to check we're ready)
- DO (command to execute)
- POST (command to check it was performed correctly)
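As a rough illustration of the fields above, here is a minimal Python sketch of a transition and a runner. None of this is an existing GitLab API: the `Transition` class, the `run_plan` function, and the `authorize` callback are hypothetical names, and PRE/DO/POST are modeled as plain Python callables rather than real commands. It also assumes every POINTER names a transition in the same plan, and it simply stops at the first step that is not authorized or whose check fails.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, List, Optional


@dataclass
class Transition:
    """Hypothetical model of one runbook transition."""
    name: str
    pointer: Optional[str]      # POINTER: name of the previous transition, or None
    hitl: bool                  # HITL: wait for a human to authorize
    pre: Callable[[], bool]     # PRE: check that we're ready
    do: Callable[[], None]      # DO: perform the action
    post: Callable[[], bool]    # POST: check it was performed correctly


def run_plan(transitions: List[Transition],
             authorize: Callable[[str], bool]) -> List[str]:
    """Run transitions in dependency order; return names that completed."""
    by_name = {t.name: t for t in transitions}
    # Map each transition to the set of transitions it depends on (the DAG).
    graph = {t.name: ({t.pointer} if t.pointer else set()) for t in transitions}
    completed: List[str] = []
    for name in TopologicalSorter(graph).static_order():
        t = by_name[name]
        if t.hitl and not authorize(name):
            break  # a human declined, so stop here
        if not t.pre():
            break  # not ready
        t.do()
        if not t.post():
            break  # the action ran but could not be verified
        completed.append(name)
    return completed
```

Stopping the whole plan on the first failure is a simplification; a fuller design could keep running branches of the DAG that do not depend on the failed step.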
Each task might be run:
- Automatically (no HITL, that is, no human in the loop; automated). See https://en.wikipedia.org/wiki/Human-in-the-loop
- Run on click (HITL, automated)
- Manual (human does something). A sub-version of this might be a pure approval step.
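The three run modes above can be read as a function of two properties of a task: whether it waits for a human (HITL) and whether it has a command to execute. The sketch below is one possible encoding, not a proposed implementation; `RunMode` and `run_mode` are hypothetical names, and treating any task without a command as manual is an assumption that the text above does not spell out.

```python
from enum import Enum


class RunMode(Enum):
    AUTOMATIC = "automatic"        # no HITL, automated
    RUN_ON_CLICK = "run on click"  # HITL, automated
    MANUAL = "manual"              # a human does something


def run_mode(hitl: bool, has_do_command: bool) -> RunMode:
    """Classify a task by how it would be run."""
    if not has_do_command:
        # Assumption: with nothing to execute, a human must act
        # (this also covers a pure approval step).
        return RunMode.MANUAL
    return RunMode.RUN_ON_CLICK if hitl else RunMode.AUTOMATIC
```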
There are a few ways in which what we build can be better than using Excel:
- Value stream analysis (how long are things taking, how much is automated, what tasks are taking a long time and could be opportunities to improve efficiency). This is what release orchestration products on the market primarily offer.
- Integration with our releases feature, tying in with capabilities like https://gitlab.com/gitlab-org/gitlab-ce/issues/56030 (evidence collection in releases) or calling out to pipelines. This is powerful, and takes advantage of our 'single-application' nature to offer better features.
- ChatOps integration
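To make the value stream analysis idea more concrete, here is a small Python sketch of the kind of summary it could produce from a finished runbook: total time, share of automated tasks, and the slowest tasks. The `TaskRecord` shape and the `summarize` function are hypothetical; nothing here reflects an existing GitLab data model, and durations are assumed to be recorded in minutes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskRecord:
    """Hypothetical record of one executed runbook task."""
    name: str
    automated: bool
    duration_minutes: float


def summarize(records: List[TaskRecord], top_n: int = 3) -> Dict[str, object]:
    """Summarize how long tasks took and how much of the plan is automated."""
    total = sum(r.duration_minutes for r in records)
    automated_count = sum(1 for r in records if r.automated)
    percent = 100.0 * automated_count / len(records) if records else 0.0
    # Longest-running tasks first: candidates for efficiency improvements.
    slowest = sorted(records, key=lambda r: r.duration_minutes, reverse=True)
    return {
        "total_minutes": total,
        "percent_automated": percent,
        "slowest": [r.name for r in slowest[:top_n]],
    }
```

Comparing these summaries across successive releases would show whether the plan is getting faster or more automated over time.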
What does success look like, and how can we measure that?
TBD
Links / references
- Original discussion: https://docs.google.com/document/d/1QCcJ4M1Wb3i474RDg4rzWc-NmZxKVr6gVwvjw3a3VcA/edit
- Live Runbooks are a similar idea and could be involved: https://blog.amirathi.com/2018/03/27/codify-infra-runbooks-with-jupyter-style-notebook/
- Video of us discussing this issue: https://www.youtube.com/watch?v=ZxDQ9UhjCrU