Executable Runbooks for Releases MVC
Problem to solve
Managing a release in GitLab as a checklist is difficult. For an example of how this is done in GitLab today, you can see how we managed the 11.4 release at gitlab-org/release/tasks#462 (closed) and gitlab-org/release/tasks#460 (closed). Manual tasks are defined in markdown in the issue description, and these are discussed and checked off as the release progresses. This is a pain for a few reasons:
- Context is minimal. Detailed instructions could be linked to, but they aren't naturally in place in the checklist (it would get too long)
- Clicking on checkboxes is error-prone, and there's no separate way to validate that something actually happened
- It's not possible to see how the plan is changing over time
- It's not possible to measure the performance/efficiency of the plan
Release Managers and teams involved in executing releases. The main differences between this audience and a pipeline implementer writing a .gitlab-ci.yml are that:
- The authors of these kinds of pipelines are non-technical, and even editing YAML would be a challenge
- The pipeline consists of manual and automated steps that may require additional approvals
Our internal customers are the #production team (for runbooks in general) and the #delivery team (for release plans).
Build a way in GitLab to create an operational runbook that mixes documentation (in markdown) with executable actions embedded in the same document. We have implemented a version of this on the Configure team (&380), but it has difficult setup requirements (for example, it requires k8s, which limits availability of the feature): https://docs.gitlab.com/ee/user/project/clusters/runbooks/. We can leverage this feature, but it would need to be more generally available.
Sample external feature at https://blog.amirathi.com/2018/03/27/codify-infra-runbooks-with-jupyter-style-notebook/, which is a markdown-based runbook where executable code can be embedded and run.
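To make the idea of a document that mixes markdown with executable actions more concrete, here is a minimal sketch in Python. It assumes a hypothetical convention in which a fenced block tagged `run` marks an executable step; that tag, the `extract_steps` helper, and the sample runbook text are all illustrative and not part of any existing GitLab feature.

```python
import re

# The fence marker is built programmatically so this example can itself
# live inside a fenced block without confusing markdown renderers.
FENCE = "`" * 3

# Hypothetical convention: a fenced block tagged "run" is an executable step.
STEP_PATTERN = re.compile(
    rf"^{FENCE}run[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_steps(markdown_text):
    """Return the command text of every block tagged as executable."""
    return [m.group(1).strip() for m in STEP_PATTERN.finditer(markdown_text)]


# Illustrative runbook: prose for humans, tagged blocks for the executor.
runbook = "\n".join([
    "# Deploy to staging",
    "Check that the package exists before deploying.",
    FENCE + "run",
    "echo checking package",
    FENCE,
    "Then announce the deploy in chat.",
    FENCE + "run",
    "echo deploying",
    FENCE,
])

steps = extract_steps(runbook)
```

The sketch only extracts the commands. How they would be executed securely (for example through the ChatOps mechanism) is left open.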
Note that ChatOps already supplies a mechanism for executing a script securely. Perhaps we can reuse/expand this capability for this feature.
Metrics on runbooks are not included in this iteration, but it should eventually be possible to, for example, generate a value stream map for a runbook, or show the percentage of automated versus non-automated steps and how that changes over time. Embedded approval tasks (requiring approval from specific people) are also possible in the future.
You can look at a release as a kind of state machine for deploying traditional applications. This could be done in something like Excel (which more people are still using than you might expect), or a workflow tool. You'd be able to see what the overall status is, which items are on track, and which are failing. Typically, releases will have an overall due date and a work-back plan for delivering. Some items may depend on or block other items.
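The dependency aspect can be sketched with Python's standard `graphlib` module (Python 3.9+). The item names and the `execution_order` and `blocked_items` helpers below are made up for illustration; the only assumption is that each item lists the items it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical release plan: each item maps to the set of items it depends on.
depends_on = {
    "tag release candidate": set(),
    "deploy to staging": {"tag release candidate"},
    "run QA": {"deploy to staging"},
    "deploy to production": {"run QA"},
}


def execution_order(deps):
    """Return one valid order in which the items can be executed."""
    return list(TopologicalSorter(deps).static_order())


def blocked_items(deps, done):
    """Return unfinished items that still wait on an unfinished dependency."""
    return sorted(
        item
        for item, needs in deps.items()
        if item not in done and not needs <= done
    )


order = execution_order(depends_on)
blocked = blocked_items(depends_on, done={"tag release candidate"})
```

With only the tagging step finished, "deploy to staging" is ready to run, while "run QA" and "deploy to production" are still blocked.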
Each transition has:
- POINTER (previous transition, creating a DAG)
- HITL (wait for human to authorize)
- PRE (command to check we're ready)
- DO (command to execute)
- POST (command to check it was performed correctly)
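The fields above can be sketched as a small data structure. This is an illustrative Python model under stated assumptions, not a proposed implementation: the `Transition` class, the `run_transition` function, and the status strings are hypothetical, and the PRE, DO, and POST commands are modeled as plain callables rather than real shell commands.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Transition:
    name: str
    do: Callable[[], None]                 # DO: the action to execute
    pre: Callable[[], bool] = lambda: True   # PRE: are we ready?
    post: Callable[[], bool] = lambda: True  # POST: did it work?
    hitl: bool = False                     # HITL: wait for a human to authorize
    pointer: Optional[str] = None          # POINTER: name of the previous transition


def run_transition(t, approved=False):
    """Run one transition and return a status string (hypothetical values)."""
    if t.hitl and not approved:
        return "waiting_for_approval"
    if not t.pre():
        return "pre_check_failed"
    t.do()
    if not t.post():
        return "post_check_failed"
    return "done"


# Usage: a transition that records a tag and then verifies it was recorded.
tags = []
tag_release = Transition(
    name="tag release",
    do=lambda: tags.append("v11.4.0"),
    post=lambda: "v11.4.0" in tags,
    hitl=True,
)

first_attempt = run_transition(tag_release)            # no approval yet
second_attempt = run_transition(tag_release, approved=True)
```

Keeping POST separate from DO addresses the checkbox problem described earlier: the step is only marked done when an independent check confirms the action actually happened.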
Each task might be run:
- Automatically (automated, no HITL; see https://en.wikipedia.org/wiki/Human-in-the-loop)
- On click (automated, with HITL)
- Manually (a human does something). A sub-version of this might be a pure approval step.
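The three run modes can be modeled as an enumeration, which also makes the "% automated" figure mentioned earlier straightforward to compute. The `RunMode` enum and `percent_automated` helper are hypothetical names used only for this sketch.

```python
from enum import Enum


class RunMode(Enum):
    AUTOMATIC = "automatic"  # automated, no human in the loop
    ON_CLICK = "on_click"    # automated, but a human triggers it
    MANUAL = "manual"        # a human performs the task

    @property
    def needs_human(self):
        return self is not RunMode.AUTOMATIC

    @property
    def is_automated(self):
        return self is not RunMode.MANUAL


def percent_automated(modes):
    """Share of tasks whose work is performed by automation, in percent."""
    if not modes:
        return 0.0
    automated = sum(1 for mode in modes if mode.is_automated)
    return 100.0 * automated / len(modes)


# Illustrative plan with four tasks, two of which are still manual.
plan = [RunMode.AUTOMATIC, RunMode.ON_CLICK, RunMode.MANUAL, RunMode.MANUAL]
share = percent_automated(plan)
```

Here run-on-click tasks count as automated because the work itself is scripted. Whether they should count that way in a real metric is a design decision this sketch does not settle.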
There are a few ways in which what we build can be better than using Excel:
- Value stream analysis (how long are things taking, how much is automated, what tasks are taking a long time and could be opportunities to improve efficiency). This is what release orchestration products on the market primarily offer.
- Integration with our releases feature, tying in with capabilities like #56030 (evidence collection in releases) or calls out to pipelines. This is powerful, and takes advantage of our 'single-application' nature to offer better features.
- ChatOps integration
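As a rough illustration of the value stream analysis point, the sketch below summarizes how long tasks take and how much of that time is manual. The task records, field names, and the `summarize` helper are invented for the example and do not reflect real release data.

```python
def summarize(tasks):
    """Summarize total duration, manual share of time, and the slowest task."""
    total = sum(t["minutes"] for t in tasks)
    manual = sum(t["minutes"] for t in tasks if not t["automated"])
    slowest = max(tasks, key=lambda t: t["minutes"])
    return {
        "total_minutes": total,
        "manual_share": manual / total if total else 0.0,
        "slowest": slowest["name"],
    }


# Made-up durations, purely for illustration.
tasks = [
    {"name": "tag release candidate", "minutes": 5, "automated": True},
    {"name": "manual QA sign-off", "minutes": 30, "automated": False},
    {"name": "deploy to production", "minutes": 15, "automated": True},
]

summary = summarize(tasks)
```

In this example the manual sign-off accounts for most of the elapsed time, which is the kind of improvement opportunity such an analysis is meant to surface.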
What does success look like, and how can we measure that?
Links / references
- Original discussion: https://docs.google.com/document/d/1QCcJ4M1Wb3i474RDg4rzWc-NmZxKVr6gVwvjw3a3VcA/edit
- Live Runbooks are a similar idea and could be involved: https://blog.amirathi.com/2018/03/27/codify-infra-runbooks-with-jupyter-style-notebook/
- Video of us discussing this issue: https://www.youtube.com/watch?v=ZxDQ9UhjCrU