remove error-prone release return-to-development-series step
The current release procedure I follow, as documented in DEVELOPERS.md, is something like this:
1. Create a new branch making CHANGELOG.md changes for the release (see the sketch after this list)
2. Wait for CI to pass
3. Merge the branch
4. Wait for deployment to pass on the main branch
5. Create a new branch making CHANGELOG.md changes returning to the development series
6. Wait for CI to pass
7. Merge the branch
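For concreteness, the CHANGELOG.md edits in steps 1 and 5 amount to something like the following sketch. This assumes a Keep a Changelog-style file with an "[Unreleased]" heading; the helper names and version handling are made up, and the real procedure in DEVELOPERS.md may differ in its details.

```python
import datetime

# Sketch only: assumes CHANGELOG.md has a Keep a Changelog-style
# "## [Unreleased]" heading; both helpers are hypothetical.

def start_release(changelog: str, version: str) -> str:
    """Step 1: retitle the unreleased section as the new release."""
    today = datetime.date.today().isoformat()
    return changelog.replace("## [Unreleased]", f"## [{version}] - {today}", 1)

def return_to_development(changelog: str) -> str:
    """Step 5: open a fresh unreleased section above the latest release."""
    return changelog.replace("## [", "## [Unreleased]\n\n## [", 1)
```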
These steps are not all serially dependent. In particular, I usually do steps 4 and 5 in parallel. So the dependency graph looks something like this:
```dot
digraph {
  1 -> 2;
  2 -> 3;
  3 -> 4;
  4 -> 7;
  3 -> 5;
  5 -> 6;
  6 -> 7;
}
```
Exercise for the reader: can you spot the problem with the above?
If you said either of the following, points to you.
- Deployment (step 4) can fail, leaving you in an awkward state
- Someone else can merge a change in between steps 3 and 7
With respect to the first of these, if there's a transient failure you can retry the deployment job. If there's a more permanent failure, you can revert the release changes and it's no harm, no foul. The commit history looks a bit weird, but the failed release was never exposed to users and the git release tag was never created.
But the second problem is thornier. If someone merges a change in between, it will generally pass CI and also get deployed, believing itself to be the authoritative release. So now we have two Graphviz commits, each believing it is the same release. "Ah, but the second cannot succeed because git release tag creation will fail," you say. Well, yes. Except that, depending on the timing of the intermediate merge, the two deployment tasks can end up racing each other. So one of these two commits succeeds, but it is unpredictable which one. We have not encountered this scenario yet, so I'm also unsure whether Gitlab's release-cli tool anticipates this kind of concurrency; it's possible both succeed or both fail.
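To make the race concrete, here is a toy model of it in Python. It is not our CI; it just assumes tag creation is atomic and fails if the tag already exists, which is roughly how pushing a git tag behaves. The tag name, commit names, and timings are all made up. Run it a few times and which "commit" wins varies.

```python
import random
import threading
import time

tags = {}
tags_lock = threading.Lock()

def create_tag(name: str, commit: str) -> bool:
    """Atomically create a tag, failing if it already exists."""
    with tags_lock:
        if name in tags:
            return False
        tags[name] = commit
        return True

def deployment_job(commit: str, results: dict) -> None:
    # build/test/upload take an unsynchronised amount of time, so the
    # ordering of the two tag attempts is arbitrary
    time.sleep(random.uniform(0, 0.01))
    results[commit] = create_tag("2.50.0", commit)

results = {}
jobs = [
    threading.Thread(target=deployment_job, args=(commit, results))
    for commit in ("release-commit", "intermediate-merge-commit")
]
for job in jobs:
    job.start()
for job in jobs:
    job.join()
print(results)  # exactly one True, but which one varies from run to run
```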
In short, this kind of situation is a footgun waiting to go off. The only way we avoid it right now is by every maintainer (hopefully) noticing when another maintainer is mid-release and pausing what they're doing.
I don't have an answer for how to solve this, but we should try to make it less error-prone in future.
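For illustration only, one partial guard might be for the deployment job to re-check that it is still the tip of main before tagging, along the lines of the sketch below. CI_COMMIT_SHA is GitLab CI's predefined variable for the commit a job runs against; everything else here is hypothetical, and this would only narrow the window rather than close it.

```python
import os
import subprocess
import sys

def tip_of_main() -> str:
    """Return the SHA currently at the tip of origin's main branch."""
    out = subprocess.check_output(
        ["git", "ls-remote", "origin", "refs/heads/main"], text=True
    )
    return out.split()[0]

if __name__ == "__main__":
    # hypothetical pre-tag guard: abort if another commit has landed on
    # main since this pipeline started
    deploying = os.environ["CI_COMMIT_SHA"]
    if tip_of_main() != deploying:
        sys.exit("main has moved on since this pipeline started; aborting")
```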