Proposal for reordering release pipelines

Current state

To finalize a release, GitLab CE/EE is currently considered the source of truth for the versions of all its components.

When a release is ready for tagging:

release-tools project creates the tag in the CE/EE projects
release-tools reads the *_VERSION files and commits the versions to omnibus-gitlab
release-tools tags omnibus-gitlab, which then builds all required packages
Packages are deployed to staging once they are ready
Deployment to production canary and production is manual
Packages are then manually promoted to public
After CE/EE tagging is done on gitlab.com, a build is triggered in CNG, which builds the Docker images for the Helm charts
Helm charts are released after the Docker images are built

This whole process is the result of growth and iteration since the start of the GitLab project. When GitLab started, it had only one dependent component, and releasing was as simple as tagging in CE.

With the addition of more components, new packaging methods, and deployments to GitLab.com, the process has become difficult to manage, to the point where we need a solution that is easier to manage and follow.

Proposal for intermediate step

Deploying to gitlab.com is currently the last step of the process, and building components is the first. While that is logical, I think it should be the other way around. A high-level overview would look something like this:

graph LR
A[Deploy] -->|Ask for deploy artifact| B{Release tooling}
B -->|Artifacts are ready | A
B --> C[Package]
C --> B
B --> D[Charts]
D -->B
C -->|Tag| E[CE]
C -->|Tag| F[EE]
C --> G[gitaly]
C --> H[workhorse]
C --> I[gitlab-shell]
D --> G
D --> H
D --> I
D --> F
D --> E

Deploy project initiates a deploy
It queries the release project that contains all knowledge of versions of different components
Release project initiates builds for each of the required components and waits for the result
When the artifact is ready, release project informs deploy project that artifacts are ready
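
The four steps above can be sketched as a simple request/response exchange between the two projects. This is only an illustration of the flow, not how release-tools works today: `ReleaseProject`, `request_artifacts`, `build_component`, and the version numbers are hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ReleaseProject:
    """Hypothetical stand-in for the release project, which knows all component versions."""

    versions: dict[str, str]
    built: set[tuple[str, str]] = field(default_factory=set)

    def build_component(self, name: str, version: str) -> str:
        # Placeholder for triggering a real build and waiting for its result.
        self.built.add((name, version))
        return f"{name}-{version}"

    def request_artifacts(self) -> dict[str, str]:
        """Build every required component, then report the artifacts as ready."""
        return {
            name: self.build_component(name, version)
            for name, version in self.versions.items()
        }


def deploy(release: ReleaseProject) -> list[str]:
    """Deploy project side: ask for artifacts, then deploy whatever comes back."""
    artifacts = release.request_artifacts()
    return [f"deployed {artifact}" for artifact in sorted(artifacts.values())]


# Example versions are made up for illustration.
release = ReleaseProject(versions={"gitaly": "1.0.0", "workhorse": "2.0.0"})
log = deploy(release)
```

The point of the sketch is the direction of the calls: the deploy side starts the exchange, and the release side owns the version data and the builds.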

The change above means that no single CE/EE commit is responsible for triggering a deploy. This is a problem only if you consider CE/EE to be the same monolithic codebase that we had, and to some extent still have. That is becoming less and less true: with multiple components required for GitLab to function, the Rails application should be considered just one service, and it should therefore no longer be the starting point of the process.

Deploys to gitlab.com require multiple steps, including deployments to several environments, various QA tasks, monitoring of endpoints, and similar. Because of this, the deployment environment needs to be the one that allows new versions into its systems, since that environment knows its current state. It is also important to have a way to find out which versions of the different components are compatible with each other, so keeping a version compatibility matrix in one project should help with that.

Proposal for end result

Once we are ready to no longer deploy from a single package and we can build and deploy different components independently, we might end up with something like this:

graph LR
A[Deploy] -->|Ask for deploy artifact| B{Release tooling}
B -->|Artifacts are ready | A
B -->|Tag| E[CE]
B -->|Tag| F[EE]
B --> G[gitaly]
B --> H[workhorse]
B --> I[gitlab-shell]
E --> C[Package]
E --> D[Charts]
F -->C
F-->D
G  --> C
G --> D
H --> C
H --> D
I --> C
I --> D

The example above would allow us to:

query the release project at each deploy
diff the deployed versions with desired versions
build any component that is not currently built
deploy only components whose versions are not currently deployed
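
As a rough sketch of the list above, the diff between desired and deployed versions could be computed like this. The function name `plan_deploy`, the data shapes, and the version numbers are assumptions made for this example, not part of any existing tooling.

```python
def plan_deploy(
    desired: dict[str, str],
    deployed: dict[str, str],
    built: set[tuple[str, str]],
) -> tuple[dict[str, str], dict[str, str]]:
    """Return (to_build, to_deploy) for the desired component versions.

    to_deploy: components whose desired version differs from the deployed one.
    to_build:  desired component versions with no existing artifact.
    """
    to_deploy = {
        name: version
        for name, version in desired.items()
        if deployed.get(name) != version
    }
    to_build = {
        name: version
        for name, version in desired.items()
        if (name, version) not in built
    }
    return to_build, to_deploy


# Hypothetical state: versions are made up for illustration.
desired = {"gitaly": "1.1", "workhorse": "2.0", "gitlab-shell": "3.1"}
deployed = {"gitaly": "1.0", "workhorse": "2.0", "gitlab-shell": "3.0"}
built = {("gitaly", "1.0"), ("gitaly", "1.1"), ("workhorse", "2.0"), ("gitlab-shell", "3.0")}

to_build, to_deploy = plan_deploy(desired, deployed, built)
```

In this example only gitlab-shell needs a new build, while both gitaly and gitlab-shell need a deploy; workhorse is left untouched.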

Case 1: Continuous deploys

The changes above would allow us to prepare and harden the tooling required for deploy and release. Whether we deploy with the package or separate charts is an implementation detail, but the flow should be established. As an example:

Intermediate step

Deploy project has a deployment schedule. Release project has a version compatibility definition for deploys from master, e.g. EE should be built from master, gitaly from master, gitlab-shell from stable, and so on. Once the artifacts have been built, the deploy can proceed. The release project thus becomes the source of truth for any version change: to build a package that contains a specific version of gitlab-shell, you only need to define that version in the release project, and the change will be built and deployed.
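
One possible shape for such a definition is a plain mapping from component to the ref it should be built from. The sketch below assumes that shape; the names `MASTER_DEPLOY` and `with_pins`, the component key `gitlab-ee`, and the pinned version `v8.4.1` are hypothetical and used only for illustration.

```python
# Hypothetical compatibility definition for deploys from master,
# following the example in the text: EE and gitaly from master,
# gitlab-shell from stable.
MASTER_DEPLOY = {
    "gitlab-ee": "master",
    "gitaly": "master",
    "gitlab-shell": "stable",
}


def with_pins(definition: dict[str, str], pins: dict[str, str]) -> dict[str, str]:
    """Return a copy of the definition with some components pinned to specific refs."""
    unknown = set(pins) - set(definition)
    if unknown:
        # Refuse to pin components the release project does not know about.
        raise KeyError(f"unknown components: {sorted(unknown)}")
    return {**definition, **pins}


# Pin gitlab-shell to a specific (made-up) version for one build.
pinned = with_pins(MASTER_DEPLOY, {"gitlab-shell": "v8.4.1"})
```

Keeping the definition as data means a version change is a reviewable edit in one project rather than a change spread across several repositories.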

End result

Deploy project has a deployment schedule. Release project has version compatibility definitions and triggers image builds for each component that is not already built. Once all components are ready, deploy project is informed that the deploy can proceed. Deploy project can then deploy individual components based on the data the release project provided.

Case 2: Public releases

With the intermediate step proposal, nothing really changes for public releases from the end user's point of view.

Internally, however, we can detach deployments from releases. We have one project with all versions listed, so it does not matter whether we trigger a release from the deploy project or manually when we are ready to cut a release. This means that we can decide to deploy a final public release regardless of the CD pipeline, or just create a public release.

The bigger change comes from the end result proposal. Because we have all components built and deployed first through CD, we can just create a schedule for public releases and create the final release from the artifacts we already used.

FAQ

Q: How is this different from what we already do?
- A: The release flow is mostly manual and goes through multiple projects, which makes creating an overview nearly impossible. By standardizing on one project as the source of truth, it becomes easier to have a bird's-eye view of the process and to automate most of the manual tasks.
Q: Can we just use Kubernetes to deploy and be done with this?
- A: Kubernetes solves a different problem: it allows for simpler scaling once the deployment has occurred. The proposal above is trying to abstract and standardize what happens before the deploy. Once that is standardized, it does not matter for this discussion whether deployment occurs on VMs or Kubernetes.