Right now, tag and publish of Charts happen in one uncontrolled pipeline, which means we have a blind spot regarding its release. With #1412 (closed), we are proposing doing at least basic testing (check if the Chart is at least installable) of Charts before they are published to customers. This requires the tag and publish steps of Charts release to be separate, and controlled.
With gitlab-org/release-tools!645 (merged), we are bringing the CNG release in line with the omnibus-gitlab release, which means publishing is a manual job that will be run by release-tools. We need to implement something similar for Charts too.
Proposal: Make the job that triggers charts.gitlab.io pipeline a manual one, and add code in release-tools to play it as part of publish.
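A minimal sketch of what "playing" the manual job could look like, assuming the GitLab Jobs API (`POST /projects/:id/jobs/:job_id/play`) is used. release-tools itself is not written in Python; the instance URL, project path, and job name below are hypothetical placeholders, not values from this issue.

```python
import json
import urllib.parse
import urllib.request


def play_url(api, project, job_id):
    """Build the Jobs API URL that plays (starts) a manual job."""
    # Project paths must be URL-encoded when used in place of a numeric ID.
    project_id = urllib.parse.quote(str(project), safe="")
    return f"{api}/projects/{project_id}/jobs/{job_id}/play"


def find_manual_job(jobs, name):
    """Return the ID of the job called `name` that is waiting to be played.

    `jobs` is the decoded JSON list returned by the pipeline jobs endpoint.
    """
    for job in jobs:
        if job["name"] == name and job["status"] == "manual":
            return job["id"]
    return None


def play_job(api, project, job_id, token):
    """Play the manual job and return the API response as a dict."""
    request = urllib.request.Request(
        play_url(api, project, job_id),
        method="POST",
        headers={"PRIVATE-TOKEN": token},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

In the publish step, the job list of the tag pipeline would be fetched first, `find_manual_job` would pick out the (hypothetically named) release job, and `play_job` would start it.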
@WarheadsSE Nothing really changed here. Charts release is still uncontrolled, and both tagging and release happen in one pipeline. To use a chart, we still need to release it to charts.gitlab.io, so we can't "test it internally before releasing it to public". We also don't run any tests on tag pipelines until it is too late and the release pipeline is already running (#1412 (closed) is open to fix this).
We need to split that release pipeline into two - tag phase and release phase - like how we do for CNG and omnibus-gitlab. The tag phase (to be run along with its omnibus-gitlab and CNG counterparts) should tag the charts, compile them as tarballs, and push them to an internal Helm repository. Then the release phase, when triggered by release-tools, should release them to our public Helm repository.
With gitlab-com/gl-infra/delivery#624 (closed) we added support for auto-deploy, but we still need a private registry to be used before publishing releases. Consider this analogous to our private packagecloud repository for omnibus-gitlab packages.
@WarheadsSE, @mnielsen, and I had a call today to discuss this. There are two viable options for doing this:
1. Use the experimental OCI support that came with Helm v3 and use the GitLab Container Registry in dev.gitlab.org as a Helm registry. As part of tagging, we push charts here, and pushing charts to the public repository remains in the publish step.
2. Use the git plugin of Helm to bypass the need for a registry entirely. Any "testing" that we may do between tagging and publishing will have to make use of this plugin instead of expecting a helm repository.
The advantage of option 1 is an easier path to dogfooding when gitlab-org&2281 (closed) gets implemented. Also, this is the path helm seems to be taking officially, going forward. The disadvantage is that, for the testing phase, instead of a simple helm repo add we will have to pull the image, extract it, and then treat it as a local repo.
The advantage of option 2 is bypassing a registry entirely. The disadvantage is that it is not vanilla helm, but a plugin. When we implement helm registries in the product, we may have to redo this.
During our call, we attempted using the OCI support and it worked well. We were able to tag v3.3.5 of charts and push it to https://gitlab.com/balasankarc/charts-container-registry/container_registry. We confirmed we could pull it back in, extract it, run helm dep update and we get back our helm repo similar to what we have in git. So, this seems to be working fine.
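The round trip described above can be sketched as the following command sequences, assuming the experimental `helm chart` subcommands of early Helm 3 releases (enabled with `HELM_EXPERIMENTAL_OCI=1`). Later Helm releases reworked the OCI commands, so this should be checked against the Helm version in use. The registry reference and paths in the usage note are hypothetical.

```python
import os
import subprocess


def oci_push_commands(chart_dir, ref):
    """Commands that store a chart directory under `ref` and push it."""
    return [
        ["helm", "chart", "save", chart_dir, ref],
        ["helm", "chart", "push", ref],
    ]


def oci_fetch_commands(ref, dest, chart_name):
    """Commands that pull `ref`, extract it under `dest`, and fetch sub-charts."""
    return [
        ["helm", "chart", "pull", ref],
        ["helm", "chart", "export", ref, "--destination", dest],
        ["helm", "dependency", "update", os.path.join(dest, chart_name)],
    ]


def run(commands):
    """Run each command in order with experimental OCI support enabled."""
    env = {**os.environ, "HELM_EXPERIMENTAL_OCI": "1"}
    for command in commands:
        subprocess.run(command, check=True, env=env)
```

For example, `run(oci_push_commands("./gitlab", "registry.example.com/group/charts:3.3.5"))` would be the tagging side, and `run(oci_fetch_commands("registry.example.com/group/charts:3.3.5", "./extracted", "gitlab"))` the testing side. Both assume `helm registry login` has already been done.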
1. The first job in the Charts pipeline should wait for images to be available in the dev registry. Once that is done, we can either push charts to the container registry or not (depending on whether we go with Helm 3 OCI support or the helm git plugin).
2. As part of the publish task, a manual job in the pipeline that waits for images to be available in the public registry (what we currently have) should be run. This is a blocking job, which ensures the release job runs only when all images are available.
3. The release job remains the same - it triggers a pipeline in charts.gitlab.io.
When we do #1412 (closed), we can hook it up after step 2 here.
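The blocking "wait for images" jobs above amount to a polling loop. A minimal sketch follows, assuming the actual registry check is supplied by the caller. `is_available` is a hypothetical callback; the existing wait job's real implementation is not shown in this issue.

```python
import time


def wait_for_images(images, is_available, attempts=30, delay=60.0, sleep=time.sleep):
    """Poll until every image is available, or give up after `attempts` rounds.

    Returns True when all images were found, False on timeout, so the
    calling job can fail and block the release job that follows it.
    """
    pending = list(images)
    for attempt in range(attempts):
        # Only re-check images that were missing in the previous round.
        pending = [image for image in pending if not is_available(image)]
        if not pending:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

The same function could serve both phases: the tag phase would pass a check against the dev registry, and the publish phase a check against the public registry.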
Problems we need to fix along with this issue
Before we tag, ensure the runner version is updated: #2120 (closed)
The most difficult part of the whole ordeal? Doing that GitLab version <=> Charts version conversion twice - once for tagging and another for publishing (to know which pipeline to manually trigger).
@rspeicher Before I went and played with release-tools, I thought it might be better to get your input.
We figure out "What is the corresponding Charts version for a GitLab version" by analyzing the existing tag messages. For the publish task, we need to know the Charts version so we can trigger the correct pipeline, and hence this conversion needs to be done again. Doing this twice means getting the Charts repository twice - once in the tagging phase and once in the publishing phase (this might increase the time taken for publishing - no other projects need this).
I see @yorickpeterse's work on release metadata tooling might be useful here, but I don't think we have that in the mix yet. So, right now the only way I can think of implementing this (without doing a massive re-write on release-tools) is doing the less efficient way of getting the repo also in publishing phase and reusing existing logic. WDYT?
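For illustration, the "find version from tag message" lookup could be sketched as below. The tag message format matched by the regular expression is an assumption, not taken from this issue, and release-tools' actual logic (which is Ruby) may differ.

```python
import re

# Assumed message shape: "... contains GitLab EE <major>.<minor>.<patch>".
TAG_MESSAGE = re.compile(r"contains GitLab (?:EE|CE) (?P<gitlab>\d+\.\d+\.\d+)")


def charts_version_for(gitlab_version, tags):
    """Return the Charts version whose tag message names `gitlab_version`.

    `tags` is an iterable of (tag_name, tag_message) pairs, as could be
    read from a clone of the Charts repository. Returns None if no tag
    message mentions the given GitLab version.
    """
    for tag_name, message in tags:
        match = TAG_MESSAGE.search(message)
        if match and match.group("gitlab") == gitlab_version:
            return tag_name.lstrip("v")
    return None
```

Because the input is the full tag list, both the tagging and the publishing phase need access to the repository's tags, which is the duplicated cost described above.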
@balasankarc The release metadata is already tracked, so if you have a GitLab (auto-deploy) version you should be able to determine what the charts version is. An example metadata file looks like this:
@yorickpeterse To confirm - we are not currently tracking Charts releases, are we? (I don't see them in the above JSON snippet). Also, when do these get populated? Immediately on tagging or after a release is out?
[If we add Charts to it] If it gets populated immediately after tagging, we can start using it instead of "find version from tag message" workaround we have currently. At least for the publish stage, instead of cloning the repo again, we can make use of release metadata to find the Charts version.
@balasankarc It seems so. Depending on how we run charts releases in Release Tools it should not be too hard to add the data. The data is populated during the tagging process, and uploaded after all components have been released.
@yorickpeterse So what happens right now is everything else (GitLab, omnibus-gitlab, and CNG) gets tagged (they build artifacts and push them to staging repositories and registries) and waits for us to run the publish task via chatops. When the publish task is run, all of them get released to the public, and along with it Charts gets tagged-and-published in one go. This issue is essentially to change that, and bring in a breakpoint in the Charts release too, so that it can be tagged along with others and released along with others, in two steps.
"The data is populated during the tagging process, and uploaded after all components have been released."
Could you please define where it is populated during tagging, and what "release" means? If it is available in the git repo immediately after tagging, then we can probably hook it in.
@balasankarc In this case the data is uploaded as part of the release:tag task, after all components have been tagged.
Since we are also working on porting all our release code to use the API (gitlab-com/gl-infra&236 (closed)), perhaps it's best to wait for that? That way we don't implement something one way, only to replace it a week or two later.
Yeah, if the release-tools codebase is being re-worked soon (I see work has been started in that direction from the issues in that epic) I don't think waiting for that is a bad idea.
/cc @mendeni @ljlane @twk3 @WarheadsSE @axil for visibility. The TLDR is that hooking the release metadata JSON into the mix can help us do this more easily, without adding more cruft to the release-tools codebase. But it is probably best to wait until the re-work on release-tools is finalized. Waiting won't worsen the situation any further, and may even reduce the work in the longer run.
@WarheadsSE I don't think this issue specifically had Runner's (or Runner chart's) releases under its scope, and we are discussing getting runner tagged earlier (17th of the month) in #2120 (closed) so that dependencies.io can kick in before we tag Charts on 18th/19th. Let's have that discussion there, since Steve and Tomasz from the Runner team are already part of that conversation.
Also, whether there is a plan to have GitLab Runner releases also through release-tools, I don't know yet. I will ask.
@balasankarc :nod: I agree. I wanted to make sure it was clear that we'd bump into that issue during the process, and technically we're dependent on something that the Omnibus/Rails is not, thus still leaving this project "different" and more complex.
@yorickpeterse before we get too far on this, how stable are we considering the api approach so far? Is it stable enough that we can abandon support for our old chart release scripts for 13.6, or does it need more bake time?
@twk3 The API release code is deemed stable, and once gitlab-org/release-tools!1219 (merged) is merged it will be enabled permanently. This means that the old scripts are no longer necessary.