GitOps sync should wait for the related CI pipeline to run green

added to epic &3329

changed the description

It seems to me that GitOps sync is already an automation that replaces something that could have been done in a .gitlab-ci.yml, such as a kubectl apply command. Also, because it is not tied to a pipeline, users can use this feature without spending any CI minutes. That's worth considering before we decide to tie the GitOps sync feature to pipelines.

That being said, if the GitOps sync fails, we should signal it to GitLab somehow, and a CI pipeline does seem to be the most natural way. But would it make sense to have other ways, such as some kind of badge on the project's main page?
However, I guess this could only report a failed synchronization, not the result of a specific deployment check, such as whether a pod is reachable through an Ingress.

If we decide to tie it to the pipeline, there could be an API that asks KAS to start the sync. Users could then configure a script that triggers the sync themselves, which would also reduce polling pressure on Gitaly. They could then script their own tests to check whether the deployment succeeded and decide whether to fail the pipeline (for example, with exit 1).
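A minimal .gitlab-ci.yml sketch of that idea. It assumes a HYPOTHETICAL KAS endpoint that starts a sync on request: no such API is described in this thread, and KAS_SYNC_URL, KAS_SYNC_TOKEN, the my-app deployment, and ./scripts/smoke-test.sh are placeholders. The verification job relies on standard behavior: kubectl rollout status and curl --fail exit non-zero on failure, which fails the job and the pipeline. It also assumes the agent grants this project CI/CD access and that the images provide a shell.

```yaml
# my-group/app/.gitlab-ci.yml (sketch)
stages:
  - deploy
  - verify

request_sync:
  stage: deploy
  image:
    name: curlimages/curl:latest
    entrypoint: [""]
  script:
    # HYPOTHETICAL endpoint: asks KAS to sync now. --fail makes curl exit
    # non-zero on an HTTP error, which fails the job.
    - 'curl --fail --request POST --header "Authorization: Bearer ${KAS_SYNC_TOKEN}" "${KAS_SYNC_URL}"'

verify_deploy:
  stage: verify
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]
  script:
    # Assumes ci_access is configured for this project in the agent config.
    - kubectl config use-context my-group/agents:some-agent
    # Blocks until the rollout finishes; exits non-zero on failure or timeout.
    - kubectl rollout status deployment/my-app --namespace my-app --timeout=300s
    # HYPOTHETICAL user script, e.g. checking the app is reachable via its Ingress.
    - ./scripts/smoke-test.sh
```

In practice, the verify job would also need to wait until KAS has actually applied the new revision. This sketch ignores that race.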

We could go one step further and make GitOps a special kind of trigger job that requires no script. For example:

CI YAML:

```yaml
# my-group/app/.gitlab-ci.yml
# Proposed syntax: trigger:gitops does not exist in GitLab CI today.
trigger_gitops:
  stage: production
  environment: production
  trigger:
    gitops: my-group/agents:some-agent
```

Agent config file:

```yaml
# my-group/agents/.gitlab/agents/some-agent/config.yaml
gitops:
  manifest_projects:
    - id: my-group/app
      when: trigger
```

The when: trigger key is only there to demonstrate the point. I think we will want more flexibility than when: trigger, and we need to be mindful of the similarities and overlap with &4516 (closed).

This could be done using release tags or branches, similar to how we deploy things with semantic-release in many places:

1. Commit to the manifest project, which runs CI.
2. One of the CI jobs creates the release tag.
3. kas picks up the release tag.

Edit: What I mean is that the use case "Run a set of CI jobs on the final manifests before kas gets to deploy them" can be achieved without introducing a new feature.
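The CI side of the tag-based flow described above could look roughly like this. A minimal sketch: ./scripts/validate-manifests.sh is a HYPOTHETICAL placeholder for whatever checks run on the final manifests, and the tag scheme (v1.0.<pipeline IID>) is an arbitrary assumption. The release keyword and the release-cli image are standard GitLab CI; how kas would follow the tags is not shown.

```yaml
# my-group/app/.gitlab-ci.yml (sketch)
stages:
  - test
  - release

validate_manifests:
  stage: test
  script:
    # HYPOTHETICAL checks on the final manifests (linting, policy, tests).
    - ./scripts/validate-manifests.sh

create_release_tag:
  stage: release
  image: registry.gitlab.com/gitlab-org/release-cli:latest
  rules:
    # Only release from the default branch.
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  script:
    - echo "Releasing manifests from $CI_COMMIT_SHA"
  release:
    # By default, later-stage jobs run only if all earlier jobs succeeded,
    # so the tag is created only for a green pipeline.
    tag_name: "v1.0.$CI_PIPELINE_IID"
    description: "Manifests validated by pipeline $CI_PIPELINE_ID"
```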

This is an interesting question. Perhaps we should wait for a customer to tell us they need it and present a use case?

Edit: my instinct is to avoid coupling things together unless it's a clear win.

@nagyv-gitlab @wleidheiser, it feels like this issue hasn't received much attention from GitLab users in the last year.

Do you know if we've collected enough customer feedback in our interviews to get an idea of the preferred approach to solve this problem, and whether this is a problem to be solved at all? Or should we keep this issue open a little longer and perhaps even start more focused research with them?

@Alexand

Do you know if we've collected enough customer feedback on our interviews to get an idea of the preferred approach to solve this problem, and whether this is a problem to be solved at all?

The research I've done hasn't focused on this particular topic. After reading through the issue, I don't have a full understanding of what the problem is from a user's perspective. Is the problem that users expect the Kubernetes agent to be integrated into the pipeline? Do users want more flexibility in how they can use our feature set? I think I need some help understanding the underlying problem.

Thanks, @wleidheiser.

Currently, our GitOps offering works independently of our CI/CD pipelines. In other words, the application is deployed without executing any CI/CD jobs. The code is simply synced to Kubernetes once it is merged into the default branch.

This issue proposes evaluating whether we should tie the synchronization of GitOps manifest files, or Helm charts, to a CI/CD pipeline that has run successfully to completion. This would allow automated tests, automated security checks, etc., to run before a deployment happens. If the tests or security checks fail, GitLab wouldn't sync the GitOps manifest files, so the deployment would be automatically blocked.

Please let me know if the scope is still unclear.

There's also the question of what would trigger the deployment, and how. It could be a specific .gitlab-ci.yml key in a CI/CD job, it could be triggered by adding git tags, and there are probably other alternatives.

@Alexand - I appreciate you clarifying the topic and questions for me. That really helps. I asked some of the other product designers who work within Ops, and they haven't received any feedback on this particular topic. @nagyv-gitlab - Have you heard users discuss this topic in your continuous interviews?

It seems that problem validation research may be needed to understand how much of a problem it is that the GitOps offering is independent of our CI/CD pipelines. If it is a problem we should address, then it would make sense to evaluate potential solutions.

@Alexand @wleidheiser A specific use case is described below by a user. After conducting minimal research on the topic back in 2020, I did not focus on it in my interviews. My research in 2020 included GitLab SREs and a few people in a CNCF call.

Am I understanding correctly that the question is not the validity of the use case but the approach to take to solve it? Is expanding the CI syntax needed, or is waiting for a pipeline on the configured git ref to run green enough?

Thanks, @nagyv-gitlab.

For me, it's clear that this brings value. I was wondering about the two questions below:

Since this issue has had little interaction in a year, do we know whether many people want this? How relevant is it?
What are the preferred approaches to trigger the GitOps synchronization after the pipeline succeeds?

OK. I'll ask around. I see the following options:

```yaml
# my-group/agents/.gitlab/agents/some-agent/config.yaml
gitops:
  manifest_projects:
    - id: my-group/app
      when: on_commit | on_pipeline_success | on_trigger
```

on_commit: the current behaviour
on_pipeline_success: no special YAML is needed; when the pipeline runs green, the GitOps deployment starts
on_trigger: triggered as a job in the pipeline as described in #288307 (comment 1163338950)

Maybe also on_tag and on_semver, similar to Flux's approach?
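For comparison, Flux configures its semver approach on the GitRepository source through spec.ref.semver. A minimal sketch, assuming Flux v2 with the source.toolkit.fluxcd.io/v1 API. The object name and repository URL are placeholders.

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: app-manifests
  namespace: flux-system
spec:
  interval: 1m
  # Placeholder URL for the manifest project.
  url: https://gitlab.com/my-group/app.git
  ref:
    # Track the highest tag matching this semver range instead of a branch,
    # so only commits that were tagged get deployed.
    semver: ">=1.0.0"
```

A Flux Kustomization that references this source would then apply the contents of the highest matching tag. If CI only tags commits whose pipeline passed, this effectively gates the sync on a green pipeline.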

Hi @tkuah,

Please add labels to your issue, this aids categorization and locating issues in the future.

Thanks for your help!

You are welcome to help improve this comment.

added auto updated label

added Category:Kubernetes Management feature::enhancement group::configure [DEPRECATED] labels and removed auto updated label

added type::feature label

Setting label(s) ~"devops::configure" ~"section::ops" based on ~"group::configure".

added devops::configure [DEPRECATED] section::ops labels

mentioned in issue gitlab-org/quality/triage-reports#1044 (closed)

I agree with @ash2k here. There are many possible user flows, and we don't know the preferred one, yet.

The simplest setup is that the user runs their CI, and the CI updates the manifests. If the CI failed, the manifest did not get updated. Problem solved.
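A minimal sketch of that setup, where CI writes to the manifest project only after tests pass. Everything project-specific is a HYPOTHETICAL placeholder: the my-group/app-manifests project, deployment.yaml, the image path, ./scripts/run-tests.sh, and MANIFEST_PUSH_TOKEN, which would be a masked CI/CD variable holding a token with write access to the manifest project.

```yaml
# my-group/app/.gitlab-ci.yml (sketch)
stages:
  - test
  - update-manifests

test:
  stage: test
  script:
    - ./scripts/run-tests.sh   # HYPOTHETICAL test entry point

update_manifests:
  stage: update-manifests
  image: alpine:3.19
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  script:
    - apk add --no-cache git
    # HYPOTHETICAL token with write access to the manifest project.
    - git clone "https://oauth2:${MANIFEST_PUSH_TOKEN}@gitlab.com/my-group/app-manifests.git"
    - cd app-manifests
    # Point the deployment at the image built from this commit.
    - 'sed -i "s|image: registry.gitlab.com/my-group/app:.*|image: registry.gitlab.com/my-group/app:${CI_COMMIT_SHORT_SHA}|" deployment.yaml'
    - git config user.email "ci@example.com"
    - git config user.name "GitLab CI"
    - git commit -am "Deploy ${CI_COMMIT_SHORT_SHA}"
    - git push origin HEAD
```

Because update_manifests is in a later stage, it runs only if the test job succeeded. A failed pipeline therefore leaves the manifests, and the GitOps sync, untouched.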

For Hordur's idea, I think we would need to add support for &4516 (closed)

Re what Joao said

That being said, if the GitOps sync fails, we should signal GitLab somehow and a CI pipeline does seem to be the most natural way. But would it make sense to have other ways? Some kind of badge in the main project's page?
But I guess this would only be able to notify failure of synchronization, not specific deployment check, like a pod properly accessible via an Ingress.

We have this issue on the roadmap: #258603

changed milestone to %Backlog

added awaiting feedback workflow::validation backlog labels

added [deprecated] Accepting merge requests label

I agree that we need to provide flexibility to our users. Some will prefer a liberal pull-based GitOps model, others won't. The more flexibility we can provide, the better. There are likely many possible user flows, and we shouldn't be dogmatic or prescriptive.

I think offering a pull-based sync that runs after a CI pipeline would be a good config option. Many users will likely use it.

One scenario we are envisioning is the ability to run a SAST/policy tool on the manifest repo, and ONLY if that tool passes, allow the sync to occur.
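The policy part of that scenario could be a CI job like the following. A minimal sketch, assuming Conftest as the policy tool (any SAST or policy tool could take its place), Rego policies stored in policy/, manifests in manifests/, and an openpolicyagent/conftest image that provides a shell.

```yaml
# Manifest project .gitlab-ci.yml (sketch)
stages:
  - policy

policy_check:
  stage: policy
  image:
    name: openpolicyagent/conftest:latest
    entrypoint: [""]
  script:
    # Evaluates every manifest against the Rego policies in ./policy;
    # conftest exits non-zero when a deny or violation rule matches,
    # which fails the job.
    - conftest test --policy policy/ manifests/
```

This job alone does not block the sync. It still needs a gating mechanism from this thread, for example a release tag created in a later stage, or the proposed on_pipeline_success option.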

changed epic to &5597 (closed)

mentioned in issue #342696

marked this issue as related to #342696

marked this issue as related to #276248

marked this issue as related to #363579 (closed)

mentioned in issue #363579 (closed)

changed the description

added direction documentation labels

changed the description

removed awaiting feedback label

added GitLab Free GitLab Premium GitLab Ultimate labels

added workflow::refinement label and removed workflow::validation backlog label

changed epic to &5567

set weight to 5

removed [deprecated] Accepting merge requests label

@brett_jacobson @olastor @Z01d-b3rg May I invite you to share your insights and details about your process related to this issue? We want to learn more about you and your related problems to ensure we can provide the best possible solution.

If you are open to a call with me, I would love to learn more about your Kubernetes-related processes and use cases. You can reach me via e-mail from my GitLab profile to schedule a call.

mentioned in issue Alexand/growth-and-development#1 (closed)

changed title from Should GitOps wait for CI ? to GitOps sync should wait for the related CI pipeline to run green

added workflow::problem validation label and removed workflow::refinement label

mentioned in issue gitlab-com/Product#4964 (closed)

added group::environments label and removed group::configure [DEPRECATED] label

added devops::deploy label and removed devops::configure [DEPRECATED] label

mentioned in issue #405007

marked this issue as related to #405007

marked this issue as related to #368189

Now that GitOps is done by Flux, with support for OCI artifact deployments (instead of only git repos), is this issue fixed automatically?

I mean: if you need to deploy after the pipeline is done, then use OCI mode and publish that artifact only after a successful pipeline. Problem solved. Actually, that's the current recommendation.

Also, for git repositories, there's support for immediate reconciliation.

So it seems to me like this issue is already fixed.
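The OCI flow suggested above could be sketched on the CI side as follows. The sketch assumes the Flux CLI image (pinned here to an example version) provides a shell, that the artifact is pushed to the project's container registry, and that ./scripts/validate-manifests.sh is a HYPOTHETICAL placeholder for the checks. flux push artifact and flux tag artifact are real Flux CLI commands.

```yaml
# my-group/app/.gitlab-ci.yml (sketch)
stages:
  - test
  - publish

validate_manifests:
  stage: test
  script:
    - ./scripts/validate-manifests.sh   # HYPOTHETICAL checks

publish_manifests:
  stage: publish
  image:
    name: ghcr.io/fluxcd/flux-cli:v2.2.3
    entrypoint: [""]
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  script:
    # Runs only after every test-stage job succeeded, so an artifact is
    # published only for a green pipeline.
    - >
      flux push artifact "oci://${CI_REGISTRY_IMAGE}/manifests:${CI_COMMIT_SHORT_SHA}"
      --path=./manifests
      --source="${CI_PROJECT_URL}"
      --revision="${CI_COMMIT_REF_NAME}@sha1:${CI_COMMIT_SHA}"
      --creds="${CI_REGISTRY_USER}:${CI_REGISTRY_PASSWORD}"
    # Move the tag the cluster follows to this artifact.
    - >
      flux tag artifact "oci://${CI_REGISTRY_IMAGE}/manifests:${CI_COMMIT_SHORT_SHA}"
      --tag=latest
      --creds="${CI_REGISTRY_USER}:${CI_REGISTRY_PASSWORD}"
```

On the cluster side, a Flux OCIRepository with spec.ref.tag set to latest, plus a Kustomization referencing it, would then deploy only artifacts that a green pipeline published.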

@yajoman You are correct. We recorded a demo that shows various aspects of doing GitOps with GitLab at https://gitlab.highspot.com/viewer/655dfb4c24c5772b0252d9af?track=false

Still, we have plans to improve this setup so that you won't need to consume runner minutes while the reconciliation is running. This direction is tracked in External CI jobs MVC (&10866)

I'll close this issue as solved. Thank you!
