Use status fields
While researching charts/components/gitlab-operator#17, I looked at the Kubernetes workqueue and informers. I ended up with a simple shortcut solution there, but based on that research I propose the following.
I propose we start using status fields to track where we are in the GitLab upgrade process. This will help us with long-running jobs: we can watch for jobs that have finished and kick off a reconcile run when they do. This will help us get rid of the special timeout loop we currently need to check the status of a job. It will also help with charts/components/gitlab-operator#20 to retry failed migrations.
We can use these status fields to skip steps that have already been executed. An example: on a GitLab update we proceed up to the pre-migrations step and launch the job for it. The operator updates a status field to indicate that the pre-migrations are running and then finishes the reconcile run cleanly (no retrying). From there we let the Job controller do the work and react to changes in the job that the GitLab controller created. A reconcile run is triggered when the job status changes (e.g. it finishes or fails). We detect which step we are in and check the status of the job. If it finished cleanly, we proceed to the next step. If it failed, we retry the job and run the pre-migrations again.
By doing this, the controller is no longer blocked while a job runs and can proceed with other work. The disadvantage is that we will have to implement more conditionals to cover all scenarios.
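To make the flow above more concrete, here is a minimal sketch of the decision a reconcile run would make. It uses only the Go standard library and does not touch the real operator code: `UpgradePhase`, `JobState`, `Action` and `Step` are hypothetical names invented for illustration, and `JobState` is a simplified stand-in for whatever we would read from the Job's status.

```go
package main

// UpgradePhase is a hypothetical value we would store in the GitLab
// resource's status to remember which step of the upgrade we are in.
type UpgradePhase string

const (
	PhaseIdle                 UpgradePhase = ""
	PhasePreMigrationsRunning UpgradePhase = "PreMigrationsRunning"
	PhasePreMigrationsDone    UpgradePhase = "PreMigrationsDone"
)

// JobState is a simplified stand-in for the observed state of the
// pre-migrations Job (assumption: derived from the Job's status).
type JobState int

const (
	JobRunning JobState = iota
	JobSucceeded
	JobFailed
)

// Action is what the reconcile run should do before returning.
type Action string

const (
	ActionLaunchJob Action = "launch-job"
	ActionWait      Action = "wait"
	ActionRetryJob  Action = "retry-job"
	ActionNextStep  Action = "next-step"
)

// Step maps the recorded phase and the observed Job state to the next
// phase and the action to take. It never blocks: when the Job is still
// running it returns ActionWait and relies on the next Job status change
// to trigger another reconcile run.
func Step(phase UpgradePhase, job JobState) (UpgradePhase, Action) {
	switch phase {
	case PhaseIdle:
		// Nothing started yet: launch the Job and record that in the status.
		return PhasePreMigrationsRunning, ActionLaunchJob
	case PhasePreMigrationsRunning:
		switch job {
		case JobSucceeded:
			return PhasePreMigrationsDone, ActionNextStep
		case JobFailed:
			return PhasePreMigrationsRunning, ActionRetryJob
		default:
			return PhasePreMigrationsRunning, ActionWait
		}
	default:
		// Step already executed: skip it.
		return phase, ActionNextStep
	}
}
```

For example, `Step(PhasePreMigrationsRunning, JobFailed)` keeps the phase and returns `ActionRetryJob`, while `Step(PhasePreMigrationsDone, ...)` skips straight to the next step. The extra conditionals mentioned above would show up as more cases in this switch.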
Questions I still have at the moment:
- How can we stop after x attempts at running the (pre-)migrations? Maybe we can record the attempt count in a status field?
- How do we make sure that the job doesn't run forever (e.g. when it is stuck somewhere inside the container)?
- Is this the best-practice way?
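As one possible answer to the first two questions, here is a rough sketch that keeps an attempt counter and a start time in the status and checks them on each reconcile run. Everything in it is an assumption for illustration: `MigrationStatus`, `maxAttempts`, `jobDeadline`, `ShouldRetry` and `TimedOut` are hypothetical names, and the limits are arbitrary. For the stuck-job case, the Job spec's own `activeDeadlineSeconds` field may already be enough, since Kubernetes then terminates the Job's pods and marks the Job as failed once the deadline passes.

```go
package main

import "time"

// MigrationStatus is a hypothetical slice of the GitLab resource's status.
type MigrationStatus struct {
	Attempts  int       // how many times the (pre-)migrations Job was launched
	StartedAt time.Time // when the current attempt was launched
}

// Arbitrary example limits; real values would need to be agreed on.
const (
	maxAttempts = 3
	jobDeadline = 30 * time.Minute
)

// ShouldRetry reports whether another attempt is still allowed.
func ShouldRetry(s MigrationStatus) bool {
	return s.Attempts < maxAttempts
}

// TimedOut reports whether the current attempt has been running longer
// than jobDeadline. An unset StartedAt means no attempt is in progress.
func TimedOut(s MigrationStatus, now time.Time) bool {
	return !s.StartedAt.IsZero() && now.Sub(s.StartedAt) > jobDeadline
}
```

With this, a failed job would only be retried while `ShouldRetry` returns true; after that the operator could record a failure in the status and stop.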
What do you think, @twk3 @dippynark?