Disabling pipeline_schedule_worker_cron in Rails configuration causes scheduled pipelines to get stuck 'inactive' with nil next_run_at
Summary
If self-managed customers disable pipeline_schedule_worker_cron via Rails configuration:
- All existing planned scheduled pipelines will be executed.
- next_run_at for each schedule is set to nil rather than a datestamp.
- After the worker cron is re-enabled, these jobs no longer run, and show as inactive in the UI.
For any use case where disabling this worker cron is meant to leave schedules inactive, this behavior is a 'feature'.
For example, in Disable scheduled pipeline jobs in Gitlab Globa... (#246842) - which is where this procedure comes from, one of the use cases is restores from production to test. Having the schedules go inactive automatically is useful.
One of the other use cases there is DR environments. Here the goal would be to switch all schedules on and off, and so this issue prevents this from working as hoped.
In other situations where the goal is to temporarily stop scheduled pipeline creation, scheduled pipelines get stuck in this inactive state, so it's not temporary for affected schedules. Manual steps are needed to fix the schedules.
The following use case relates to planned maintenance: job traces and CI - document how to prepare sel... (#410113)
There is a workaround.
Steps to reproduce
- Set up GitLab with a project with a basic CI pipeline.
- Add a schedule for the pipeline.
- Add to gitlab.rb:
    gitlab_rails['pipeline_schedule_worker_cron'] = ""
  Note: this step was based on guidance from another issue, but doesn't do what it seems to do. Read more.
- Apply with gitlab-ctl reconfigure
- Wait for the outstanding scheduled pipeline to execute.
- The schedule goes inactive in the UI.
- Comment out or remove in gitlab.rb:
    # gitlab_rails['pipeline_schedule_worker_cron'] = ""
- Apply with gitlab-ctl reconfigure
- Observe that the schedule does not restart.
Example Project
Cannot be reproduced on GitLab.com
What is the current bug behavior?
The worker is disabled.
This schedule still has a future execution planned:
# select * from ci_pipeline_schedules where project_id=97;
-[ RECORD 1 ]-+---------------------------
id | 20
description | test schedule
ref | refs/tags/zd407460-tagA
cron | */5 * * * *
cron_timezone | Europe/London
next_run_at | 2023-06-09 09:01:00
project_id | 97
owner_id | 1
active | t
created_at | 2023-06-07 12:27:49.077281
updated_at | 2023-06-09 08:56:23.006074
After it executes, it shows as inactive in the UI. However, the active boolean remains true, and next_run_at is now empty:
# select * from ci_pipeline_schedules where project_id=97;
-[ RECORD 1 ]-+---------------------------
id | 20
description | test schedule
ref | refs/tags/zd407460-tagA
cron | */5 * * * *
cron_timezone | Europe/London
next_run_at |
project_id | 97
owner_id | 1
active | t
created_at | 2023-06-07 12:27:49.077281
updated_at | 2023-06-09 09:01:03.56325
Re-enable gitlab_rails['pipeline_schedule_worker_cron'].
The schedule remains inactive.
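The two database snapshots above differ only in next_run_at, so the stuck state can be recognised from the active and next_run_at columns alone. The following is a minimal sketch of that classification in plain Ruby. ScheduleRow and schedule_state are hypothetical names introduced for illustration; they are not part of GitLab, and the sketch only mirrors the symptoms reported here.

```ruby
# Hypothetical stand-in for a row of ci_pipeline_schedules; only the
# columns relevant to this issue are modelled.
ScheduleRow = Struct.new(:id, :active, :next_run_at, keyword_init: true)

# Classify a row using the symptoms described in this issue.
def schedule_state(row)
  return :deactivated unless row.active   # active = f: switched off on purpose
  return :stuck if row.next_run_at.nil?   # active = t, but no next run planned
  :scheduled                              # active = t with a next run planned
end

# The two snapshots of schedule 20 shown above.
row_before = ScheduleRow.new(id: 20, active: true, next_run_at: "2023-06-09 09:01:00")
row_after  = ScheduleRow.new(id: 20, active: true, next_run_at: nil)

puts "before: #{schedule_state(row_before)}"
puts "after:  #{schedule_state(row_after)}"
```

The second snapshot falls into the stuck case: the row still has active = t, yet there is no timestamp left for the worker to act on.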
What is the expected correct behavior?
The pipeline schedule is only inactive as long as schedules aren't being run.
Workaround
- Prior to disabling pipeline_schedule_worker_cron, investigate any schedules that are already stuck. These haven't been running: setting them inactive will mean they behave as they do at present. Any that are actually required should be re-activated by toggling active off and on.
  Run this on the Rails console (sudo gitlab-rails c) to list the affected schedules:

    Ci::PipelineSchedule.where(next_run_at: nil).each do |s|
      # Skip schedules that were deliberately deactivated.
      next if s.inactive?
      project = Project.find_by_id(s.project_id)
      puts "stuck schedule '#{s.description}' in project '#{project.full_path}'"
    end; nil
- When you want to disable scheduled pipelines, set gitlab_rails['pipeline_schedule_worker_cron'] = "" in gitlab.rb and run gitlab-ctl reconfigure
- When you want to enable scheduled pipelines again, revert the previous step.
- All schedules affected by this issue can be identified by the above Rails script.
- The following script will calculate the next run and fix/re-activate the schedule:

    Ci::PipelineSchedule.where(next_run_at: nil).each do |s|
      # Skip schedules that were deliberately deactivated.
      next if s.inactive?
      project = Project.find_by_id(s.project_id)
      # Recalculate and store next_run_at so the worker picks the schedule up again.
      s.schedule_next_run!
      puts "Fixed: schedule '#{s.description}' in project '#{project.full_path}'"
    end; nil

  Note: this will act on all schedules where next_run_at is nil. This is why it's important to remove any pre-existing schedules from this scope prior to disabling scheduled pipelines.
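An alternative to cleaning up pre-existing stuck schedules first is to record their IDs before disabling the worker cron, and later fix only the schedules that became stuck afterwards. The following is a minimal sketch of that bookkeeping in plain Ruby, so it runs outside the Rails console. ScheduleRecord and stuck_ids are hypothetical names, and the sample rows are illustrative; on a real instance the rows would come from Ci::PipelineSchedule and the snapshot would need to be saved somewhere between the two steps.

```ruby
require "set"

# Hypothetical stand-in for Ci::PipelineSchedule rows.
ScheduleRecord = Struct.new(:id, :description, :active, :next_run_at, keyword_init: true)

# Same filter as the workaround scripts: next_run_at is nil and the
# schedule has not been deactivated.
def stuck_ids(schedules)
  schedules.select { |s| s.active && s.next_run_at.nil? }.map(&:id).to_set
end

# Illustrative snapshot taken BEFORE disabling the worker cron:
# schedule 7 was already stuck, schedule 20 was healthy.
before_disable = [
  ScheduleRecord.new(id: 7,  description: "already stuck", active: true, next_run_at: nil),
  ScheduleRecord.new(id: 20, description: "test schedule", active: true, next_run_at: "2023-06-09 09:01:00")
]

# Illustrative state AFTER the worker cron has been re-enabled.
after_enable = [
  ScheduleRecord.new(id: 7,  description: "already stuck", active: true, next_run_at: nil),
  ScheduleRecord.new(id: 20, description: "test schedule", active: true, next_run_at: nil)
]

already_stuck = stuck_ids(before_disable)
to_fix = stuck_ids(after_enable) - already_stuck

puts "fix only schedule IDs: #{to_fix.to_a.inspect}"
```

With this approach the fix script would call schedule_next_run! only for IDs in to_fix, leaving schedules that were stuck before the maintenance untouched.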
Relevant logs and/or screenshots
Output of checks
Results of GitLab environment info
Reproduced on GitLab 15.10.6
Possible fixes
This is NOT a fix: as long as there's a datestamp in next_run_at, the pipeline will be scheduled regardless of the state of pipeline_schedule_worker_cron.
I tried setting it to a past date, a bit like what happens when the schedule is set not-active via the Web UI. As long as active is true, that schedule will then qualify for execution as soon as the worker runs.
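The observation above can be expressed as an eligibility rule. The following is a sketch in plain Ruby under an assumption: the rule is inferred from the behavior reported in this issue, not copied from the worker's actual query, which is not shown here. ScheduleState and runnable? are hypothetical names.

```ruby
# Hypothetical stand-in for the two columns that decide eligibility.
ScheduleState = Struct.new(:active, :next_run_at, keyword_init: true)

# Assumed rule, inferred from this issue: a schedule qualifies once it is
# active and next_run_at holds a timestamp that is not in the future.
def runnable?(schedule, now)
  return false unless schedule.active
  return false if schedule.next_run_at.nil?   # the stuck case never qualifies
  schedule.next_run_at <= now
end

now = Time.utc(2023, 6, 9, 9, 5, 0)

past_date   = ScheduleState.new(active: true,  next_run_at: Time.utc(2023, 6, 9, 9, 1, 0))
stuck       = ScheduleState.new(active: true,  next_run_at: nil)
deactivated = ScheduleState.new(active: false, next_run_at: Time.utc(2023, 6, 9, 9, 1, 0))

puts "past date, active:   #{runnable?(past_date, now)}"
puts "nil next_run_at:     #{runnable?(stuck, now)}"
puts "past date, inactive: #{runnable?(deactivated, now)}"
```

Under this assumed rule, writing a past date into next_run_at does not keep a schedule paused: with active still true it becomes eligible the next time the worker runs, which matches what was observed.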