Multiple pipelines created and run for one schedule entry
Summary
When many schedules are due at the same time (identical cron and cron_timezone) and gitlab_rails['pipeline_schedule_worker_cron'] = "* * * * *" is set, multiple PipelineScheduleWorker instances seem to run in parallel and create multiple pipelines for one schedule.
By default pipeline schedules created for "Every Day" are all set to run 04:00 UTC.
By default pipelines are only scheduled once per hour:
gitlab_rails['pipeline_schedule_worker_cron'] = "41 * * * *"
1. However, this is confusing for users: they state that they want their pipeline to run at 04:00 sharp, and it actually starts at 04:41.
2. Furthermore, each worker run collects all pipelines that became due during the last hour, so they start together as a build stampede (potentially avoidable by #17799). GitLab itself and its runners may not have a problem, as the builds are queued, but adjacent systems (QA databases) might get swamped.
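The effect of the hourly default can be sketched with a little arithmetic. This is only an illustration, assuming the worker picks up every schedule that became due since its previous run; next_worker_tick is a hypothetical helper, not a GitLab method.

```ruby
# Hypothetical helper (not part of GitLab): the first time at or after `due`
# at which a worker with cron "<worker_minute> * * * *" fires.
# `due` is expected to be a UTC Time.
def next_worker_tick(due, worker_minute)
  tick = Time.utc(due.year, due.month, due.day, due.hour, worker_minute)
  tick < due ? tick + 3600 : tick
end

# Three schedules that become due at different times before the 04:41 tick.
due_times = [
  Time.utc(2019, 7, 10, 3, 45),
  Time.utc(2019, 7, 10, 4, 0),
  Time.utc(2019, 7, 10, 4, 30)
]

due_times.each do |due|
  tick = next_worker_tick(due, 41)
  puts "due #{due.strftime('%H:%M')} -> picked up #{tick.strftime('%H:%M')}"
end
```

Under this assumption all three schedules are picked up at 04:41, which is both the delay and the stampede described above.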
To avoid 1) and 2) we had set gitlab_rails['pipeline_schedule_worker_cron'] = "* * * * *". This leads to duplicate pipelines, in one case even 9 for a single schedule, because next_run_at was not updated early enough.
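The suspected race can be modelled in a few lines. This is a simplified, sequential sketch of our assumption about what happens, not GitLab's actual worker code; Schedule, process_unguarded, process_guarded and simulate are hypothetical names. The id 524 and the times are taken from the dormant project shown further down.

```ruby
Schedule = Struct.new(:id, :next_run_at)

DAY = 24 * 60 * 60

# Unguarded: the worker acts on a due-check it made earlier (stale snapshot)
# and only advances next_run_at after the pipeline has been created.
def process_unguarded(schedule, was_due, now, pipelines)
  return unless was_due
  pipelines << schedule.id
  schedule.next_run_at = now + DAY
end

# Guarded: re-check next_run_at and advance it before creating the pipeline.
def process_guarded(schedule, now, pipelines)
  return unless schedule.next_run_at <= now
  schedule.next_run_at = now + DAY
  pipelines << schedule.id
end

# Returns the number of pipelines created for one schedule.
def simulate(worker_count, guarded:)
  now = Time.utc(2019, 7, 10, 4, 10)
  schedule = Schedule.new(524, Time.utc(2019, 7, 10, 4, 0))
  pipelines = []
  # Every worker reads the schedule before any of them has updated it.
  snapshots = Array.new(worker_count) { schedule.next_run_at <= now }
  snapshots.each do |was_due|
    if guarded
      process_guarded(schedule, now, pipelines)
    else
      process_unguarded(schedule, was_due, now, pipelines)
    end
  end
  pipelines.size
end

puts simulate(9, guarded: false) # prints 9
puts simulate(9, guarded: true)  # prints 1
```

In a real system the re-check and the update would have to be atomic (for example a row lock or a conditional UPDATE); the sequential model only shows why advancing next_run_at early matters.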
Steps to reproduce
- Create a lot of schedules for the same time (cron and cron_timezone identical)
- Set gitlab_rails['pipeline_schedule_worker_cron'] = "* * * * *"
What is the current bug behavior?
- Multiple pipelines are created and run for the same schedule entry.
- On 2019-07-04 we upgraded from 11.11.3 to 12.0.3, which massively increased the problem.
- Output for one dormant project (no commits so far this year), where only one nightly scheduled pipeline should run but up to 9 were run per day:
gitlabhq_production=# select count(project_id), to_char(created_at, 'YYYY-MM-DD') as date from ci_pipelines where project_id = 836 and created_at >= '2019-01-01' group by to_char(created_at, 'YYYY-MM-DD') order by to_char(created_at, 'YYYY-MM-DD') desc;
count | date
-------+------------
1 | 2019-07-11
9 | 2019-07-10
9 | 2019-07-09
7 | 2019-07-08
8 | 2019-07-07
8 | 2019-07-06
9 | 2019-07-05
1 | 2019-07-04
1 | 2019-07-03
2 | 2019-07-02
3 | 2019-07-01
1 | 2019-06-21
1 | 2019-06-20
1 | 2019-06-19
1 | 2019-06-18
1 | 2019-06-17
1 | 2019-06-16
1 | 2019-06-15
1 | 2019-06-14
1 | 2019-06-13
1 | 2019-06-12
1 | 2019-06-11
1 | 2019-06-10
1 | 2019-06-09
1 | 2019-06-08
1 | 2019-06-07
1 | 2019-06-06
1 | 2019-06-05
1 | 2019-06-04
1 | 2019-06-03
1 | 2019-06-02
1 | 2019-06-01
1 | 2019-05-31
1 | 2019-05-30
1 | 2019-05-29
1 | 2019-05-28
What is the expected correct behavior?
Only 1 pipeline is created.
Relevant logs and/or screenshots
- Current distribution of scheduled pipelines, max value caused by fixed "Every Day" schedule
gitlabhq_production=# select count(*), cron, cron_timezone from ci_pipeline_schedules group by cron, cron_timezone order by count(*) desc limit 10;
count | cron | cron_timezone
-------+------------+---------------
122 | 0 4 * * * | UTC
38 | 0 3 * * * | UTC
16 | 0 2 * * * | UTC
14 | 0 4 * * 0 | UTC
13 | 0 4 * * * | Europe/Berlin
8 | 0 4 1 * * | Europe/Berlin
8 | 0 20 * * * | UTC
7 | 0 1 * * * | UTC
6 | 0 * * * * | UTC
6 | 0 3 * * * | UTC
(10 rows)
Entries for the dormant project. On 2019-07-11 only one pipeline was created, because we had set gitlab_rails['pipeline_schedule_worker_cron'] = "*/5 * * * *".
This has helped for the other projects as well: it seems no duplicate pipelines ran.
gitlabhq_production=# select * from ci_pipeline_schedules where project_id = 836;
id | description | ref | cron | cron_timezone | next_run_at | project_id | owner_id | active | created_at | updated_at
-----+--------------------+--------+-----------+---------------+---------------------+------------+----------+--------+----------------------------+----------------------------
524 | Daily Deploy Build | master | 0 4 * * * | UTC | 2019-07-12 04:05:00 | 836 | 231 | t | 2019-07-01 08:43:34.596334 | 2019-07-11 04:09:07.169602
(1 row)
gitlabhq_production=# select project_id, id, created_at from ci_pipelines where project_id = 836 order by created_at desc limit 30;
project_id | id | created_at
------------+--------+----------------------------
836 | 548270 | 2019-07-11 04:09:07.516406
836 | 547264 | 2019-07-10 04:11:57.713289
836 | 547260 | 2019-07-10 04:11:47.2609
836 | 547252 | 2019-07-10 04:11:33.56558
836 | 547246 | 2019-07-10 04:11:23.475242
836 | 547239 | 2019-07-10 04:11:11.946095
836 | 547230 | 2019-07-10 04:10:59.218266
836 | 547221 | 2019-07-10 04:10:41.810501
836 | 547206 | 2019-07-10 04:10:26.629441
836 | 547193 | 2019-07-10 04:10:06.172722
836 | 545857 | 2019-07-09 04:12:17.07166
836 | 545853 | 2019-07-09 04:12:06.211545
836 | 545849 | 2019-07-09 04:11:54.928032
836 | 545843 | 2019-07-09 04:11:36.566328
836 | 545838 | 2019-07-09 04:11:26.096027
836 | 545832 | 2019-07-09 04:11:12.364199
836 | 545826 | 2019-07-09 04:10:59.021264
836 | 545819 | 2019-07-09 04:10:43.695422
836 | 545810 | 2019-07-09 04:10:27.705219
836 | 544589 | 2019-07-08 04:09:08.316839
836 | 544585 | 2019-07-08 04:08:59.775455
836 | 544581 | 2019-07-08 04:08:47.980315
836 | 544576 | 2019-07-08 04:08:36.498871
836 | 544569 | 2019-07-08 04:08:24.1138
836 | 544561 | 2019-07-08 04:08:09.176552
836 | 544559 | 2019-07-08 04:08:02.902698
836 | 543863 | 2019-07-07 04:11:04.065695
836 | 543860 | 2019-07-07 04:10:55.686923
836 | 543854 | 2019-07-07 04:10:41.050307
836 | 543850 | 2019-07-07 04:10:28.279409
(30 rows)
Output of checks
Results of GitLab environment info
- We use the docker image 12.0.3
# gitlab-rake gitlab:env:info
System information
System:
Current User: git
Using RVM: no
Ruby Version: 2.6.3p62
Gem Version: 2.7.9
Bundler Version: 1.17.3
Rake Version: 12.3.2
Redis Version: 3.2.12
Git Version: 2.21.0
Sidekiq Version: 5.2.7
Go Version: unknown
GitLab information
Version: 12.0.3
Revision: 08a51a9db93
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 9.6.11
URL: https://git.mamdev.server.lan
HTTP Clone URL: https://git.mamdev.server.lan/some-group/some-project.git
SSH Clone URL: ssh://git@git.mamdev.server.lan/some-group/some-project.git
Using LDAP: yes
Using Omniauth: yes
Omniauth Providers: cas3
GitLab Shell
Version: 9.3.0
Repository storage paths:
- default: /var/opt/gitlab/git-data/repositories
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell
Git: /opt/gitlab/embedded/bin/git
Results of GitLab application Check
Checking GitLab subtasks ...
Checking GitLab Shell ...
GitLab Shell: ... GitLab Shell version >= 9.3.0 ? ... OK (9.3.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Check GitLab API access: OK
Redis available via internal API: OK
Access to /var/opt/gitlab/.ssh/authorized_keys: OK
gitlab-shell self-check successful
Checking GitLab Shell ... Finished
Checking Gitaly ...
Gitaly: ... default ... OK
Checking Gitaly ... Finished
Checking Sidekiq ...
Sidekiq: ... Running? ... yes
Number of Sidekiq processes ... 1
Checking Sidekiq ... Finished
Checking Incoming Email ...
Incoming Email: ... Reply by email is disabled in config/gitlab.yml
Checking Incoming Email ... Finished
Checking LDAP ...
LDAP: ... Server: ldapmain
LDAP authentication... Success
LDAP users with access to your GitLab server (only showing the first 100 results)
** REDACTED - showed correct looking entries **
Checking LDAP ... Finished
Checking GitLab App ...
Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... no
Try fixing it:
sudo chown -R git /var/opt/gitlab/gitlab-rails/uploads
sudo find /var/opt/gitlab/gitlab-rails/uploads -type f -exec chmod 0644 {} \;
sudo find /var/opt/gitlab/gitlab-rails/uploads -type d -not -path /var/opt/gitlab/gitlab-rails/uploads -exec chmod 0700 {} \;
For more information see:
doc/install/installation.md in section "GitLab"
Please fix the error above and rerun the checks.
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
Projects have namespace: ... ** REDACTED - all entries showed yes as result, we have > 4000 projects **
Redis version >= 2.8.0? ... yes
Ruby version >= 2.5.3 ? ... yes (2.6.3)
Git version >= 2.21.0 ? ... yes (2.21.0)
Git user has default SSH configuration? ... yes
Active users: ... 910
Checking GitLab App ... Finished
Checking GitLab subtasks ... Finished
Possible fixes
Not a fix, but a workaround: spread the pipeline schedules manually across different times. #17799 could help here.
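One way such a manual distribution could look is sketched below. It derives a minute offset from the schedule id and rewrites the minute field of the cron expression. spread_cron is a hypothetical helper and the modulo scheme is our own assumption; it is not what #17799 proposes and not an existing GitLab feature.

```ruby
# Hypothetical helper: replace the minute field of a five-field cron
# expression with an offset derived from the schedule id.
def spread_cron(cron, schedule_id, window_minutes = 60)
  fields = cron.split
  raise ArgumentError, "expected 5 cron fields" unless fields.size == 5
  fields[0] = (schedule_id % window_minutes).to_s
  fields.join(" ")
end

puts spread_cron("0 4 * * *", 524) # prints 44 4 * * *
```

This only spreads the load if the schedule worker runs more often than once per hour, and it shifts the run time away from what the user entered, so it would have to be communicated to the schedule owners.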
Another workaround is to set gitlab_rails['pipeline_schedule_worker_cron'] = "*/5 * * * *"; however, this leads to behavior that is, IMO, unexpected for users.
At the very least, document that setting pipeline_schedule_worker_cron to run every minute is dangerous. This could be done in the comment above the entry!