Run CI/CD pipelines on a schedule - Implement Cron project hash and special intervals
Problem to solve
- With "Scheduled Pipeline runs" as implemented e.g. in gitlab-ce#30882 and first proposed in gitlab-ce#2989, the load on shared runners could peak at certain times.
- A lot of projects need at least daily builds for integration tests, but something like `0 0 * * *` would lead to a stampede of pipelines at midnight when naively entered by hand for a lot of projects.
- The current implementation (as of GitLab 12.0.1) replaces the selection "daily" with 04:00 UTC, which not only causes a stampede but indirectly even causes unwanted starts of multiple pipelines for one schedule (gitlab-ce#61141).
- Additionally, a failing dev pipeline on weekends will most of the time only lead to an email notification without anyone responding to it.
- Other CI systems like Jenkins have specific syntax to avoid this.
Proposal
- Implement project hash: when `H` is entered in the minute or hour field, hash "group name/project name" and distribute the execution accordingly over 60 slots for minutes and 24 slots for hours.
- This ensures that when all projects enter `H H * * *`, the pipeline of each project runs predictably at the same time of day, while the pipelines of all projects are distributed over 24*60 slots.
- Add a special identifier `@daily`, which would just mean something like `H H * * *`.
- Add a special identifier `@weekdaily`, which would just mean something like `H H * * 1-5`.
- Add a special identifier `@nightly`, which would just mean `H H(19-23),H(0-6) * * *`, so the pipeline would run "after office hours", i.e. from 19:00 to 06:59.
- Add a special identifier `@off_hours`, which would just mean something like `H H(19-23),H(0-6) * * 1-5`.
- Implement `H` at least for minutes and hours; for day/month/weekday I do not see a use case.
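The hashing step could look roughly like the following Python sketch. It is only an illustration: the proposal does not name a hash function, so SHA-256 is an assumption, and `project_slot` and `daily_slot` are hypothetical names, not existing GitLab code.

```python
import hashlib
from typing import Tuple


def project_slot(project_path: str, field: str, low: int, high: int) -> int:
    """Map a project path to a stable value in [low, high] (inclusive).

    `field` (e.g. "minute" or "hour") is mixed into the hash so that the
    minute and hour slots of one project are chosen independently.
    SHA-256 is an assumption; any well-distributed hash would do.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    digest = hashlib.sha256(f"{field}:{project_path}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer and fold it into the range.
    value = int.from_bytes(digest[:8], "big")
    return low + value % (high - low + 1)


def daily_slot(project_path: str) -> Tuple[int, int]:
    """Return the (hour, minute) that `H H * * *` would resolve to for a project."""
    return (
        project_slot(project_path, "hour", 0, 23),
        project_slot(project_path, "minute", 0, 59),
    )


if __name__ == "__main__":
    # The same path always yields the same slot; different paths spread out.
    for path in ("my-group/my-project", "my-group/other-project"):
        hour, minute = daily_slot(path)
        print(f"{path}: {minute} {hour} * * *")
```

Because the slot depends only on "group name/project name", a project keeps its time of day across restarts and re-saves, but it would move if the project is renamed or transferred.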
Unclear:
- Are `@daily` and `@nightly` good identifiers? People could expect execution at midnight because some cron implementations use them that way. Maybe `once_per_day` or `once_per_night` would be better, with the time randomized relative to the location/time zone of the user who owns the job?
Links / references
- Sample implementation for puppet https://github.com/pradels/puppet-parser/blob/master/lib/puppet/puppet/parser/functions/fqdn_rand.rb
- Jenkins help page: https://github.com/jenkinsci/jenkins/blob/master/core/src/main/resources/hudson/triggers/TimerTrigger/help-spec.html
- Deterministic "random" numbers for Ansible: https://gist.github.com/ptman/9bd8223272e2c0e27b2b
Documentation blurb
- To avoid a stampede of daily pipelines running at midnight, use `H H * * *` as the expression (the same is achieved by entering `@daily`).
- `H` will be replaced automatically by a number in the range 0-59 for minutes and 0-23 for hours. The number is evenly distributed based on the hash of "group name/project name".
- Entering `H H(19-23),H(0-6) * * 1-5` would run your pipeline Monday to Friday during "after office hours", i.e. between 19:00 and 06:59 (the same is achieved by entering `@off_hours`).
- Note that `H` is only supported for minutes and hours.
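To make the behaviour described above concrete, here is a minimal Python sketch of how an expression might be resolved into a plain cron line. Several things are assumptions rather than part of the proposal: SHA-256 as the hash, the hypothetical names `resolve_schedule`, `_resolve_field` and `_stable_index`, and the reading that `H(19-23),H(0-6)` forms one pool of hours from which a single hour is picked (which matches "once per night" but is not spelled out in the text).

```python
import hashlib
import re

# Aliases as listed in the proposal.
ALIASES = {
    "@daily": "H H * * *",
    "@weekdaily": "H H * * 1-5",
    "@nightly": "H H(19-23),H(0-6) * * *",
    "@off_hours": "H H(19-23),H(0-6) * * 1-5",
}

# Matches "H" or "H(a-b)".
H_TOKEN = re.compile(r"^H(?:\((\d+)-(\d+)\))?$")


def _stable_index(key: str, size: int) -> int:
    """Deterministically map a string to an index in range(size)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size


def _resolve_field(field: str, name: str, low: int, high: int, project_path: str) -> str:
    """Replace H tokens in one field with a single project-specific value."""
    parts = field.split(",")
    matches = [H_TOKEN.match(part) for part in parts]
    if not any(matches):
        return field  # plain cron field, leave untouched
    if not all(matches):
        raise ValueError(f"cannot mix H and fixed values in {name} field: {field!r}")
    candidates = []
    for match in matches:
        start = int(match.group(1)) if match.group(1) is not None else low
        end = int(match.group(2)) if match.group(2) is not None else high
        if not low <= start <= end <= high:
            raise ValueError(f"range out of bounds in {name} field: {field!r}")
        candidates.extend(range(start, end + 1))
    # Assumption: all listed ranges form one pool and exactly one value is picked.
    return str(candidates[_stable_index(f"{name}:{project_path}", len(candidates))])


def resolve_schedule(expression: str, project_path: str) -> str:
    """Expand aliases and H tokens into a five-field cron expression."""
    expression = ALIASES.get(expression.strip(), expression.strip())
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five cron fields")
    minute, hour, rest = fields[0], fields[1], fields[2:]
    if any(H_TOKEN.match(part) for field in rest for part in field.split(",")):
        raise ValueError("H is only supported for minutes and hours")
    minute = _resolve_field(minute, "minute", 0, 59, project_path)
    hour = _resolve_field(hour, "hour", 0, 23, project_path)
    return " ".join([minute, hour, *rest])


if __name__ == "__main__":
    print(resolve_schedule("@off_hours", "my-group/my-project"))
```

Expressions without `H`, such as `0 0 * * *`, pass through unchanged in this sketch, so existing schedules would keep their current meaning.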
Edited by James Heimbuck