More logic when setting up pipeline schedules

Problem to solve

As the world changes around us, our code sometimes needs updating. To avoid discovering this kind of problem while we're trying to update the code for other reasons, we would like the test pipelines (for at least 140 projects) to be re-run regularly; (approximately) weekly intervals would be good.

We can get a bit of the way with scheduled pipelines, but there are problems with this approach:

  1. Scheduling at night
  2. Scheduling at different times
  3. Avoid wasted runs
  4. Who gets reports

(see below for details)

Intended users

Further details

Scheduling at night

With the current scheduling features, pipelines run at night by default. This should still be a feature when more logic is added.

Scheduling at different times

To avoid load spikes, it would be preferable to run pipelines for different projects at different times (perhaps on different days of the week). We have an on-premise EE installation with our own runners, and security considerations mean I won't even consider anything "cloudy" ("There is no cloud, just other people's computers").

Doing this, we will probably run into the issue described in #37422 (closed) (and plenty of others): there is some non-obvious logic involved in scheduling in GitLab. It seems like that could be worked around, but it is another reason to improve it.

It also means that we will need some kind of coordination between our projects. I can't just tell all our developers that they can set up a schedule; I'll need to find a time when the runner is free.
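As a rough illustration of the coordination involved, the sketch below spreads projects round-robin over weekday night slots and produces one cron expression per project. The night window, the 30-minute step, the weekday range, and the `project-NNN` names are all assumptions made up for this example; it only computes cron strings and does not talk to GitLab.

```python
from itertools import cycle, product
from typing import Dict, Iterable

# All three constants are assumptions for illustration only.
NIGHT_HOURS = [22, 23, 0, 1, 2, 3, 4, 5]  # assumed quiet window on the runners
DAYS = [1, 2, 3, 4, 5]                    # cron day-of-week: Monday to Friday
MINUTES = [0, 30]                         # two start times per hour


def staggered_crons(projects: Iterable[str]) -> Dict[str, str]:
    """Assign each project a weekly cron slot, reusing slots only when needed."""
    # Cron field order: minute hour day-of-month month day-of-week.
    slots = [
        f"{minute} {hour} * * {day}"
        for day, hour, minute in product(DAYS, NIGHT_HOURS, MINUTES)
    ]
    # Sorting makes the assignment deterministic; cycle() wraps around
    # once there are more projects than slots.
    return dict(zip(sorted(projects), cycle(slots)))


if __name__ == "__main__":
    names = [f"project-{i:03d}" for i in range(140)]  # hypothetical names
    for name, cron in list(staggered_crons(names).items())[:3]:
        print(name, cron)
```

With these assumed values there are 80 slots, so 140 projects end up with at most two pipelines starting at the same time. This still leaves someone maintaining the table centrally, which is the coordination burden described above.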

Avoid wasted runs

When someone has committed to a project, the pipeline will have run, and there probably won't be a reason to re-run it for some time. This is similar to what @jgabriels said in #18868 (comment 215021350).

Who gets reports

If I need to find a time when the pipeline for each project can run, it seems obvious that I could just set up all the schedules myself. But then I become the owner of all the pipelines and (as far as I can see from limited testing) receive the report saying whether each one passed or failed. There will undoubtedly be issues that I'm simply not capable of fixing, and while I might be able to understand some of the others, those reports are probably better directed at the people developing the affected code.

As we're talking about automated runs here, any one developer of any project might be on vacation or otherwise unavailable, so the report should probably go to more than one person. We have mailing lists/aliases to handle that, if we are allowed to set a recipient that is not a GitLab user. (In our case I could abuse my powers as instance admin to impersonate the person I believe should own the pipeline, but that wouldn't solve the vacation issue.)

My idea for implementing this (though anyone actually working on it can do it however they like) is:

  • to have a queue (perhaps one per runner makes it easier) of pipelines that should be executed when there is time, together with the earliest time each pipeline can run, and then have a background task picking jobs from this queue when the load on the runners is sufficiently low/between given times/...
  • to dequeue jobs when the pipeline for a project starts
  • to enqueue a job with an appropriately set (configurable per project) earliest (re)run-time when a pipeline finishes
  • to have an additional e-mail address of who to notify about the result of automatic pipelines.
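The queue part of this idea could be sketched roughly as follows. This is not how GitLab works internally; `RerunQueue`, its method names, and the `runner_idle` flag are hypothetical, and the notification address is left out. It only illustrates the enqueue-on-finish, dequeue-on-start, and pick-when-idle behaviour listed above.

```python
import heapq
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple


class RerunQueue:
    """Hypothetical per-runner queue of pipelines waiting for a re-run."""

    def __init__(self, default_interval: timedelta = timedelta(days=7)) -> None:
        self._heap: List[Tuple[datetime, str]] = []  # (earliest run time, project)
        self._earliest: Dict[str, datetime] = {}     # currently valid entry per project
        self._default_interval = default_interval
        self.intervals: Dict[str, timedelta] = {}    # per-project overrides

    def pipeline_started(self, project: str) -> None:
        # Dequeue: any pipeline start (commit, manual, scheduled) makes the
        # pending re-run pointless. Heap entries are discarded lazily later.
        self._earliest.pop(project, None)

    def pipeline_finished(self, project: str, finished_at: datetime) -> None:
        # Enqueue with the earliest time a re-run becomes worthwhile.
        due = finished_at + self.intervals.get(project, self._default_interval)
        self._earliest[project] = due
        heapq.heappush(self._heap, (due, project))

    def next_due(self, now: datetime, runner_idle: bool) -> Optional[str]:
        # Called by the background task; hands out work only on an idle runner.
        if not runner_idle:
            return None
        while self._heap and self._heap[0][0] <= now:
            due, project = heapq.heappop(self._heap)
            if self._earliest.get(project) == due:  # skip stale entries
                del self._earliest[project]
                return project
        return None
```

A background task would call `next_due(datetime.now(), runner_idle=...)` periodically and trigger the returned project's pipeline. A commit-triggered run simply calls `pipeline_started` and later `pipeline_finished`, which pushes the project's next automatic run back by its interval and so avoids wasted runs.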