WIP: Resolve "Run CI/CD pipelines on a schedule"
This MR is only for planning milestones for Idea 1 in #2989 (closed) and clarifying the specification. Each step should be a separate MR.
Step 1: Remove legacy code
- Check whether legacy code is still lurking (e.g. the `whenever` gem)
- Remove legacy code. Test to make sure the change doesn't break the current architecture and dependencies.
Step 2: Backend
Step 2.1: Database
- or new table
- Plan to expand schema
- Add columns and migration
table | column | type | data |
---|---|---|---|
(TBD) | trigger_type | string | external (api) or scheduled (cron) |
(TBD) | cron | string | e.g. 30 18 * * * |
(TBD) | cron_time_zone | string | e.g. Europe/Istanbul |
(TBD) | target_ref | string | e.g. complie-linux-dist-* |
(TBD) | condition_type | string | always or if_changed |
(TBD) | active | boolean | literally pause/unpause |
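As a rough illustration of the constraints implied by the table above, here is a minimal plain-Ruby sketch. The `ScheduleAttributes` struct and the `validation_errors` helper are hypothetical names, not GitLab code, and the cron check only verifies the five-field shape rather than full cron syntax.

```ruby
# Hypothetical stand-in for one row of the (TBD) table; not GitLab code.
ScheduleAttributes = Struct.new(
  :trigger_type, :cron, :cron_time_zone, :target_ref, :condition_type, :active,
  keyword_init: true
)

TRIGGER_TYPES   = %w[external scheduled].freeze
CONDITION_TYPES = %w[always if_changed].freeze

# Returns a list of human-readable problems; an empty list means the row looks valid.
def validation_errors(attrs)
  errors = []
  errors << "unknown trigger_type" unless TRIGGER_TYPES.include?(attrs.trigger_type)
  errors << "unknown condition_type" unless CONDITION_TYPES.include?(attrs.condition_type)
  errors << "active must be true or false" unless [true, false].include?(attrs.active)

  if attrs.trigger_type == "scheduled"
    # Shape check only: five whitespace-separated fields, as in "30 18 * * *".
    errors << "cron must have five fields" unless attrs.cron.to_s.split.size == 5
    errors << "target_ref is required" if attrs.target_ref.to_s.empty?
  end

  errors
end
```

In a Rails model these checks would more likely be expressed as validations; the sketch only makes the allowed values from the table explicit.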
Step 2.2: Controller
- Create a new trigger with cronjob (#new, #create, with sidekiq-cron) (TODO: Elaborate this process)
- Edit a trigger (#edit, #update)
- Remove a trigger (#destroy)
- Invoke a trigger with cronjob immediately (#test_cronjob)
- Support "Pass job variables" (TODO: make sure this needs to be classified)
Step 3: Frontend
Step 3.1: Registration of a new Trigger ("Settings" -> "CI/CD Pipelines" -> "Triggers")
- Click the `Add trigger` button -> Show a new trigger registration form
  - "Trigger description" (Already existed)
  - "Trigger type" (Radio button: External Trigger (API) or Scheduled Trigger (Cron))
  - If "Scheduled Trigger" is chosen, expand these items:
    - "Schedules" (TextField: e.g. `30 18 * * *`. Syntax check.) Plus there are three buttons: "Nightly builds", "Sunday night", "Last day of a month". If one of them is clicked, this field is filled in automatically.
    - "Time zone" (Combobox: For gitlab.com users. Choose a country. e.g. `Europe/Istanbul`)
    - "Target ref" (TextField: wildcard (*) support. e.g. `complie-linux-dist-*`. If there are no matches, show an error message.)
    - "Conditions" (Combobox: "always" or "if there was a new change on the branch". Default: "always")
    - "variables" (TextField: e.g. "variables[RUN_NIGHTLY_BUILD]=true")
  - Activate/Deactivate a trigger (only `#edit`)
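The "Target ref" wildcard check described in the form could be sketched as follows. This is only an illustration: `matching_refs` and `MAX_TARGETS` are hypothetical names, the limit of fewer than 5 targets is borrowed from the gitlab.com limitation idea under "Note / Concerns", and shell-style matching via Ruby's `File.fnmatch` is an assumption about how the wildcard would behave.

```ruby
# Assumption: the gitlab.com limit floated in the notes (fewer than 5 matching targets).
MAX_TARGETS = 5

# Hypothetical helper: resolve a "Target ref" pattern against known branch names.
# Raises ArgumentError when nothing matches or when too many branches match.
def matching_refs(pattern, branch_names)
  matches = branch_names.select { |name| File.fnmatch(pattern, name) }
  raise ArgumentError, "no branch matches #{pattern}" if matches.empty?
  raise ArgumentError, "#{pattern} matches too many branches" if matches.size >= MAX_TARGETS
  matches
end

branches = %w[master complie-linux-dist-x86 complie-linux-dist-arm docs]
selected = matching_refs("complie-linux-dist-*", branches)
```

The form would surface the `ArgumentError` message as the validation error mentioned in the "Target ref" item.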
Example
Reference: Buddy
Step 3.2: List triggers
In a row, there are
- Status
  - Active/Inactive
  - Cron format
  - Last used (Already existed)
  - Link to the last invoked pipeline
- Button
  - Delete a trigger (Already existed.)
  - Edit a trigger (Already existed. Edit parameters except "Trigger type")
Example
Reference: TravisCI
Note / Concerns
- Limitations for gitlab.com, e.g. one project can have only one Scheduled Trigger, users cannot set an interval shorter than one day, and a target ref should match fewer than 5 targets.
- Performance. The number of scheduler processes and builds will increase significantly.
- No real-time status update like Buddy for the first iteration.
- Maybe this layout should be reworked by a UI/UX dev.
Activity
Some specs are still under investigation or discussion on #2989 (closed).
mentioned in issue pages/nikola#2
Which is better for handling sidekiq-worker and scheduled-triggers?
Idea 1: One sidekiq-cron worker manages one scheduled trigger
Every time a scheduled trigger is registered, create a sidekiq-cron worker (`Sidekiq::Cron::Job.create`) at the same time. fig.1
Idea 2: One sidekiq-cron worker manages all scheduled triggers
Create only one sidekiq-cron worker, which runs `perform` periodically every 5 min. Each time `perform` is executed, compare the current time to each schedule. If it matches (or is near), process the scheduled trigger. fig.2
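To make Idea 2 more concrete, here is a minimal plain-Ruby sketch of the "compare the current time to each schedule" step. It is not sidekiq-cron code: `field_matches?`, `cron_matches?`, `due_schedules` and `POLL_INTERVAL` are hypothetical names, only `*` and plain integers are supported in each cron field, and all five fields must match (real cron treats day-of-month and day-of-week differently when both are restricted).

```ruby
POLL_INTERVAL = 5 * 60 # seconds between two runs of the single worker

# Matches one cron field ("*" or a plain integer) against a value.
# Ranges, lists and steps are deliberately left out of this sketch.
def field_matches?(field, value)
  field == "*" || Integer(field, 10, exception: false) == value
end

# True when every field of a five-field cron expression matches the given time.
def cron_matches?(cron, time)
  minute, hour, day, month, weekday = cron.split
  field_matches?(minute, time.min) &&
    field_matches?(hour, time.hour) &&
    field_matches?(day, time.day) &&
    field_matches?(month, time.month) &&
    field_matches?(weekday, time.wday)
end

# Returns the schedules that match the current minute or one of the four
# minutes before it, i.e. everything that came due since the previous poll.
def due_schedules(schedules, now)
  minutes = (0...POLL_INTERVAL / 60).map { |i| now - i * 60 }
  schedules.select { |cron| minutes.any? { |t| cron_matches?(cron, t) } }
end

now = Time.utc(2017, 4, 1, 18, 32, 0)
due = due_schedules(["30 18 * * *", "0 4 * * *"], now)
```

Checking a window rather than a single instant is what "matched (or near)" amounts to here: a schedule set for 18:30 is still picked up by a poll that runs at 18:32.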
New background architecture (Draft): https://cacoo.com/diagrams/YkzHGiMGObhQlgXG-962E2.png
@dosuken123 I think that the plan is to simplify implementation of pipeline triggering, which might involve removing classes / moving code around, but since this is not done, and we still have `Ci::TriggerRequest`, we can use it. When the time comes for the refactoring we will need to take scheduled pipelines into account.
We should not touch nor extend `ci_trigger_request`.
This is how I see it:
- We allow defining multiple triggers per-project,
- The ref is not the wildcard, it is exact match,
- We do support only `always`,
- We do not trigger new pipeline if the previous pipeline is still running from trigger,
- We do not expose `active`, it can be added later,
- We store in DB: `next_run_at` that will be calculated from `cronjob` and stored by worker that executes pipeline,
- The `next_run_at` needs to be not less than 1h from now,
- Ignore `cron_time_zone` for now and assume that everything is in time zone of user that created a trigger,
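Two of the rules above (skip a trigger whose previous pipeline is still running, and keep `next_run_at` at least 1h away) can be sketched in plain Ruby. This is one possible reading, not the proposed implementation: `ScheduledTrigger`, `due?`, `clamp_next_run_at` and `MIN_LEAD_TIME` are hypothetical names, and clamping is an assumption (rejecting too-frequent schedules at validation time would be another way to satisfy the rule).

```ruby
MIN_LEAD_TIME = 60 * 60 # one hour, in seconds

# Hypothetical in-memory stand-in for a scheduled trigger row.
ScheduledTrigger = Struct.new(:ref, :next_run_at, :last_pipeline_running)

# A trigger is due when its next_run_at has passed and the pipeline it
# created last time is no longer running.
def due?(trigger, now)
  trigger.next_run_at < now && !trigger.last_pipeline_running
end

# `candidate` is whatever time the cron expression yields; the result is
# never earlier than one hour from `now`.
def clamp_next_run_at(candidate, now)
  earliest = now + MIN_LEAD_TIME
  candidate < earliest ? earliest : candidate
end

now = Time.utc(2017, 4, 1, 12, 0, 0)
clamped = clamp_next_run_at(now + 600, now) # cron says "in 10 minutes"
```

Here `clamped` ends up one hour after `now`, while a candidate two hours away would be returned unchanged.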
DB structure (mostly aligned with your proposal):
```
ci_triggers:
  trigger_type: external / scheduled # we add a migration that adds a default column with `external`
  ref: `ref/master` # not needed for external; if specified for external it is verified against the external trigger; it is mandatory for scheduled
  cron: "8 * * * *"
  next_run_at: "date/time" # column with index
```
Worker
We add a new cron job worker, that will look like this:
```ruby
class StuckCiJobsWorker
  include Sidekiq::Worker
  include CronjobQueue

  def perform
    return unless try_obtain_lease

    # Pick up every scheduled trigger whose next run time has passed.
    Ci::Trigger.scheduled.where("next_run_at < ?", Time.now).find_each do |trigger|
      begin
        Ci::CreateTriggerRequestService.new.execute(trigger.project, trigger, trigger.ref)
      rescue => e
        Rails.logger.error "#{trigger.id}: Failed to trigger job: #{e.message}"
      ensure
        # Always move next_run_at forward, even if the pipeline failed to start.
        trigger.schedule_next_run!
      end
    end
  end
end
```
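The `rescue`/`ensure` structure of the worker above is what guarantees that a failing trigger is still rescheduled. A small plain-Ruby simulation of that control flow, with hypothetical stand-ins (`FakeTrigger`, `run_due_triggers`) instead of the real models and services:

```ruby
# Hypothetical stand-in for a trigger; `broken` simulates a pipeline that cannot be created.
FakeTrigger = Struct.new(:id, :broken, :rescheduled) do
  def schedule_next_run!
    self.rescheduled = true
  end
end

# Mirrors the loop body of the worker: try to trigger, log failures,
# and reschedule in every case.
def run_due_triggers(triggers, log)
  triggers.each do |trigger|
    begin
      raise "pipeline could not be created" if trigger.broken
      log << "#{trigger.id}: triggered"
    rescue => e
      log << "#{trigger.id}: Failed to trigger job: #{e.message}"
    ensure
      trigger.schedule_next_run!
    end
  end
end

triggers = [FakeTrigger.new(1, false, false), FakeTrigger.new(2, true, false)]
log = []
run_due_triggers(triggers, log)
```

After the run, both triggers have been rescheduled, and the log holds one success line and one failure line, so one broken trigger does not block the rest of the batch.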
We update `Ci::CreateTriggerRequestService` with:
```ruby
def execute(project, trigger, ref, variables = nil)
  # we need to find pipeline for that ref and that trigger, ignore if it's running
  trigger_request = trigger.trigger_requests.create(variables: variables)

  pipeline = Ci::CreatePipelineService.new(project, trigger.owner, ref: ref).
    execute(ignore_skip_ci: true, trigger_request: trigger_request)

  if pipeline.persisted?
    trigger_request
  end
end
```
How I see the steps
- Add DB changes, add new cronjob worker, prepare all backend, on trigger list show only `external` triggers for now,
- Add API for scheduled triggers,
- Add new UI for triggers: 1. remove the token, 2. show `external` or `scheduled`, 3. show Last Run and Next Run, 4. do not show cron specification. Cron job definition and the token will be shown when you go to `Edit`. We can consider giving a button to copy token if needed. This is for @dimitrieh to figure out.
Edited by Kamil Trzciński
For 9.1 we should be able to do 1., maybe, but unlikely 2., 3. seems to be impossible.
Edited by Kamil Trzciński
mentioned in issue #2989 (closed)
mentioned in merge request !10133 (merged)
@ayufan I have a question.
- Ignore `cron_time_zone` for now and assume that everything is in time zone of user that created a trigger,
I could understand this if the `users` table had a `time_zone` column or something, but it seems that `time_zone` is still not persisted in the GitLab database. If I look up `time_zone` in `config/gitlab.yml`, it doesn't work on gitlab.com. So I still think we need to persist the data in the database, otherwise a new worker can't calculate `next_run_at` repeatedly.
@ayufan About where we persist this data. I was thinking that extending `ci_triggers` for STI would be a good idea, but scheduled trigger data is not that similar to `ci_triggers`, and the data will keep growing in the future (e.g. `active`, `variables`, `condition`, etc.), so I recommend creating another table such as `ci_scheduled_triggers`. What do you think?
About external triggers.
```
curl --request POST \
     --form token=TOKEN \
     --form ref=master \
     --form "variables[UPLOAD_TO_S3]=true" \
     https://gitlab.example.com/api/v4/projects/9/trigger/pipeline
```
(from Example)
`active`, `condition` and `variables` can be handled by users. `active` = Start/Stop calling the API, `condition` = Combining with pipeline API, `variables` = Post parameter "variables[blah]". I can't imagine that storing this data would be useful. Rather, centralizing the parameters only here seems meaningful.
Scheduled triggers, on the other hand, are handled by sidekiq-cron, so the user needs to give sidekiq-cron this information.
Currently, I'm implementing with those structures.
ci_scheduled_triggers(column) | ci_triggers(column) | type |
---|---|---|
project_id | project_id | integer |
deleted_at | deleted_at | datetime |
created_at | created_at | datetime |
updated_at | updated_at | datetime |
owner_id | owner_id | integer |
description | description | string |
(No need) | token | string |
cron | (No need) | string |
cron_time_zone | (No need) | string |
next_run_at | (No need) | datetime |
last_run_at | (No need) | datetime |
ref | (No need) | string |
I tried hard to merge this into one table, but technically those two kinds of triggers are quite different, so I'd suggest having separate tables.
@godfat Sorry for bothering you
Could you also give me some advice? Is it better to use STI or separate tables?
@dosuken123 Feel free to ping me! :) Sorry I am not following this closely, but reading through the few comments above, I think they're quite different so we should use a separate table. Also, I feel that they should not be called scheduled triggers, because they could be recurring, and in that case they don't look like triggers.
On the other hand, from my past experience, polymorphic association is definitely a bad thing, and STI with more than one non-shared column often introduces problems in the future. While it's attractive (well, because we would have less repetition), just like inheritance in OO, it's so powerful that we need to be very sure that we want to use it, otherwise it would not justify the cost. (A larger table is also harder to scale.)
@godfat Thank you for the advice! You seem to know a lot about database structure, so I wanted to ask you about this. While I was considering this structure, I read an article saying that it's bad to use STI if I add multiple new non-shared columns.
Reference: https://devblast.com/b/single-table-inheritance-with-rails-4-part-1
STI should be used if your submodels will share the same attributes but need different behavior. If you plan to add 10 columns only used by one submodel, using different tables might be a better solution
While it's attractive (well, because we would have less repetition), just like inheritance in OO
This is off topic, but I just learned of this kind of database system: Object-oriented.
Now I'm reading this and this. I spent a long time with C#.NET but never used one; it's kind of interesting to store objects directly.
Also, I feel that they should not be called scheduled triggers, because they could be recurring, and in that case they don't look like triggers.
Yes. I feel this too. This feature is named scheduled "trigger", but it's far from the original "trigger" (API-based) feature.
@dosuken123 Since I am now leaning more toward functional programming, and I know much more about RDBMS now, I no longer have any interest in OODBMS :P I don't feel that's the way to go.
I could see where the name triggers is coming from, and I also understand that they indeed share some common concepts. However I am still a bit worried that if we use the same name, we could get confused in the future while one evolves over the other.
Actually I also feel that the name trigger is a bit confusing... What about `PipelineSchedule`? It would be clear that it's a schedule for generating pipelines.
@godfat In addition, in some respects it also makes sense if we call it "scheduled trigger". As I summarized here, basically a scheduling feature starts from an API implementation (crontab+API). This is slightly lame for operators who are not good at `crontab -e`, but at least it makes scheduling possible. Later, some CI systems evolved to have a built-in scheduling feature. They allow users to customize it with a rich GUI, without using a terminal.
Now GitLab has an API named "trigger"; if it evolves, it should inherit the name "trigger", I think. But the problem is... those implementations will be quite different. So releasing the service as "scheduled trigger" and implementing it as `PipelineSchedule` may make sense, or cause another chaos...
@dosuken123 That's really an awesome summary. I could see that "scheduled triggers" would be a natural evolution, so I am not really against it. However I think I would still prefer to have "pipeline schedules" as the name, together with "pipeline triggers". So "pipeline schedules" are schedules which could possibly create a pipeline on schedule, and "pipeline triggers" are triggers which we could pull anytime to create a pipeline. I don't think this would be confusing. (and of course, the feature and implementation should have the same name)
Edited by Lin Jen-Shin
@dosuken123
Maybe, but I also see that we will have `ref` and `last_run_at` for external triggers. We would use `ref` to limit what you can execute.
But also having two separate objects makes it a little harder to annotate `ci_pipelines`, as we would have to have `trigger_id` and `scheduled_id`. So pretty much we would just have something extra like `cron` and `cron_time_zone`.
Edited by Kamil Trzciński
@ayufan OK. Then let me create another MR using STI, aside from MR !10133 (merged), which uses separate tables.
mentioned in merge request !10510 (closed)