Merged
Opened Sep 17, 2018 by Shinya Maeda (@shinya.maeda), Maintainer. 24 of 24 tasks completed.

Delayed jobs

What does this MR do?

This MR implements a new feature: delayed jobs.

  • CE MR: this merge request
  • EE MR: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/7761

Basic concept

  • When when: delayed and start_in: X sec/min/hour are specified for a job in .gitlab-ci.yml, the job starts running after X sec/min/hour instead of running immediately.
  • A delayed job can be unscheduled via the job's unschedule button. "Unschedule" means that the delayed job will not be executed automatically in the future; users can still play it manually.
  • A delayed job can be canceled via the pipeline cancel button or the job cancel button. "Cancel" means that the delayed job will not be executed automatically in the future; users can still retry it manually.
  • A delayed job can be triggered immediately. The button is on the job index page.
  • The timer of a delayed job starts ticking right after the previous stage has finished.
  • A delayed job blocks the current stage by default. For example, if a job is delayed by 1 hour, the current stage is blocked for 1 hour.
  • A manual job blocks a stage only with allow_failure: false, whereas a delayed job blocks the stage regardless of the allow_failure value.
  • A delayed job is delayed only once. For example, if a delayed job is canceled or unscheduled, users can only retry or play it, and in that case the job is fired immediately.
  • UI/UX requirements are described in the issue description.

State transition of scheduled

Today we have 8 core statuses for ci_builds.status: created, pending, running, success, failed, canceled, skipped, and manual.

This MR adds a new status, scheduled, to ci_builds.status. The state transitions are as follows:

  • Every job's status starts from created.
  • When a job is scheduled to run in the future, the status transitions from created to scheduled via ProcessPipelineService. BuildScheduleWorker (a Sidekiq worker) is scheduled to run at the right time.
  • When the scheduled time arrives, the status transitions from scheduled to pending via RunScheduledBuildService. During this process, BuildScheduleWorker first checks whether the scheduled job is still playable.
  • When a user plays the scheduled job immediately, the status transitions from scheduled to pending via PlayBuildService.
  • When a job is unscheduled while in scheduled status, the status transitions from scheduled to manual. In this case, BuildScheduleWorker (the scheduled Sidekiq job) will not proceed to RunScheduledBuildService.
  • When a job is unscheduled while in any status other than scheduled, the system raises an exception.
  • When a job is canceled while in scheduled status, the status transitions from scheduled to canceled. In this case, BuildScheduleWorker (the scheduled Sidekiq job) will not proceed to RunScheduledBuildService.
  • The scheduled state transition is irreversible: scheduled transitions to pending, but no status (except created) can transition back to scheduled.
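
The transitions listed above can be sketched as a small lookup table. This is an illustration in plain Ruby, not the state machine used in this MR: the class name DelayedJobSketch, the event names, and the InvalidTransition error are all hypothetical, and only the transitions into and out of scheduled are modeled.

```ruby
# Illustrative sketch only: a plain-Ruby model of the transitions described
# above. Class, event, and error names are hypothetical, not GitLab's code.
class DelayedJobSketch
  InvalidTransition = Class.new(StandardError)

  # event => { from_status => to_status }
  TRANSITIONS = {
    schedule:   { created: :scheduled },
    run:        { scheduled: :pending },  # the scheduled time has arrived
    play:       { scheduled: :pending },  # a user starts the job immediately
    unschedule: { scheduled: :manual },
    cancel:     { scheduled: :canceled }
  }.freeze

  attr_reader :status

  def initialize
    @status = :created # every job starts from created
  end

  # Applies the event, or raises if it is not allowed from the current status.
  def fire(event)
    @status = TRANSITIONS.fetch(event).fetch(@status) {
      raise InvalidTransition, "cannot #{event} a job in #{@status} status"
    }
  end
end
```

For example, calling fire(:schedule) and then fire(:unschedule) leaves the job in manual, and a second fire(:unschedule) raises, which mirrors the "unscheduled during non-scheduled status" rule. Because only created maps to scheduled, the table also encodes that no other status can transition back to scheduled.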

Concerns

  • Pipeline/build status is tightly coupled with the BE (e.g. Gitlab::Ci::Status::Build::Scheduled is directly reflected in the frontend components). Can we create a dynamic component for a specific state (i.e. scheduled jobs)? => We'll follow up.
  • What's the compound status of a stage? (e.g. Job A: running, Job B: pending, Job C: scheduled => What is shown on pipeline-mini-graph?)
  • What if Sidekiq jobs are lost? https://gitlab.com/gitlab-org/gitlab-ce/issues/36791. Do we just leave them, or do we introduce a clean-up worker? => We clean up stale schedules in StuckCiJobsWorker.

Performance implication

In this MR, we add a scheduled_at column to the ci_builds table. This column is UPDATEd when the build is scheduled (to set the date) and again when the scheduled build finishes (to clear the date). Both changes are made in the same UPDATE that already performs the status transition (e.g. UPDATE ci_builds SET status = scheduled WHERE id = 100), so no additional queries are executed in the life cycle (e.g. UPDATE ci_builds SET status = scheduled, scheduled_at = '2018-09-24 10:06:19.385977' WHERE id = 100).

However, due to the Sidekiq reliability problem, we can't guarantee that every scheduled job will be executed. A few jobs might get stuck in the scheduled state if the corresponding BuildScheduleWorker job is lost because of a SIGKILL.

To rescue those potential orphans, we're adding a cleanup phase for stale scheduled jobs. This operation is included in StuckCiJobsWorker, as that worker is meant to handle stale pending/running builds. To find stale scheduled builds, the worker executes SELECT * FROM ci_builds WHERE scheduled_at IS NOT NULL AND scheduled_at < '1 day ago'. Given that ci_builds is a very big table (at the moment, it contains over 100 million rows), we add a partial index on the (scheduled_at, id) columns where scheduled_at IS NOT NULL. This should make the operation much faster, because it uses an Index Scan as the first step and the expensive date comparison is performed only on a small subset of rows.
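
As a rough illustration of the selection rule in that query, the sketch below filters in-memory rows instead of querying the database. BuildRow, STALE_AFTER_SECONDS, and stale_scheduled_builds are hypothetical names introduced for this example; the real lookup lives in StuckCiJobsWorker, and this sketch does not show what the worker does with the builds it finds.

```ruby
# Hypothetical stand-in for a ci_builds row; only the fields used here.
BuildRow = Struct.new(:id, :scheduled_at)

# "1 day ago" from the query above, expressed in seconds.
STALE_AFTER_SECONDS = 24 * 60 * 60

# Returns the rows whose schedule is set and older than the cutoff,
# ordered by (scheduled_at, id) to mirror the partial index columns.
def stale_scheduled_builds(rows, now: Time.now)
  cutoff = now - STALE_AFTER_SECONDS
  rows
    .reject { |row| row.scheduled_at.nil? }      # scheduled_at IS NOT NULL
    .select { |row| row.scheduled_at < cutoff }  # scheduled_at < '1 day ago'
    .sort_by { |row| [row.scheduled_at, row.id] }
end
```

The reject step corresponds to the partial index condition: rows without a schedule are never considered, so the date comparison only touches the small set of builds that were scheduled at some point.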

Feature flag

This feature is behind the feature flag ci_enable_scheduled_build, so that if something is wrong with this implementation, we can minimize the impact by disabling the feature flag.

When ci_enable_scheduled_build is disabled, delayed jobs are not created even if .gitlab-ci.yml has when: delayed. Instead, the job is simply translated to a manual job.
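
The fallback described above could look roughly like the following. This is a sketch under assumptions: effective_when is a hypothetical helper, the flag_enabled argument stands in for the result of Feature.enabled?('ci_enable_scheduled_build'), and the 'on_success' default for a job without a when key is assumed rather than taken from this MR.

```ruby
# Hypothetical helper: picks the `when` value actually used for a job.
# `flag_enabled` stands in for Feature.enabled?('ci_enable_scheduled_build').
def effective_when(job_config, flag_enabled:)
  # Assumption: jobs without an explicit `when` default to 'on_success'.
  requested = job_config.fetch(:when, 'on_success')

  # With the flag off, a delayed job is translated to a manual job.
  return 'manual' if requested == 'delayed' && !flag_enabled

  requested
end
```

With the sample job from this MR, effective_when returns 'delayed' when the flag is on and 'manual' when it is off; jobs that do not use when: delayed are unaffected either way.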

Here is how to manipulate the feature flag:

Feature.enabled?('ci_enable_scheduled_build') # Check if the feature is enabled
Feature.enable('ci_enable_scheduled_build') # Enable this feature
Feature.disable('ci_enable_scheduled_build') # Disable this feature

This feature will be evaluated in gitlab-com/gl-infra/infrastructure#5223 (closed). Once we have confirmed that it's fully functional, we're going to remove the feature flag in https://gitlab.com/gitlab-org/gitlab-ce/issues/52183.

BE TODO

  • Add a new status scheduled to ci_builds.status
  • Respect allow_failure: true/false; however, scheduled status should block the pipeline
  • Retry shouldn't reschedule
  • Play immediately endpoint
  • Ping DB team to review
  • Write Unit tests
  • Write Integration tests
  • Feature flag

FE TODO

  • dropdown in pipelines list
  • [-] dropdown in environments list => https://gitlab.com/gitlab-org/gitlab-ce/issues/52129
  • icons in pipeline graph
  • tooltip in pipeline graph with remaining time (will be made dynamic in follow-up issue)
  • buttons on job list
  • empty state for scheduled jobs
  • favicon overlay
  • [-] Docs => https://gitlab.com/gitlab-org/gitlab-ce/issues/52127

BE+FE TODO

  • Write Feature/Acceptance tests

What are the relevant issue numbers?

Close https://gitlab.com/gitlab-org/gitlab-ce/issues/51352

Sample gitlab-ci.yml

# This job starts 3 minutes after it is scheduled, instead of immediately
job:
  script: date
  when: delayed
  start_in: 3 minutes
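
To make the timing of the sample above concrete, here is a simplified sketch of how a start_in value such as "3 minutes" might be turned into a run time. It is not the parser GitLab uses: UNIT_SECONDS, start_in_seconds, and scheduled_at are hypothetical names, and only the sec/min/hour forms mentioned in this MR are handled.

```ruby
# Hypothetical unit table covering the "X sec/min/hour" forms from this MR.
UNIT_SECONDS = {
  'sec' => 1, 'second' => 1, 'seconds' => 1,
  'min' => 60, 'minute' => 60, 'minutes' => 60,
  'hour' => 3600, 'hours' => 3600
}.freeze

# Converts a string such as "3 minutes" into a number of seconds.
def start_in_seconds(text)
  match = /\A\s*(\d+)\s*([a-z]+)\s*\z/i.match(text)
  raise ArgumentError, "unsupported start_in: #{text.inspect}" unless match

  unit = UNIT_SECONDS.fetch(match[2].downcase) {
    raise ArgumentError, "unknown unit in start_in: #{text.inspect}"
  }
  match[1].to_i * unit
end

# The timer starts when the previous stage finishes, so the run time is
# that moment plus the parsed delay.
def scheduled_at(previous_stage_finished_at, start_in)
  previous_stage_finished_at + start_in_seconds(start_in)
end
```

For the sample job, a previous stage that finishes at 10:00:00 would give a run time of 10:03:00. Unsupported input raises ArgumentError rather than silently falling back to an immediate start.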

Does this MR meet the acceptance criteria?

  • Changelog entry added, if necessary
  • [-] Documentation created/updated => https://gitlab.com/gitlab-org/gitlab-ce/issues/52127
  • Tests added for this feature/bug
  • Conforms to the code review guidelines
  • Conforms to the merge request performance guidelines
  • Conforms to the style guides
  • Conforms to the database guides
Edited Oct 05, 2018 by Shinya Maeda
Milestone: 11.4
Reference: gitlab-org/gitlab-foss!21767
Source branch: scheduled-manual-jobs
