Send notifications if pipeline schedule failed to create a pipeline

Problem

The "Pipeline Schedule" feature cannot create a pipeline in some cases, for example:

  • The pipeline owner is no longer a project member
  • The target branch was set to be protected and the pipeline owner doesn't have permission to create a pipeline on the protected branch
  • etc

We should notify users why and when a pipeline schedule failed to create a pipeline, and guide them toward a recovery action.

Proposal

  • Send an email to a user (maybe the project owner or schedule owner) when a pipeline schedule fails to create a pipeline
  • Indicate why and when it failed, and show a guide on how to recover from the situation

Investigation (done): How do existing pipeline status emails work?

  1. PipelineScheduleWorker runs using Cron (see pipeline_schedule_worker in gitlab.yml for frequency)
  2. RunPipelineScheduleWorker is performed for each runnable scheduled job
  3. It calls Ci::CreatePipelineService
  4. It runs a sequence of checks as part of execution
  5. Gitlab::Ci::Pipeline::Chain::Validate::Abilities is called and fails if the owner no longer has permission to create a pipeline
  6. Other checks include whether the target branch still exists, whether a Security Policy allows it, etc
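
The sequence of checks in steps 4 to 6 can be pictured as a chain that stops at the first failing step. The following is a minimal, self-contained sketch, not GitLab's actual code: `Step`, `run_chain`, `STEPS`, and the hash-based context are hypothetical stand-ins for the classes under Gitlab::Ci::Pipeline::Chain, and the shape of the returned hash is an assumption.

```ruby
# Hypothetical stand-in for one link of the validation chain.
# The real checks live under Gitlab::Ci::Pipeline::Chain and look different.
Step = Struct.new(:name, :check) do
  # Returns nil when the check passes, or an error message when it fails.
  def error_for(context)
    check.call(context) ? nil : "#{name} check failed"
  end
end

# Runs the steps in order and stops at the first failure.
# Returns a response-like hash (its shape is an assumption for illustration).
def run_chain(steps, context)
  steps.each do |step|
    message = step.error_for(context)
    return { status: :error, reason: step.name, message: message } if message
  end
  { status: :success }
end

STEPS = [
  Step.new(:abilities,  ->(ctx) { ctx[:owner_can_create_pipeline] }),
  Step.new(:repository, ->(ctx) { ctx[:branch_exists] })
].freeze

# The schedule owner lost permission, so the chain stops at the first step
# and no pipeline would be created.
result = run_chain(STEPS, { owner_can_create_pipeline: false, branch_exists: true })
puts result.inspect
```

The point of the sketch is that a failure here happens before any pipeline record exists, which is why the later notification path cannot be reused as-is.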

... the pipeline runs...

  1. [Something calls Integrations::PipelinesEmail, not sure what]
  2. Integrations::PipelinesEmail checks status in should_pipeline_be_notified?
  3. PipelineNotificationWorker does a couple more checks, then calls
  4. NotificationService.new.pipeline_finished(pipeline, ref_status: ref_status, recipients: recipients)
  • Note: The pipeline_finished notification needs a persisted pipeline. We don't have one of those!

Implementation Plan

To implement emails when a pipeline is not created:

  1. Implement metrics to track how often this happens, so we understand how many emails we might start sending
    • Update RunPipelineScheduleWorker to gather metrics when error responses are returned from Ci::CreatePipelineService
    • Presumably we keep this distinct from the existing error handling there, which seems more focussed on handling StandardError.
      • Note: I also haven't found any instances of the existing error handling; is it because we've since moved to handling all the errors gracefully and the existing error handling is dead code? 🤔
  2. Update Integrations::PipelinesEmail with a new attribute notify_scheduled_pipeline_creation_failure. (default: false)
  3. Emails
    • Create a new notifier and worker ScheduledPipelineCreationFailureNotificationWorker (or something less verbose)
      • Takes a scheduled pipeline and an error message.
    • Add NotificationService#scheduled_pipeline_creation_failure and view
      • MVP: Link to the project's Scheduled Pipelines page. Also link to a generic troubleshooting docs page, instead of trying to add fixes in the email itself.
      • Check: do we have easy metrics on this already? Or do we need to add some.
  4. Update Integrations::PipelinesEmail
    • add a new supported_events entry called scheduled_pipeline_creation_failure
    • call ScheduledPipelineCreationFailureNotificationWorker when the event is scheduled_pipeline_creation_failure
  5. Update RunPipelineScheduleWorker to call Integrations::PipelinesEmail#execute WHEN
    • Project-based FeatureFlag is enabled
    • response['status'] is 'error'
    • the reason is TBC:
      • MVP: Abilities failure.
      • Later: Repository & Policy are good candidates. As is when CreatePipelineService raises a StandardError.
      • Never: Rate limiting
    • Alternative: add this logic to CreatePipelineService
  6. UI
    • Rename the header to Notification options
    • Rename the UI checkbox from Notify only broken pipelines to Ignore successful pipelines (value, column_name & behavior remain the same).
    • Add a UI checkbox Notify when scheduled pipelines fail to start, default it to true
      • Hidden behind the same project-based FF
  7. Roll out the FF
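
The conditions in step 5, together with the metrics gathering from step 1, can be sketched as follows. This is an illustration under stated assumptions, not the proposed implementation: `notify_creation_failure?`, `record_failure`, the reason identifiers, and the `'reason'` key on the response are all hypothetical, and a plain Hash stands in for a real metrics backend and feature-flag check.

```ruby
# Hypothetical reason identifiers; the issue does not define concrete names.
MVP_REASONS   = %i[abilities].freeze    # "MVP: Abilities failure"
NEVER_REASONS = %i[rate_limited].freeze # "Never: Rate limiting"

# Decides whether a failed scheduled-pipeline creation should trigger an email.
# `response` mimics the hash-like result checked in step 5
# (response['status'] is 'error'); the 'reason' key is an assumption.
def notify_creation_failure?(response, feature_enabled:)
  return false unless feature_enabled
  return false unless response['status'] == 'error'

  reason = response['reason']&.to_sym
  return false if reason.nil? || NEVER_REASONS.include?(reason)

  MVP_REASONS.include?(reason)
end

# Step 1 of the plan: count failures per reason before sending any email,
# so the expected email volume is known.
def record_failure(counts, response)
  return counts unless response['status'] == 'error'

  counts[response['reason'] || 'unknown'] += 1
  counts
end

counts = Hash.new(0)
responses = [
  { 'status' => 'error', 'reason' => 'abilities' },
  { 'status' => 'error', 'reason' => 'rate_limited' },
  { 'status' => 'success' }
]

responses.each do |response|
  record_failure(counts, response)
  notify = notify_creation_failure?(response, feature_enabled: true)
  puts "#{response['status']} (#{response['reason'] || 'n/a'}): notify=#{notify}"
end
```

Keeping the "Later" reasons out of `MVP_REASONS` means they are still counted by the metrics but do not yet send email, which matches the staged rollout described above.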

Not in this proposal:

  • Easier control over notification preferences. E.g. "I want to be notified ONLY if it fails to start" isn't possible in the above proposal.
  • The failure modes listed as "Later". We can open issues to add those, and include documentation to reflect what's implemented
  • Including remediation actions in the emails
Edited by Nick Malcolm