Send notifications if pipeline schedule failed to create a pipeline
Problem
The "Pipeline Schedule" feature cannot create a pipeline in some cases, for example:
- The pipeline owner is no longer a project member
- The target branch was set to protected and the pipeline owner doesn't have permission to create a pipeline on the protected branch
- etc
We should notify users why and when a pipeline schedule failed to create a pipeline, and guide them toward a recovery action.
Proposal
- Send an email to a user (maybe the project owner or schedule owner) when a pipeline schedule fails to create a pipeline
- Indicate why and when it failed, and include a guide on how to recover
Investigation (done): How do existing pipeline status emails work?
- `PipelineScheduleWorker` runs using Cron (see `pipeline_schedule_worker` in `gitlab.yml` for frequency)
- `RunPipelineScheduleWorker` is performed for each runnable scheduled job
  - It calls `Ci::CreatePipelineService`
    - It runs a sequence of checks as part of execution
    - `Gitlab::Ci::Pipeline::Chain::Validate::Abilities` is called and fails if the owner no longer has permission to create a pipeline
    - Other checks include whether the target branch still exists, whether a Security Policy allows it, etc
... the pipeline runs...
- [Something calls `Integrations::PipelinesEmail`, not sure what]
- `Integrations::PipelinesEmail` checks status in `should_pipeline_be_notified?`
- `PipelineNotificationWorker` does a couple more checks, then calls `NotificationService.new.pipeline_finished(pipeline, ref_status: ref_status, recipients: recipients)`
  - Note: The `pipeline_finished` notification needs a persisted pipeline. We don't have one of those!
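The gap noted above can be illustrated with a small, self-contained Ruby sketch. Everything in it is a stand-in: `Response`, `FakePipeline`, `FakeCreatePipelineService`, and `notify_pipeline_finished` are hypothetical names, the error strings are placeholders, and none of it is GitLab's actual code. It only shows the shape of the problem: when a check fails, the service returns an error with an unsaved pipeline, and a notifier that requires a persisted pipeline cannot be used.

```ruby
# Stand-ins only: none of these classes are GitLab's real implementations.
Response = Struct.new(:status, :message, :payload, keyword_init: true) do
  def error?
    status == :error
  end
end

FakePipeline = Struct.new(:id, keyword_init: true) do
  # Treat a pipeline as persisted once it has an id.
  def persisted?
    !id.nil?
  end
end

# Mimics a creation service that stops at the first failing check.
class FakeCreatePipelineService
  def initialize(owner_can_create:, branch_exists: true)
    @owner_can_create = owner_can_create
    @branch_exists = branch_exists
  end

  def execute
    pipeline = FakePipeline.new(id: nil)
    error = first_failure
    # On failure the pipeline is returned unsaved, alongside the error message.
    return Response.new(status: :error, message: error, payload: pipeline) if error

    pipeline.id = 1 # pretend the record was saved
    Response.new(status: :success, message: nil, payload: pipeline)
  end

  private

  # Placeholder messages, checked in a fixed order.
  def first_failure
    return 'owner lacks permission to create a pipeline' unless @owner_can_create
    return 'target branch no longer exists' unless @branch_exists

    nil
  end
end

# Mimics a notifier that, like pipeline_finished, needs a persisted pipeline.
def notify_pipeline_finished(pipeline)
  raise ArgumentError, 'pipeline must be persisted' unless pipeline.persisted?

  "notified for pipeline #{pipeline.id}"
end
```

Under these assumptions, a failed run has an error message but nothing that `notify_pipeline_finished` can accept, which is why the plan below passes the schedule and the error message to a separate notification instead.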
Implementation Plan
To implement emails when a pipeline is not created:
- Implement metrics to track how often this happens, so we understand how many emails we might start sending
  - Update `RunPipelineScheduleWorker` to gather metrics when `error` responses are returned from `Ci::CreatePipelineService`
    - Presumably we keep this distinct from the existing error handling there, which seems more focussed on handling `StandardError`.
      - Note: I also haven't found any instances of the existing error handling; is it because we've since moved to handling all the errors gracefully and the existing error handling is dead code? 🤔
- Update `Integrations::PipelinesEmail` with a new attribute `notify_scheduled_pipeline_creation_failure` (default: false)
- Emails
  - Create a new notifier and worker `ScheduledPipelineCreationFailureNotificationWorker` (or something less verbose)
    - Takes a scheduled pipeline and an error message.
  - Add `NotificationService#scheduled_pipeline_creation_failure` and view
    - MVP: Link to the project's Scheduled Pipelines page. Also link to a generic troubleshooting docs page, instead of trying to add fixes in the email itself.
    - Check: do we have easy metrics on this already? Or do we need to add some?
- Update `Integrations::PipelinesEmail`
  - add a new `supported_events` called `scheduled_pipeline_creation_failure`
  - call `ScheduledPipelineCreationFailureNotificationWorker` when the event is `scheduled_pipeline_creation_failure`
- Update `RunPipelineScheduleWorker` to call `Integrations::PipelinesEmail#execute` WHEN
  - Project-based FeatureFlag is enabled
  - response['status'] is 'error'
  - the reason is TBC:
    - MVP: Abilities failure.
    - Later: Repository & Policy are good candidates. As is when `CreatePipelineService` raises a `StandardError`.
    - Never: Rate limiting
  - Alternative: Add this logic to `CreatePipelineService`
- UI
  - Rename the header to `Notification options`
  - Rename the UI checkbox from `Notify only broken pipelines` to `Ignore successful pipelines` (value, column_name & behavior remain the same).
  - Add a UI checkbox `Notify when scheduled pipelines fail to start`, default it to true
    - Hidden behind the same project-based FF
- Roll out the FF
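The conditions under which `RunPipelineScheduleWorker` would request a failure email can be sketched as a single predicate. This is a rough illustration, not the planned implementation: `REASON_PHASES`, `notify_creation_failure?`, the `'reason'` key, the reason symbols, and the `feature_enabled` argument are all hypothetical, since the plan leaves the way a failure reason is identified as TBC.

```ruby
# Hypothetical mapping from a failure reason to the rollout phase in the plan.
REASON_PHASES = {
  abilities: :mvp,         # owner can no longer create the pipeline
  repository: :later,
  policy: :later,
  standard_error: :later,
  rate_limited: :never
}.freeze

# Returns true when a failure email should be requested.
# `response` is a Hash with a 'status' key and an assumed 'reason' key.
# `feature_enabled` stands in for the project-based feature flag check.
def notify_creation_failure?(response, feature_enabled:, enabled_phases: [:mvp])
  return false unless feature_enabled
  return false unless response['status'] == 'error'

  phase = REASON_PHASES[response['reason']]
  # Unknown reasons map to nil and are skipped; :never is always skipped.
  phase != :never && enabled_phases.include?(phase)
end
```

With the default `enabled_phases`, only an abilities failure returns true. Adding `:later` to `enabled_phases` would cover the repository, policy, and `StandardError` cases once follow-up issues implement them.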
Not in this proposal:
- Easier control over notification preferences. E.g. "I want to be notified ONLY if it fails to start" isn't possible in the above proposal.
- The failure modes listed as "Later". We can open issues to add those, and include documentation to reflect what's implemented
- Including remediation actions in the emails
Edited by Nick Malcolm