Scheduled pipelines are owned by the individual who created them. However, if that member leaves the project (we see this frequently as members switch teams, etc.), the scheduled pipeline fails to execute and the project team is none the wiser.
Errors raised when a scheduled pipeline has configuration errors are rescued without anything being logged, so the pipeline is not triggered at all and no alert is raised. For jobs that run only on a schedule, the problem may not be caught at any other point.
    def run_pipeline_schedule(schedule, user)
      Ci::CreatePipelineService.new(schedule.project, user, ref: schedule.ref)
        .execute!(:schedule, ignore_skip_ci: true, save_on_errors: false, schedule: schedule)
    rescue Ci::CreatePipelineService::CreateError
      # no-op. This is a user operation error such as corrupted .gitlab-ci.yml.
    rescue => e
      error(schedule, e)
    end
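To illustrate the gap, here is a minimal sketch of what logging the rescued error could look like. It is not GitLab's implementation: `CreateError`, `Schedule`, the block-based signature, and the returned symbols are hypothetical stand-ins so the example runs outside the GitLab codebase.

```ruby
require "logger"

# Hypothetical stand-ins (assumptions); GitLab's real classes differ.
class CreateError < StandardError; end
Schedule = Struct.new(:id, :ref)

# Runs the block that would create the pipeline. A creation error is
# still treated as a user error, but it is logged instead of discarded.
def run_pipeline_schedule(schedule, logger)
  yield schedule
  :created
rescue CreateError => e
  logger.warn("schedule #{schedule.id} (#{schedule.ref}): pipeline not created: #{e.message}")
  :failed_to_create
end
```

With something along these lines, a corrupted `.gitlab-ci.yml` would at least leave a log entry that support or an admin could find.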
What is the current bug behavior?
No notifications are received for the failure and there is no entry in the pipelines tab for these failed scheduled pipelines
What is the expected correct behavior?
If a scheduled pipeline fails to execute, each project member, or at least the maintainers/owners, should receive some notification (email?) that it failed to execute.
Relevant logs and/or screenshots
N/A
Output of checks
This bug happens on GitLab.com
Results of GitLab environment info
N/A
Results of GitLab application Check
N/A
Possible fixes
Send an email to the maintainer(s) of the project. The status of the pipeline in the email will be "Failed to Create pipeline". This will match the status shown on the Scheduled Pipelines interface when that issue is implemented.
Add a todo for the schedule owner with the failure
Add a failed pipeline to the pipeline page. We deemed this too noisy for now; the pipeline page is already hard to understand.
Implementation Guide
Proposal
Updated proposal:
Send a notification to the project owner when a schedule owner:
is deleted
leaves the project
is disabled
The following will be the subject of the email: Assign a new owner to the pipeline schedule: [Pipeline schedule name]
Copy for the email
The owner of the pipeline schedule [Pipeline schedule name] is no longer available. Without an owner, this schedule will fail to run. Assign a new owner to ensure the pipeline runs as expected.
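The proposal above could be sketched roughly as follows. This is only an illustration: the `Owner` and `PipelineSchedule` structs, their fields, and both method names are hypothetical and do not reflect GitLab's actual models. Only the subject and body text come from the proposal.

```ruby
# Hypothetical data shapes (assumptions); GitLab's real models differ.
Owner = Struct.new(:username, :state, :project_member)
PipelineSchedule = Struct.new(:name, :owner)

# Returns why the owner cannot run the schedule, or nil if the owner is available.
def owner_unavailable_reason(schedule)
  owner = schedule.owner
  return :deleted if owner.nil?
  return :left_project unless owner.project_member
  return :disabled unless owner.state == "active"

  nil
end

# Builds the notification from the proposed subject and copy,
# or returns nil when no notification is needed.
def owner_notification(schedule)
  return nil if owner_unavailable_reason(schedule).nil?

  {
    subject: "Assign a new owner to the pipeline schedule: #{schedule.name}",
    body: "The owner of the pipeline schedule #{schedule.name} is no longer available. " \
          "Without an owner, this schedule will fail to run. " \
          "Assign a new owner to ensure the pipeline runs as expected."
  }
end
```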
This would apply to pipelines that failed due to invalid permissions. For pipelines that failed due to invalid CI configuration, an email template is already sent out. We can enhance the current template with the mockups attached. This change should be behind a feature flag, as it will increase noise/emails. We should pull some metrics to see the scope of impact.
@markglenfletcher @thaoyeager Any update on this? Since raising this and implementing a custom script that uses the API to catch the issue, we have seen 8 occurrences.
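For anyone wanting a similar stopgap, a script of that kind might look roughly like the sketch below. It is not the commenter's script. It assumes the pipeline schedules list endpoint returns an `active` flag and an `owner` object with a `state` field, and that `project_id` is numeric; both method names are made up for illustration. It would not detect an owner who is still active but has left the project, which would need a separate membership check.

```ruby
require "json"
require "net/http"
require "uri"

# Fetches one page of pipeline schedules for a project.
# base_url, project_id, and token are supplied by the caller.
def fetch_schedules(base_url, project_id, token)
  uri = URI("#{base_url}/api/v4/projects/#{project_id}/pipeline_schedules")
  request = Net::HTTP::Get.new(uri)
  request["PRIVATE-TOKEN"] = token
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  JSON.parse(response.body)
end

# Returns active schedules whose owner is missing or not in the "active" state.
def schedules_at_risk(schedules)
  schedules.select do |s|
    s["active"] && (s["owner"].nil? || s["owner"]["state"] != "active")
  end
end
```

Running `schedules_at_risk(fetch_schedules(...))` from a cron job and mailing the result to a team list would approximate the missing notification.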
When this is implemented, if a user manually runs a schedule, could they end up receiving any failed notification?
I expect this to be a stretch, but allowing others to receive the error for these pipelines would be nice as well (e.g., project admins or DevOps folks for the project). Currently we have a user which gets emails sent to a list, but getting the Todo to appear in GitLab itself for a set of users would be great.
Enterprise customer is impacted by this bug and looking for a resolution or workaround.
The customer was troubleshooting a pipeline and couldn't understand why it was not running. Upon investigation, they found that the job was blocked due to a disabled user, but GitLab does not report this anywhere.
This information being included in the UI would have saved the customer significant time in fixing the issue.
We also need this feature. Now, we are using a workaround - we have created a bot with a "project-mail-address". This bot owns the schedule and now we're getting the updates.
It would be nice if the notification emails for a schedule were sent to all developers, maintainers and/or owners.
Maybe a "project notification" system would be nice, where each member can decide which notifications he/she wants to receive.
@richard.chong @marknuzzo @samdbeckham I think we should consider pulling this into an upcoming milestone soon because it has impacted internal teams and has recent customer interaction. I'm applying VerifyP2 to reflect that when we prioritize bugs for upcoming milestones.
Hi @jheimbuck_gl - thanks for the ping here. Though we have a large amount of issues that we need to evaluate in %15.3 which could create a swell of carry over potentially into %15.4, looking at %15.4 planning issue, at the moment, it seems like the soonest we can consider pulling this into an upcoming milestone. As we get closer, we may have a better idea if this is reasonable but just wanted to give my initial thoughts here for consideration.
thanks for the offer @leetickett! I'm working on refining this collection of scheduled pipeline issues into two main ones. I'll drop a note back here when I'm done with that and we can see about getting these ready to go.
Thanks for the context @marknuzzo. I agree milestones are getting full already, we'll need to evaluate if this has higher impact than other work already scheduled.
@samdbeckham @marknuzzo @richard.chong - could be recency bias, but I've seen more instances of pipelines failing to create lately in issues (mostly scheduled) that would at least be quicker for users to troubleshoot if this was resolved. Let's see what we can swap out of an upcoming milestone (%15.4 maybe?) to address this gap.
Thanks for the call out here @jheimbuck_gl - I completely agree that if we can nudge this up, having that messaging would be impactful for the users. I'm reviewing the upcoming milestones to see about candidate swaps here.
Hi @jheimbuck_gl - at the moment, we currently have space in %15.4 while staying under our WIP limit to move this issue into %15.4.
If we want to stay at or under our WIP limit in %15.3, the bottom 5-ish in this list needs to move out with possibility for accommodation in %15.4 if we increase the WIP limit to 8 for typebug like we did for %15.3. WDYT? We can update this to %15.4 for now to help unblock users here. This issue should also have usability added as well.
I'm assisting by checking the Kibana on behalf of the customer to pass on the reason for the failure.
Currently this issue is severity2: Broken feature with an unacceptably complex workaround.
For folks (edit:) on GitLab.com with a subscription, the workaround is "ask GitLab support", but for folks with no subscription it's severity1: Broken feature with no workaround.
This ~"group::pipeline execution" bug has at most 25% of the SLO duration remaining and is ~"approaching-SLO" breach. Please consider taking action before this becomes a ~"missed-SLO" in 14 days (2022-08-21).
@jheimbuck_gl I plan to add a final solution for this by the end of next week (14th October). If you think that's a very close call, then we should push it ahead.
@v_mishra I think the downside to that approach would be extra pipelines on the pipeline page if there are a bunch of scheduled pipelines failing but it could be an MVC while we investigate other approaches on the pipeline schedule page itself.
@jheimbuck_gl I was also looking around for how else we could address it. I'm having a hard time understanding why scheduled pipelines are excluded from the pipeline status emails integration. Maybe we can just start by including them?
I got a chance to speak to other team members during a co-working session, and it looks like users do get an email for scheduled pipelines using the integration. But there's no indication on the email if the pipeline is a scheduled one.
No notifications are received for the failure and there is no entry in the pipelines tab for these failed scheduled pipelines
We should communicate to users that they need to enable the pipeline status emails integration in order to get updates about their scheduled pipelines.
In the email, we should use our existing labels to identify pipeline types (merged results, merge request, scheduled, merge train).
From my point of view it's not a good idea, since I need to go through all 2000 projects in my groups to enable the integration feature.
Then I need to go through this list every time a person who should get the email leaves or joins the team.
And finally I need to remember to enable this feature for every newly created project, although I don't actually get any notification that someone has created a project in one of my groups.
Your screenshot actually shows a pipeline that was successfully created and then failed due to a job failure, not a pipeline that failed to be created due to permissions and/or a CI script error. So a new email trigger would still need to be created, as a quick test shows that this integration currently does not send emails when a scheduled pipeline fails to be created.
My choice would be a checkbox in Project -> Settings -> CI/CD -> General pipelines, which would toggle (default: enabled) an email to all owners and maintainers and the schedule owner when a scheduled pipeline fails to be created.
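As a rough sketch of the recipient logic this suggestion implies: the `Member` struct, the `creation_failure_recipients` name, and the `notify_enabled` keyword (standing in for the proposed checkbox) are all hypothetical. The only detail taken from GitLab is the numeric access-level convention, where 40 is Maintainer and 50 is Owner.

```ruby
# Hypothetical member shape (assumption); not GitLab's model.
Member = Struct.new(:email, :access_level)

# GitLab's numeric access level for Maintainer; Owner is 50.
MAINTAINER = 40

# Returns the emails to notify when a scheduled pipeline fails to be created.
# notify_enabled stands in for the proposed project-level checkbox.
def creation_failure_recipients(members, schedule_owner_email, notify_enabled: true)
  return [] unless notify_enabled

  emails = members.select { |m| m.access_level >= MAINTAINER }.map(&:email)
  emails << schedule_owner_email if schedule_owner_email
  emails.uniq
end
```

Deduplicating matters here because the schedule owner is often also a maintainer and should only get one email.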
I need to go through all 2000 projects in my groups to enable the integration feature
This is available at the group level which could (depending on your setup) help with that. At GitLab we make use of group emails as well to simplify that admin problem of adding/removing users.
So new email cause would still need to be additionally created as a quick test shows that currently this integration does not send emails in scheduled pipeline creation failure scenario.
Hm, I'm seeing the same issue: a test email will send, but the one for the scheduled pipeline does not today. This would likely get fixed with this issue, though.
Again thanks Peter we'll keep your feedback in mind as we think about the direction for notifications for scheduled pipelines going forward beyond this MVC!
Thanks for your reply @jheimbuck_gl
That group setting will certainly help.
It would also be great if we were notified only about pipelines that failed to be created, as notifications for every failed pipeline can be rather spammy in a large group. Those failures will just be reported to their owners in the usual way so they can fix them.
From my (wishful thinking) point of view, this is something that should land in the CI/CD YAML configuration, instead of a project or group setting. Having this in the UI makes the feature unnecessarily complex and disconnected from the rest of the pipelines configuration.
By doing so, it opens a lot of different doors for the scenarios mentioned above. Users have a choice to include a global template, or even enforce some scheduled pipelines with compliance pipelines.
Scheduled pipelines would also benefit from the git workflow we already use for the CI/CD yaml configuration file, with the support for CODEOWNERS and approvals. Currently, any Maintainer can change the settings of a scheduled pipeline without following the 4-eyes principle. And I'm not even talking about the security implications here (this issue isn't confidential).
I understand we probably went to the UI with this because scheduled pipelines need to be run by a user, but I think this is an obsolete design today. It ties these pipelines to user accounts instead of Project bots.
I understand this is going way beyond the scope of this issue, so I'll create one to discuss this if some of you are interested.
I think that YAML config would not catch the most prominent scenario, when the pipeline owner gets blocked or removed from the project.
The Sidekiq pipeline creation job for a user without permission does not even parse the YAML (or at least it should not, for performance reasons)...
Also, invalid YAML could not be handled by YAML variables.
But of course it would offer much better options for many other scenarios...
I think that YAML config would not catch the most prominent scenario, when the pipeline owner gets blocked or removed from the project. The Sidekiq pipeline creation job for a user without permission does not even parse the YAML (or at least it should not, for performance reasons)...