Upstream pipeline execution can be controlled by users with CI permissions in a downstream project
A Problem
In the course of investigating #348465 (closed), we discovered that retrying a job in a downstream pipeline allows the retrying user to take ownership of all the skipped jobs in any upstream pipeline, regardless of their permissions on the upstream project.
In this example, Build 6 has failed in the downstream pipeline.
graph TD
A[Upstream Pipeline, Build 1, SUCCESS] --> B[Upstream Pipeline, Build 2, SUCCESS]
B[Upstream Pipeline, Build 2, SUCCESS] --> C[Trigger job to Downstream Pipeline, Bridge 3, FAILED]
C[Trigger job to Downstream Pipeline, Bridge 3, FAILED] --> D[Upstream Pipeline, Build 4, SKIPPED]
D[Upstream Pipeline, Build 4, SKIPPED] --> E[Upstream Pipeline, Build 5, SKIPPED]
C[Trigger job to Downstream Pipeline, Bridge 3, FAILED] --> F[Downstream Pipeline, Build 6, FAILED]
F[Downstream Pipeline, Build 6, FAILED] --> G[Downstream Pipeline, Build 7, SKIPPED]
By retrying Build 6, I will take ownership of Build 6, Build 7, Bridge 3, Build 4, and Build 5. Also, all of those jobs will be queued for processing. Whether or not I have permission to access the upstream project, never mind execute CI in it, is disregarded. This logic is built into the AfterRequeueJobService, where we call #process_subsequent_jobs and #reset_source_bridge.
A walk through the code paths
#process_subsequent_jobs transfers ownership of any job in the pipeline that was skipped, as long as it's in a subsequent stage or directly needs the job being retried. The details of the logic are fairly clear in the AfterRequeueJobService. This much logic makes sense to me, as the retryer is now directly responsible for those jobs being executed, and it makes sense that we record them as such. At this point, Build 6 and Build 7 have been reassigned to the retryer and set to run again.
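To make the selection rule concrete, here is a minimal sketch of it as a toy model. It is not the real AfterRequeueJobService: the Job struct, the stage_idx and needs fields, the extra "Lint" job, and the use of :created to mean "queued again" are all assumptions made for illustration.

```ruby
# Toy model only: Job is a hypothetical stand-in, not GitLab's Ci::Build.
module RequeueSketch
  Job = Struct.new(:name, :stage_idx, :status, :needs, :user, keyword_init: true)

  # Skipped jobs that sit in a later stage, or that directly need the retried job.
  def self.dependent_skipped_jobs(jobs, retried)
    jobs.select do |job|
      job.status == :skipped &&
        (job.stage_idx > retried.stage_idx || job.needs.include?(retried.name))
    end
  end

  # Hand each dependent job to the retrying user and make it runnable again.
  # Assumption: :created stands for "queued for processing".
  def self.process_subsequent_jobs(jobs, retried, user)
    dependent_skipped_jobs(jobs, retried).each do |job|
      job.user = user
      job.status = :created
    end
  end
end

build6 = RequeueSketch::Job.new(name: "Build 6", stage_idx: 0, status: :pending, needs: [], user: "retryer")
build7 = RequeueSketch::Job.new(name: "Build 7", stage_idx: 1, status: :skipped, needs: [], user: "author")
# Hypothetical skipped job in the same stage that does not need Build 6.
lint = RequeueSketch::Job.new(name: "Lint", stage_idx: 0, status: :skipped, needs: [], user: "author")

RequeueSketch.process_subsequent_jobs([build6, build7, lint], build6, "retryer")
puts [build7, lint].map { |j| "#{j.name}: #{j.user} (#{j.status})" }
```

In this sketch Build 7 changes owner because it is in a later stage, while the unrelated skipped job keeps its original owner and status.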
#reset_source_bridge is where we start to bypass permissions. This helper method calls Ci::Pipeline#reset_source_bridge and passes it the same user. In that method, we reset the status of the source bridge to pending, so it can wait for the eventual success or failure of the downstream pipeline, and then we pass the source bridge into another instance of AfterRequeueJobService, with the same user. This applies the same reassign-all-subsequent-jobs logic with the retryer from the downstream pipeline, but now we're applying it to the remaining jobs in the upstream pipeline. We have no idea if this user is even a member of the upstream project. There's no check on the identity of the user at all.
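The chain described above can be sketched as a toy model. Everything here is a hypothetical stand-in (the Job and Pipeline structs, the string user names, treating every skipped job as "subsequent"); the only point is to show that the retrying user travels from the downstream pipeline into the upstream one without any permission lookup along the way.

```ruby
# Toy model only: all names here are hypothetical stand-ins for the real classes.
module BridgeResetSketch
  Job = Struct.new(:name, :status, :user, keyword_init: true)
  Pipeline = Struct.new(:project, :jobs, :source_bridge, :upstream, keyword_init: true)

  # Stand-in for AfterRequeueJobService#execute.
  # Simplification: every skipped job in the pipeline counts as "subsequent".
  def self.after_requeue(pipeline, user)
    pipeline.jobs.each do |job|
      next unless job.status == :skipped

      job.user = user
      job.status = :created
    end
    reset_source_bridge(pipeline, user)
  end

  # Stand-in for Ci::Pipeline#reset_source_bridge. Note what is missing:
  # nothing asks whether `user` may run CI in the upstream project.
  def self.reset_source_bridge(pipeline, user)
    bridge = pipeline.source_bridge
    return if bridge.nil? # a top-level pipeline has no source bridge

    bridge.status = :pending
    bridge.user = user # the issue reports that the bridge changes owner too
    after_requeue(pipeline.upstream, user)
  end
end

bridge3 = BridgeResetSketch::Job.new(name: "Bridge 3", status: :failed, user: "release-bot")
build4 = BridgeResetSketch::Job.new(name: "Build 4", status: :skipped, user: "release-bot")
build5 = BridgeResetSketch::Job.new(name: "Build 5", status: :skipped, user: "release-bot")
upstream = BridgeResetSketch::Pipeline.new(
  project: "upstream", jobs: [bridge3, build4, build5], source_bridge: nil, upstream: nil
)

build6 = BridgeResetSketch::Job.new(name: "Build 6", status: :pending, user: "downstream-dev")
build7 = BridgeResetSketch::Job.new(name: "Build 7", status: :skipped, user: "release-bot")
downstream = BridgeResetSketch::Pipeline.new(
  project: "downstream", jobs: [build6, build7], source_bridge: bridge3, upstream: upstream
)

# "downstream-dev" retries Build 6 and ends up owning the upstream jobs as well.
BridgeResetSketch.after_requeue(downstream, "downstream-dev")
puts upstream.jobs.map { |j| "#{j.name}: #{j.user} (#{j.status})" }
```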
Why was this built in this way?
I don't think this behavior is something we ever decided intentionally. There are two other things that we decided, on principle:
- When a depended-on downstream pipeline changes from failure to success, the now non-blocked upstream pipeline should continue processing.
- When someone takes a manual action to cause a pipeline to execute, they should be directly assigned as the owner of that execution.
By making both of those things true, we made it possible for people to take over jobs that they otherwise would not have the permissions to create.
This is, on paper, a privilege escalation: it grants CI execution rights in a project to anyone with CI execution rights in a downstream project.
Does this actually happen to anyone?
In practice on our ops instance, we saw subsequent jobs, especially trigger jobs to other downstream pipelines, fail because the jobs had all been reassigned to a user that did not have permission to take the actions of a job that normally belongs to the release bot. SETs that retried QA smoke test pipelines were taking ownership of SaaS deployments, which would then fail. This is not good for us, but could potentially be worse if we were more picky about who deploys when and had a problem with tracing accountability.
A Proposal
My proposal is simply to not reset the source bridge (trigger job). Conveniently, group::pipeline authoring is already working on the functionality to directly retry a trigger job &6947 (closed). This is already useful in a scenario where the downstream pipeline fails to be created at all. Permissions problems, such as the kind we saw in #348465 (closed), are an example of when we want this kind of retry functionality. It was never built, because retrying the downstream pipeline, by resetting the source bridge, effectively retried the source bridge job automatically.
We should not do this. At least not without checking the CI permissions in the upstream project first. Something to the effect of:
 module Ci
   class Pipeline
     def reset_source_bridge!(current_user)
-      return unless bridge_waiting?
+      # Only reset the bridge if the retrying user may run CI in the upstream project.
+      return unless bridge_waiting? && current_user.can?(:execute_pipeline, source_bridge.project)
       source_bridge.pending!
       Ci::AfterRequeueJobService.new(project, current_user).execute(source_bridge) # rubocop:disable CodeReuse/ServiceClass
     end
   end
 end
Because the upstream pipeline will no longer restart its execution automatically, we'll probably also want to send some kind of notification to the upstream project if and when the downstream pipeline succeeds. At this point, the upstream pipeline will be "unblocked", and someone with correct permissions can click retry &6947 (closed) to restart upstream execution.
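As a rough illustration of how the guard and the notification could fit together, here is a toy model. None of it is the real implementation: the User, Job, and Pipeline structs, the can_run_ci? check, and the plain array used as an "inbox" are all hypothetical, and the actual permission name and notification channel are still open questions.

```ruby
# Toy model only: every name here is a hypothetical stand-in.
module GuardedResetSketch
  User = Struct.new(:name, :ci_projects, keyword_init: true) do
    # Stand-in for a real permission check in the upstream project.
    def can_run_ci?(project)
      ci_projects.include?(project)
    end
  end
  Job = Struct.new(:name, :status, :user, keyword_init: true)
  Pipeline = Struct.new(:project, :jobs, :source_bridge, :upstream, keyword_init: true)

  # Reset the bridge only when the retrying user may run CI upstream.
  # Returns true if the upstream pipeline was restarted.
  def self.reset_source_bridge(downstream, user)
    bridge = downstream.source_bridge
    return false if bridge.nil?
    return false unless user.can_run_ci?(downstream.upstream.project)

    bridge.status = :pending
    bridge.user = user.name
    downstream.upstream.jobs.each do |job|
      next unless job.status == :skipped

      job.user = user.name
      job.status = :created
    end
    true
  end

  # Called when the downstream pipeline succeeds. If the bridge was never
  # reset, leave a reminder for the upstream project instead of restarting it.
  def self.on_downstream_success(downstream, inbox)
    bridge = downstream.source_bridge
    return if bridge.nil? || bridge.status != :failed

    inbox << "#{downstream.upstream.project}: downstream pipeline succeeded, retry #{bridge.name}"
  end
end

bridge = GuardedResetSketch::Job.new(name: "Bridge 3", status: :failed, user: "release-bot")
skipped = GuardedResetSketch::Job.new(name: "Build 4", status: :skipped, user: "release-bot")
up = GuardedResetSketch::Pipeline.new(project: "upstream", jobs: [bridge, skipped], source_bridge: nil, upstream: nil)
down = GuardedResetSketch::Pipeline.new(project: "downstream", jobs: [], source_bridge: bridge, upstream: up)
outsider = GuardedResetSketch::User.new(name: "downstream-dev", ci_projects: ["downstream"])

inbox = []
restarted = GuardedResetSketch.reset_source_bridge(down, outsider)
GuardedResetSketch.on_downstream_success(down, inbox)
puts "restarted=#{restarted} owner=#{skipped.user}"
puts inbox
```

With this shape, a user who only has rights downstream leaves the upstream jobs untouched and produces a reminder, while a user with rights in both projects would restart the upstream pipeline as before.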
Summary of Changes
- Check user CI permissions in upstream project before resetting the status of the upstream bridge job
- Make sure that the new "Retry" functionality &6947 (closed) marks a bridge job as successful when the downstream pipeline has been retried and fixed.
- In cases where the source bridge is not restarted, provide some kind of notification in the upstream project when the downstream pipeline succeeds. Someone needs to be reminded to retry the source bridge to start the upstream pipeline execution.