After implementing #213666 (closed), in this issue, we need store failure reason information of downstream pipelines in bridges. (More discussions about it: !28120 (comment 312091893))
Then we can have this behavior:
We also need to add failure reason for strategy:depend situations. We may do this in another issue/MR.
I would also consider adding downstream_pipeline_failed failure reason to a the list of failure reasons supported by Ci::Bridge. It seems like a small effort, and might improve the situation here significantly. We probably would need to add this to self.drop!(...) in Ci::Bridge#inherit_status_from_downstream
Old Proposal
Persist all pipeline creation errors in Ci::Pipeline model. Maybe we can reuse Ci::Pipeline#yaml_errors and rename it to Ci::Pipeline#creation_errors?
When creating a downstream pipeline we now persist always the association between bridge job and downstream pipeline, so we can always reference the downstream pipeline from the bridge job.
Change Gitlab::Ci::Status::Build::Failed#failure_reason_message to append related downstream pipeline errors if failure_reason_message is called to display bridge job errors.
We could look into changing the failure_reason_message to append any additional errors from the downstream pipeline.
classCi::Status::Build::Faileddeffailure_reason_messagemessages=[]messages<<self.class.reasons.fetch(subject.failure_reason.to_sym)messages+=subject.additional_errorsmessages.join(', ')endendclassCi::Bridgedefadditional_errorsreturn[]unlessfailure_reason=='downstream_bridge_pipeline_creation_failed'# this is not performant and queries need to be optimizedsourced_pipelines.mapdo|source_pipeline|source_pipeline.pipeline.yaml_errorsendendendclassCi::Builddefadditional_errors[]endend
Technical Background
When triggering a downstream pipeline with a bridge job - if the downstream pipeline creation errors out (e.g. an invalid YAML config), the relationship between the bridge job and the downstream pipeline is not saved.This is because the relation is being added in the `Seed` pipeline chain step and the chain is broken in `Confing::Content` or `Config::Process` which comes before `Seed`.This is required for displaying the downstream creation errors in upstream bridges.
When failure reason is downstream_pipeline_creation_failed we can extend the error message with the pipeline.yaml_errors from the downstream pipeline and display that in the tooltip of the failing bridge job.
Matija Čupićchanged the descriptionCompare with previous version
Moving this to the %Backlog for now due to emergency priorities on the CI team, but child/parent pipelines remains an important priority for us to schedule next.
Actually, I'm gonna update to workflowready for development since it is in %12.10 but if it is not ready for dev then I would need to remove it from the milestone.
Downstream pipeline had been persisted and then failed.
Downstream pipeline could not be created.
@furkanayhan I believe that this is a reasonable solution to store the failure reason in bridge's ci_builds.failure_reason in case 1. We might start simple with one failure reason, and extend them as needed.
In case 2. we could presumably read the downstream pipeline failure reason, right?
I agree with Grzegorz. A first iteration could only communicate that the downstream pipeline creation failed - it doesn't have to be too specific. Later iterations can add specific error information.
We might start simple with one failure reason, and extend them as needed
I think this would not be maintainable because the problem is not about the failure reason but the error message. Failure reason identifies a category of failures while the error message contains the details (e.g. config for job X contains an unknown key: Y).
This issue is highly related to #197140 (closed) (as it solves the 1st point). I think that most of the UX for the error message can be fixed by the other 2 points in the issue:
As a user I want to understand which child pipeline was triggered by which job
As a user I want to navigate to a child pipeline page so that I have a narrow focused view of the pipeline
I was thinking that perhaps ci_sources_pipelines can be a candidate for a new column given that is responsible for connecting a bridge job to a downstream pipeline? However as MVC we could try to just bubble up the error message from downstream pipeline up to the bridge job in the UI if the downstream pipeline is persisted. We always use save_on_errors: true in Ci::CreateCrossProjectPipelineService, so any pipeline triggered using trigger: syntax will have a persisted pipeline. This means we don't need to persist any data more than what we already have.
@fabiopitino if we do not persist a pipeline in a downstream project, validation error might be just one of meany reasons why a pipeline can not be created Perhaps we should understand what errors we can actually expect, and only then decide if the file_reason if enough, or not enough.
@grzesiek My understanding is that we always persist a downstream pipeline. But the errors that are currently persisted in the downstream pipeline record are only yaml_errors.
I'm proposing 2 alternative approaches:
If a bridge job is failed, display in the UI the content of downstream_pipeline.yaml_errors - No need to persist more data, but it does not display any errors aside from syntax errors.
Capture the returned error from the downstream pipeline and persist it somewhere, then display it as is in the UI - Requires us to persist it, can capture any possible errors, even those we may add in the future.
The reason why I think capturing and persisting the pipeline error is a better approach is that we can add more failures in the future inside CreatePipelineService and we don't need to maintain a list of errors in Ci::CreateCrossProjectPipelineService. Even if it makes us introducing a new column.
Perhaps we should understand what errors we can actually expect
The CreateCrossProjectPipelineService does exactly that at the moment. It can drop a pipeline with insufficient_bridge_permissions or downstream_bridge_project_not_found. These are validations that are already present inside CreatePipelineService but we are duplicating them outside it in order to set a specific failure_reason. If we got rid of these validations and instead capture and save errors coming from the pipeline creation we could simply delegate all the logic to CreatePipelineService. This would remove some complexity in CreateCrossProjectPipelineService.
There could be a 3rd approach also, which is turning yaml_errors into storing any pipeline creation errors
If a bridge job is failed, display in the UI the content of downstream_pipeline.yaml_errors - No need to persist more data, but it does not display any errors aside from syntax errors.
Capture the returned error from the downstream pipeline and persist it somewhere, then display it as is in the UI - Requires us to persist it, can capture any possible errors, even those we may add in the future.
I think those are not exactly alternative to each other :) With 1, we will only show some error cases. With 2, we will show all error cases.
I agree that capturing and persisting the pipeline error is a better approach, however it requires us to introduce a new column, probably in ci_builds table.
There could be a 3rd approach also, which is turning yaml_errors into storing any pipeline creation errors
Do you mean persisting all downstream pipelines for all errors?
There could be a 3rd approach also, which is turning yaml_errors into storing any pipeline creation errors
Do you mean persisting all downstream pipelines for all errors?
I mean we can change the way we set yaml_errors to actually include all pipeline creation errors (e.g. Pipeline not created successfully because there are no jobs/stages). Then, if the downstream pipeline is persisted (I believe we always do that for downstream pipelines) we have the errors persisted.
To be clear, this won't occur if the pipeline is not persisted (e.g. parent pipeline created via API) but in that case the error message is returned to the user via some form of response. This is out of the scope of this issue.
We need to check if changing yaml_errors to include all errors may have unintended side effects. IMO having pipeline.creation_errors (renamed from yaml_errors) is better as we can also use failure_reason to know the category of the error message. Right now yaml_errors is only set when failure_reason: :config_error. Could this be a viable alternative?
I think that the Ci::CreatePipelineService does not always persist the pipeline even though save_on_errors is set to true. This inconsistency might prevent the upstream pipeline from accessing yaml_errors of the downstream pipeline.
@furkanayhan Do I understand correctly that in order to surface all errors without actually persisting a downstream pipeline we would need to add two new failure reasons? I, kind of, like the idea of adding new failure reasons because it makes the workflow more explicit, but on the other hand, it might not scale if we will have much more reasons of not creating a downstream pipeline.
I think that adding two or three failure reasons is acceptable, it makes the workflow more explicit and covers known edge cases of when we don't want to create a downstream pipeline. On the other hand we should not use failure reasons for things that are more volatile than the edge cases of a workflow, but I see things like this in this case, am I correct?
I'm fine with adding two / three new failure reasons if this makes implementation more simple
@furkanayhan Yes, you are right. I didn't think of the preconditions.
I think the method ensure_preconditions! is a design smell. It duplicates validations already existing in CI::CreatePipelineService specifically those prior to persisting a pipeline. If we change CreatePipelineService by adding more steps to the chain that may cause a pipeline not to be persisted, we need to remember to also update CreateCrossProjectPipelineService. It also means we have tight coupling with the error messages returned by CreatePipelineService if we want to translate an error message (string) into a failure_reason symbol. Changing/improving the error message has a direct impact to the CreateCrossProjectPipelineService.
We have these preconditions on the first place maybe because we don't have a consistent way to persist pipeline errors? IMO most of those preconditions could be removed if we had a way to persist what errors occurred while trying to create a downstream pipeline.
Adding a column to ci_builds is probably questionable given the size of the table. Perhaps we could use build.options which is very easy to do. However it may be problematic if we want to migrate this data to a column in the future.
Alternatively, could we introduce a table with an association only defined in Ci::Bridge? E.g. ci_downstream_pipeline_errors(id, source_job_id, message, created_at) and
Storing these error messages means we could remove the preconditions in CreateCrossProjectPipelineService (which take almost 50% of the file) and simplify the logic where the service simply delegates to CreatePipelineService and doesn't do anything more than mapping the bridge job with the downstream pipeline.
I'm not completely against adding new failure reasons instead, as per @grzesiek comment above. My concern is that this issue is not about failure reasons but about failure messages. When a pipeline can't be created the error message may contain details that are useful to the end-user. By showing the failure reason we are only telling the user which category the error falls in but not the details. E.g. if a child pipeline is not created because of syntax errors. How can I effectively debug and fix the pipeline from UX perspective?
As long as the solution we end up choosing can answer this scenario I'm ok with it regardless of what approach we use.
in order to surface all errors without actually persisting a downstream pipeline we would need to add two new failure reasons
two new reasons may not cover all errors, but most of them.
On the other hand we should not use failure reasons for things that are more volatile than the edge cases of a workflow, but I see things like this in this case, am I correct?
I think the method ensure_preconditions! is a design smell. It duplicates validations already existing in CI::CreatePipelineService specifically those prior to persisting a pipeline...
we could remove the preconditions in CreateCrossProjectPipelineService (which take almost 50% of the file) and simplify the logic where the service simply delegates to CreatePipelineService
I don't think we can remove all preconditions in CreateCrossProjectPipelineService.
For example: can_create_downstream_pipeline?.
We don't check can?(current_user, :update_pipeline, project) in Ci::CreatePipelineService, right? We only cover can?(current_user, :create_pipeline, downstream_project) and can_update_branch?(target_ref) in different way.
Alternatively, could we introduce a table with an association only defined in Ci::Bridge? E.g. ci_downstream_pipeline_errors(id, source_job_id, message, created_at)
Instead of creating a new table, I would prefer to use ci_builds_metadata or ci_builds.options .
My concern is that this issue is not about failure reasons but about failure messages
The only failure reason that needs to be explained is YAML errors, right? Other failure reasons are self-explanatory.
@furkanayhan can you enumerate the errors we would like to get covered either by failure reasons or by something else? It might be easier to find a good solution once we have this information.
I don't think we can remove all preconditions in CreateCrossProjectPipelineService.
For example: can_create_downstream_pipeline?. We don't check can?(current_user, :update_pipeline, project) in Ci::CreatePipelineService, right? We only cover can?(current_user, :create_pipeline, downstream_project) and can_update_branch?(target_ref) in different way.
I think those could be moved down into CreatePipelineService. I've created #218245 (closed) to discuss that.
If finding a small set of failure reasons would not be possible, I think that options might be acceptable too, but this way we can actually leak things that should never bubble up to an end user (like exception details, what leaded to a security problems in the past when an exception contained more information that a user should see).
Thao Yeagerchanged title from Display downstream pipeline errors in upstream pipelines to Display downstream pipeline errors in upstream pipelines (UX improvement)
changed title from Display downstream pipeline errors in upstream pipelines to Display downstream pipeline errors in upstream pipelines (UX improvement)