
Fix pipeline status transition when retrying only failed manual jobs

What does this MR do and why?

There is a bug when retrying a pipeline in which only manual jobs have failed or been canceled. The retry fails with the error: An error occurred while making the request. Status cannot transition via "succeed".

This MR resolves it.

Bug Description

Bug Screenshot

image

Bug video

20220927_171132_edit

Test project: https://gitlab.com/qt-gith/issue-1522-test/

Steps to reproduce

  1. Create a project, such as the test project above, with the following .gitlab-ci.yml file:
stages:
  - prepare
  - test
  - end
  
prepare:
  stage: prepare
  script:
    - echo

test:
  stage: test
  script:
    - echo ${V1}
    - if [[ ${V1} == "123" ]]; then echo "Success"; else echo "Error"; exit 11; fi
  when: manual
  allow_failure: true
  needs:
    - prepare
  2. Run the pipeline. (Make sure a Runner is available to execute the jobs first.)
  3. The prepare job succeeds and the test job is awaiting configuration. Configure the test job with V1 = 456 and run it; the job fails.
  4. Refresh the pipeline page and retry the pipeline. The error message An error occurred while making the request. Status cannot transition via "succeed" appears.

Bug Reason

Debugging update_pipeline! in AtomicProcessingService shows the following:

  1. When a pipeline is retried, failed manual jobs are recorded as :ignored in @status_set of Gitlab::Ci::Status::Composite (see composite.rb#ignored_status?).
  2. The pipeline status is set to :success when @status_set of Gitlab::Ci::Status::Composite contains only :success, :skipped, :success_with_warnings, and :ignored (see composite.rb#status).
  3. When we retry a pipeline in which only manual jobs failed, @status_set contains only :success and :ignored, so the pipeline status is set to :success again. Because the pipeline is already in the success state, the state machine raises an error (see pipeline.rb).
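
The status computation described above can be sketched in a few lines of Ruby. This is a simplified illustration, not the code in composite.rb: the Job struct, effective_status, and composite_status are hypothetical names, and every status combination other than the "success-like" one is collapsed into 'other'.

```ruby
require 'set'

# Statuses that still yield a composite "success" (the list from point 2).
SUCCESS_LIKE = Set['success', 'skipped', 'success_with_warnings', 'ignored'].freeze

# Hypothetical job record, for illustration only.
Job = Struct.new(:name, :status, :allow_failure, keyword_init: true)

# Assumption: a failed or canceled job that is allowed to fail counts as "ignored".
def effective_status(job)
  if job.allow_failure && %w[failed canceled].include?(job.status)
    'ignored'
  else
    job.status
  end
end

# Simplified composite: "success" when only success-like statuses are present.
def composite_status(jobs)
  status_set = jobs.map { |job| effective_status(job) }.to_set
  status_set.subset?(SUCCESS_LIKE) ? 'success' : 'other'
end

jobs = [
  Job.new(name: 'prepare', status: 'success', allow_failure: false),
  Job.new(name: 'test', status: 'failed', allow_failure: true)
]
puts composite_status(jobs) # prints: success
```

With the jobs from the reproduction steps, the composite status comes out as success even though the manual job failed, which is the value the pipeline is then asked to transition to.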

Solution

After update_stages! in AtomicProcessingService#process!, check the new state of the pipeline and return early if the status is :success.
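
A minimal sketch of such a guard, under stated assumptions: ToyPipeline and process are hypothetical stand-ins rather than the real Ci::Pipeline or AtomicProcessingService, and the guard condition follows one reading of the description (skip the transition when the pipeline is already success and the newly computed status is also success).

```ruby
# Hypothetical stand-in for the pipeline state machine: "succeed" is rejected
# when the pipeline is already in the success state.
class ToyPipeline
  class InvalidTransition < StandardError; end

  attr_reader :status

  def initialize(status)
    @status = status
  end

  def succeed!
    if @status == 'success'
      raise InvalidTransition, 'Status cannot transition via "succeed"'
    end

    @status = 'success'
  end
end

# Sketch of the guard. Returns false when no transition was attempted.
def process(pipeline, new_status)
  return false if pipeline.status == 'success' && new_status == 'success'

  pipeline.succeed! if new_status == 'success'
  true
end

already_done = ToyPipeline.new('success')
puts process(already_done, 'success') # prints: false (no error raised)

running = ToyPipeline.new('running')
puts process(running, 'success')      # prints: true
puts running.status                   # prints: success
```

Without the first line of process, the call on already_done would raise the same "cannot transition" error that the retry button surfaces.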


Another example and explanation of the bug

From !98967 (comment 1144718316):

test1:
  script: exit 0

test2:
  script: exit 1
  when: manual

Initial

The pipeline status is success.

Screenshot_2022-10-21_at_15.26.40

Ci::Pipeline.last.status # => success

Play test2

The pipeline is shown as passed.

Screenshot_2022-10-21_at_15.27.52

Ci::Pipeline.last.status # => success

Retry the pipeline

  1. Status: the pipeline is success; test1 is success, and test2 is failed but counts as "passed".
  2. Click the "retry" button.
  3. Ci::RetryPipelineService creates a new test2 job in the created state.
  4. The retry service calls Ci::ProcessPipelineService -> AtomicProcessingService.
  5. In AtomicProcessingService, update_stages! -> update_processables! -> update_processable! updates the status of test2 from created to success.
  6. In update_pipeline!, @collection.status_of_all is success, so we try to update the pipeline status to success, which it already is.

Solution

  • I do not think we should add customized logic such as return if pipeline.success? && @collection.status_of_all == 'success'.
  • We could allow the transition from success to success, which would solve the problem. But I am not sure 🤔
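
To make the second option concrete, here is a toy comparison of a strict and a lenient transition rule. Everything in it is hypothetical (Rule, STRICT_RULES, LENIENT_RULES, fire, and the list of source states); it does not reproduce the state machine defined in pipeline.rb and only illustrates what "allowing success to success" would change.

```ruby
# Hypothetical transition rule: the states an event may start from, and its target.
Rule = Struct.new(:from, :to, keyword_init: true)

# Strict: "succeed" may not start from success.
STRICT_RULES = {
  succeed: Rule.new(from: %w[created pending running failed canceled], to: 'success')
}.freeze

# Lenient: same rule, but success is also an allowed starting state.
LENIENT_RULES = {
  succeed: Rule.new(from: %w[created pending running failed canceled success], to: 'success')
}.freeze

class InvalidTransition < StandardError; end

# Returns the new state, or raises when the event is not allowed from `state`.
def fire(state, event, rules)
  rule = rules.fetch(event)
  unless rule.from.include?(state)
    raise InvalidTransition, "Status cannot transition via \"#{event}\""
  end

  rule.to
end

begin
  fire('success', :succeed, STRICT_RULES)
rescue InvalidTransition => e
  puts e.message # prints: Status cannot transition via "succeed"
end

puts fire('success', :succeed, LENIENT_RULES) # prints: success
```

The lenient rule avoids the error without a special case in the processing service, at the cost of treating a repeated succeed event as valid everywhere the state machine is used.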

cc @prajnamas @JeremyWuuuuu @mtan-gitlab

Edited by Furkan Ayhan
