"Reference not found' error for pipelines with delayed execution after merge
Problem
When retrying a child pipeline of a merge train pipeline that has failed with strategy: depends, the trigger job fails with 'Reference not found'.
We find that the reference no longer exists because the train's MergeTrains::Car has already been removed (this happens when the train fails).
We may also see this user error for manual jobs in child pipelines. There are a few different calling classes in the gitlab.com logs and only some of them appear to be retries:
https://log.gprd.gitlab.net/app/r/s/0vBPd
Config
https://gitlab.com/allison.browne/child-train-retry
What causes it
We have one train ref per MR, and that train ref is cleaned up (deleted) when the pipeline is removed from the train, which happens when the pipeline is marked as failed or successful.
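The lifecycle described above can be sketched with a toy in-memory model. This is only an illustration, not GitLab's implementation: ToyRepository, ToyMergeTrain, the ref naming scheme, and the method names are all assumptions made for the example.

```ruby
# Toy in-memory model of the train ref lifecycle.
# None of these classes or method names are GitLab's real code.
class ReferenceNotFound < StandardError; end

class ToyRepository
  def initialize
    @refs = {} # ref name => commit SHA
  end

  def write_ref(name, sha)
    @refs[name] = sha
  end

  def delete_ref(name)
    @refs.delete(name)
  end

  # Raises when the ref has already been cleaned up.
  def resolve(name)
    @refs.fetch(name) { raise ReferenceNotFound, "Reference not found: #{name}" }
  end
end

class ToyMergeTrain
  def initialize(repo)
    @repo = repo
  end

  # Assumed naming scheme: one train ref per MR.
  def train_ref(mr_iid)
    "refs/merge-requests/#{mr_iid}/train"
  end

  # The MR joins the train: its train ref is created.
  def add(mr_iid, sha)
    @repo.write_ref(train_ref(mr_iid), sha)
  end

  # The pipeline is marked failed or successful: the MR leaves the
  # train and its train ref is deleted.
  def pipeline_finished(mr_iid)
    @repo.delete_ref(train_ref(mr_iid))
  end
end

repo = ToyRepository.new
train = ToyMergeTrain.new(repo)

train.add(42, "abc123")
repo.resolve(train.train_ref(42)) # resolves while the MR is on the train

train.pipeline_finished(42)       # train ref is cleaned up

begin
  # A retry (or a delayed/manual job) that starts now cannot find the ref.
  repo.resolve(train.train_ref(42))
rescue ReferenceNotFound => e
  puts e.message
end
```

Any job that resolves the ref after pipeline_finished has run hits the error, which matches the retry and manual-job cases described in the Problem section.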
Other 'Reference not found' Problems
- Running manual jobs after the source branch is deleted by a merge, force-pushed to, or deleted
- Retrying a job after the source branch is deleted by a merge, force-pushed to, or deleted
Downstream MR pipelines that are merged with source branch deletion enabled
- Configure a downstream pipeline using branch pipelines, not MR pipelines
  - Use branch pipelines so that the persistent ref is created from the source branch and not the MR ref, which has a different lifecycle
  - Leave out strategy: depends, meaning the pipeline is successful as soon as the downstream is created
j1:
  script: sleep 60
j2:
  trigger:
    include: child.yml
- Open an MR
- Enable delete source branch on merge
- Enable Auto-merge on the MR
- Note that the downstream fails because the ref is not found
Proposal
- Changing from Ci::PersistentRefs to Ci::PersistentCommit and only store the single commit
- Changing the lifecycle to create the Ci::PersistentCommit
  - Only create Ci::PersistentCommit entries for commits that would actually be orphaned
  - Create them at the moment of the destructive Git operation, not proactively
- Give Ci::PersistentRefs a documented TTL, which aligns with archival and time decay of job
We might even consider calling these Ci::OrphanedPipelineCommits which would indicate the purpose and lifecycle better.
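A minimal sketch of the lazy lifecycle proposed above, assuming a toy model where a commit counts as reachable only if it is the head of some branch. ToyPersistentCommits and all of its methods are hypothetical names for illustration. The kept set stands in for Ci::PersistentCommit entries, and the TTL part of the proposal is left out.

```ruby
require "set"

# Toy model of the proposal: pin a commit only at the moment a
# destructive Git operation would orphan it, and only if a pipeline
# still needs it. All names and signatures here are hypothetical.
class ToyPersistentCommits
  attr_reader :kept

  def initialize(branches:, pipeline_shas:)
    @branches = branches            # branch name => head SHA
    @pipeline_shas = pipeline_shas  # SHAs whose pipelines may still run jobs
    @kept = Set.new                 # stand-in for Ci::PersistentCommit entries
  end

  # Destructive operation 1: branch deletion (for example on merge).
  def delete_branch(name)
    pin_if_orphaned(name, @branches.fetch(name))
    @branches.delete(name)
  end

  # Destructive operation 2: force push that replaces the branch head.
  def force_push(name, new_sha)
    pin_if_orphaned(name, @branches.fetch(name))
    @branches[name] = new_sha
  end

  # A job can run if its commit is still a branch head or was pinned.
  def resolvable?(sha)
    @branches.value?(sha) || @kept.include?(sha)
  end

  private

  def pin_if_orphaned(name, sha)
    still_referenced = @branches.any? { |other, head| other != name && head == sha }
    @kept << sha if !still_referenced && @pipeline_shas.include?(sha)
  end
end

store = ToyPersistentCommits.new(
  branches: { "main" => "aaa", "feature" => "bbb", "mirror" => "aaa" },
  pipeline_shas: ["aaa", "bbb"]
)

store.delete_branch("feature") # "bbb" would be orphaned, so it is pinned
store.delete_branch("mirror")  # "aaa" is still the head of main, nothing is pinned
```

Nothing is written while the branch is intact, so pipelines whose commits never become orphaned create no entries at all.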
