'Reference not found' error for pipelines with delayed execution after merge

Problem

When retrying a failed child pipeline of a merge train pipeline whose trigger job uses strategy: depend, the trigger job fails with 'Reference not found'.

The reference no longer exists because the train's MergeTrains::Car has already been removed (this happens when the train fails).

(Screenshot, 2023-10-19 3:14 PM)

This user error may also occur for manual jobs in child pipelines. The gitlab.com logs show several different calling classes, and only some of them appear to be retries:

https://log.gprd.gitlab.net/app/r/s/0vBPd

Calling classes

Config

https://gitlab.com/allison.browne/child-train-retry

What causes it

There is one train ref per MR, and that ref is cleaned up (deleted) when the pipeline is removed from the train, which happens as soon as the pipeline is marked as failed or successful. A retry that needs the ref after that point can no longer resolve it.
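A toy, in-memory Ruby model of this lifecycle may make the failure easier to see. Everything in it (ReferenceNotFound, ToyRefStore, ToyMergeTrain, retry_trigger_job and the ref name format) is hypothetical and only mirrors the behavior stated above: one ref per MR, deleted once the pipeline leaves the train, so a later retry that resolves the ref fails. It is not GitLab's implementation.

```ruby
# Toy model of the train ref lifecycle. All names are hypothetical;
# this is not GitLab's implementation.
class ReferenceNotFound < StandardError; end

# Hypothetical stand-in for the repository's ref storage.
class ToyRefStore
  def initialize
    @refs = {} # ref name => commit SHA
  end

  def create(name, sha)
    @refs[name] = sha
  end

  def delete(name)
    @refs.delete(name)
  end

  # Raises when the ref is gone, mirroring the user-facing error.
  def resolve!(name)
    @refs.fetch(name) { raise ReferenceNotFound, "Reference not found: #{name}" }
  end
end

# Hypothetical train: one ref per MR, deleted when the pipeline leaves the train.
class ToyMergeTrain
  def initialize(store)
    @store = store
  end

  def ref_for(mr_iid)
    "train-ref-for-mr-#{mr_iid}" # illustrative name, not GitLab's real ref path
  end

  def add(mr_iid, sha)
    @store.create(ref_for(mr_iid), sha)
  end

  # Called once the pipeline is marked as failed or successful.
  def remove(mr_iid)
    @store.delete(ref_for(mr_iid))
  end
end

# Hypothetical retry: the trigger job has to resolve the train ref again.
def retry_trigger_job(store, train, mr_iid)
  store.resolve!(train.ref_for(mr_iid))
end

store = ToyRefStore.new
train = ToyMergeTrain.new(store)
train.add(42, "abc123")
puts retry_trigger_job(store, train, 42) # prints abc123: the ref still exists

train.remove(42) # the child failed, so the train pipeline is marked as failed
begin
  retry_trigger_job(store, train, 42)
rescue ReferenceNotFound => e
  puts e.message # prints "Reference not found: train-ref-for-mr-42"
end
```

The point of the model is the ordering: the cleanup is tied to the pipeline's status change, while the retry happens later and still depends on the deleted ref.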

Other 'Reference not found' Problems

  • Running manual jobs after the source branch has been deleted by a merge, force-pushed to, or deleted
  • Retrying a job after the source branch has been deleted by a merge, force-pushed to, or deleted

Downstream MR pipelines that are merged with source branch deletion enabled

  1. Configure a downstream pipeline using branch pipelines, not MR pipelines
    • Use branch pipelines so that the persistent ref is created from the source branch and not from the MR ref, which has a different lifecycle
    • Leave out strategy: depend, so the trigger job succeeds as soon as the downstream pipeline is created
     j1:
       script: sleep 60
     j2:
       trigger:
         include: child.yml
  2. Open an MR
  3. Enable source branch deletion on merge
  4. Enable auto-merge on the MR
  5. Observe that the downstream pipeline fails because the ref is not found

Proposal

  • Change from Ci::PersistentRefs to Ci::PersistentCommit and store only the single commit
  • Change when Ci::PersistentCommit entries are created:
    • Only create Ci::PersistentCommit entries for commits that would actually be orphaned
    • Create them at the moment of the destructive Git operation, not proactively
  • Give Ci::PersistentRefs a documented TTL aligned with job archival and the time decay of jobs
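A minimal plain-Ruby sketch of the proposed lifecycle, under stated assumptions: records are keyed by project and SHA, are created only at the moment of a destructive Git operation and only for pipeline commits that the operation leaves unreachable, and expire after a TTL. OrphanedCommit, OrphanedCommitStore, record_orphans, kept?, purge_expired and the 90-day default are illustrative names and values, not existing GitLab classes or agreed settings. A real implementation would persist the records and compute reachability through Git rather than take it as an argument.

```ruby
# Hypothetical record for a commit that a destructive Git operation would
# otherwise orphan. Loosely follows the Ci::PersistentCommit /
# Ci::OrphanedPipelineCommits idea; not an existing GitLab class.
OrphanedCommit = Struct.new(:sha, :project_id, :created_at, :expires_at, keyword_init: true) do
  def expired?(now)
    now >= expires_at
  end
end

class OrphanedCommitStore
  # Assumed TTL in seconds; the real value would be aligned with job archival.
  DEFAULT_TTL = 90 * 24 * 60 * 60

  def initialize(ttl: DEFAULT_TTL)
    @ttl = ttl
    @records = {} # [project_id, sha] => OrphanedCommit
  end

  # Called at the moment of a destructive operation (branch deletion, force
  # push, train ref cleanup). pipeline_shas: commits still used by pipelines.
  # reachable_shas: commits still reachable from some remaining ref.
  # Only commits in the first list but not the second get a record.
  def record_orphans(project_id:, pipeline_shas:, reachable_shas:, now:)
    (pipeline_shas - reachable_shas).each do |sha|
      @records[[project_id, sha]] ||= OrphanedCommit.new(
        sha: sha, project_id: project_id, created_at: now, expires_at: now + @ttl
      )
    end
  end

  # True while an unexpired record keeps the commit available for retries.
  def kept?(project_id, sha, now)
    record = @records[[project_id, sha]]
    !record.nil? && !record.expired?(now)
  end

  # Periodic cleanup; returns the number of expired records removed.
  def purge_expired(now)
    before = @records.size
    @records.delete_if { |_key, record| record.expired?(now) }
    before - @records.size
  end
end

store = OrphanedCommitStore.new(ttl: 3600)
# A force push rewrites the branch: "old1" becomes unreachable, "base" does not.
store.record_orphans(project_id: 7, pipeline_shas: %w[base old1],
                     reachable_shas: %w[base new1], now: 0)
store.kept?(7, "old1", 10) # => true
store.kept?(7, "base", 10) # => false (no record needed: still reachable)
store.purge_expired(3600)  # => 1
```

Creating records only when a commit would actually be orphaned keeps their number proportional to branch deletions, force pushes and train cleanups rather than to every pipeline, and the TTL bounds how long a retry of an old job can still succeed.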

We might even consider calling these Ci::OrphanedPipelineCommits, which would better indicate their purpose and lifecycle.
