Treat `stuck_or_timeout_failure` and `job_execution_timeout` as retry:when aliases for the new specific failure reasons
## Summary
In %19.0, !230787 (closes #595752) split the generic `stuck_or_timeout_failure` and `job_execution_timeout` failure reasons into a set of more specific ones emitted by the various `Ci::StuckBuilds::*` and `Ci::TimedOutBuilds::*` services:
| Previous reason | New reasons |
|---|---|
| `stuck_or_timeout_failure` | `stuck_pending_with_matching_runners`, `stuck_pending_no_matching_runners`, `no_updates_running`, `no_updates_canceling` |
| `job_execution_timeout` | `server_timeout_running`, `server_timeout_canceling` |
The original enum values are preserved for historical data, but no new builds are written with them. See [#595752](https://gitlab.com/gitlab-org/gitlab/-/work_items/595752), !230787, and the docs follow-up !237556 for full context.
## Problem
The old reasons are valid values for [`retry:when`](https://docs.gitlab.com/ci/yaml/#retrywhen) in `.gitlab-ci.yml`. `Gitlab::Ci::Config::Entry::Retry::FullRetry.possible_retry_when_values` derives its allow-list from `Ci::Build.failure_reasons.keys`, so a config like:
```yaml
job:
  script: ./run.sh
  retry:
    max: 2
    when:
      - stuck_or_timeout_failure
      - job_execution_timeout
```
still **validates** after the upgrade to %19.0 but silently **stops matching** any real failures, because the dropper services now write one of the six new reasons instead. Customers who relied on this retry behavior get a silent regression with no warning, no error, and no failed pipeline to alert them. This was flagged as a likely breaking change for anyone consuming failure reasons through the [Jobs API](https://docs.gitlab.com/api/jobs/) or `retry:when`. See [this discussion on #595752](https://gitlab.com/gitlab-org/gitlab/-/work_items/595752#note_3250739522) and Drew's [follow-up note](https://gitlab.com/gitlab-org/gitlab/-/work_items/595752#note_3377814112).
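
The regression can be illustrated with a minimal, self-contained sketch. It assumes that retry matching is an exact comparison between the configured `retry:when` values and the build's failure reason; `FAILURE_REASONS`, `valid_retry_when?`, and `retries_on?` are hypothetical stand-ins for illustration, not the actual GitLab code.

```ruby
# Hypothetical, simplified stand-ins for illustration only.
# A subset of the failure reasons: the legacy names stay in the enum,
# so a config that lists them still passes validation.
FAILURE_REASONS = %w[
  stuck_or_timeout_failure
  job_execution_timeout
  stuck_pending_with_matching_runners
  stuck_pending_no_matching_runners
  no_updates_running
  no_updates_canceling
  server_timeout_running
  server_timeout_canceling
].freeze

# Validation: every configured value must be a known failure reason.
def valid_retry_when?(retry_when)
  retry_when.all? { |reason| FAILURE_REASONS.include?(reason) }
end

# Assumed exact-match check between the config and the build's failure reason.
def retries_on?(retry_when, failure_reason)
  retry_when.include?(failure_reason)
end

config = %w[stuck_or_timeout_failure job_execution_timeout]

puts valid_retry_when?(config)                     # the config is still accepted
puts retries_on?(config, 'no_updates_running')     # but a new stuck reason does not match
puts retries_on?(config, 'server_timeout_running') # and neither does a new timeout reason
```

Under these assumptions the config validates, yet neither of the two lookups finds a match, which is the silent failure described above.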
We can't fully remove these values — they're in customer `.gitlab-ci.yml` files we don't control — but we also don't want them to mean "nothing" going forward.
## Proposal
Make `stuck_or_timeout_failure` and `job_execution_timeout` behave as **meta-reasons / aliases** in `retry:when` matching. When a user lists either of them under `retry:when`, the retry logic should match against the full set of new, specific reasons that replaced it:
- `stuck_or_timeout_failure` → matches a build that failed with **any** of:
  - `stuck_pending_with_matching_runners`
  - `stuck_pending_no_matching_runners`
  - `no_updates_running`
  - `no_updates_canceling`
- `job_execution_timeout` → matches a build that failed with **any** of:
  - `server_timeout_running`
  - `server_timeout_canceling`
This preserves the original semantic intent of these names ("retry me if I got stuck or timed out") for every existing config, without locking us into emitting the old reasons on new builds.
The implementation should live wherever `retry:when` matching is evaluated against a build's `failure_reason` (the auto-retry logic, not the YAML validator — the validator is already fine since the keys remain in the enum). A central alias map seems cleanest so it can be reused if we do this kind of split again.
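
As a rough sketch of what a central alias map could look like, the snippet below expands legacy names before matching. The names `RETRY_WHEN_ALIASES`, `expand_retry_when`, and `retry_matches?` are hypothetical, and the real auto-retry code may be structured differently; only the reason names come from this issue.

```ruby
# Hypothetical alias map: legacy retry:when value => specific reasons that replaced it.
RETRY_WHEN_ALIASES = {
  'stuck_or_timeout_failure' => %w[
    stuck_pending_with_matching_runners
    stuck_pending_no_matching_runners
    no_updates_running
    no_updates_canceling
  ],
  'job_execution_timeout' => %w[
    server_timeout_running
    server_timeout_canceling
  ]
}.freeze

# Expand each configured value into itself plus any reasons it aliases.
# Values without an alias entry pass through unchanged.
def expand_retry_when(retry_when)
  Array(retry_when).flat_map { |reason| [reason, *RETRY_WHEN_ALIASES.fetch(reason, [])] }.uniq
end

# Hypothetical matcher: does the configured retry:when cover this failure reason?
def retry_matches?(retry_when, failure_reason)
  expand_retry_when(retry_when).include?(failure_reason.to_s)
end

puts retry_matches?(%w[stuck_or_timeout_failure], 'no_updates_running')     # alias covers it
puts retry_matches?(%w[stuck_or_timeout_failure], 'server_timeout_running') # different alias group
puts retry_matches?(%w[no_updates_running], 'no_updates_canceling')         # specific reasons stay specific
```

The sketch keeps the legacy name itself in the expanded set, which is one possible choice: it leaves a build that still carries the old reason matchable, while listing a specific new reason continues to match only that reason.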
### Deprecation warning
Alongside the alias behavior, we should warn users that these names are deprecated and they should migrate to the specific reasons:
- Emit a non-blocking CI lint / config warning when `stuck_or_timeout_failure` or `job_execution_timeout` appears under `retry:when`, pointing at the new reasons and the docs.
- The warning should be surfaced in the same places existing CI config warnings show up (pipeline editor, `/ci/lint`, the lint API response).
- Add a deprecation entry under `data/deprecations/` so this lands in the release post and gives self-managed customers lead time.
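
A non-blocking warning of this kind could be generated roughly as follows. This is only a sketch: `DEPRECATED_RETRY_WHEN`, `retry_when_deprecation_warnings`, and the message wording are hypothetical, and how the warnings are attached to the lint result is left out.

```ruby
# Hypothetical map of deprecated retry:when values to their suggested replacements.
DEPRECATED_RETRY_WHEN = {
  'stuck_or_timeout_failure' => %w[
    stuck_pending_with_matching_runners
    stuck_pending_no_matching_runners
    no_updates_running
    no_updates_canceling
  ],
  'job_execution_timeout' => %w[
    server_timeout_running
    server_timeout_canceling
  ]
}.freeze

# Build one warning string per deprecated value found in a job's retry:when.
# Returns an empty array when nothing is deprecated, so the config stays valid either way.
def retry_when_deprecation_warnings(job_name, retry_when)
  Array(retry_when).select { |reason| DEPRECATED_RETRY_WHEN.key?(reason) }.map do |reason|
    replacements = DEPRECATED_RETRY_WHEN[reason].join(', ')
    "jobs:#{job_name}:retry:when value `#{reason}` is deprecated; use one or more of: #{replacements}"
  end
end

warnings = retry_when_deprecation_warnings('job', %w[stuck_or_timeout_failure script_failure])
warnings.each { |message| puts message }
```

Because the helper only returns strings and never raises, a caller can surface the messages as warnings without failing validation.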
## Out of scope
- Removing `stuck_or_timeout_failure` / `job_execution_timeout` from the enum or from `possible_retry_when_values`. Both must stay valid for backward compatibility.
- Re-emitting the old reasons on new builds. The split in !230787 is intentional and we want the granular data; this issue is only about how `retry:when` interprets the old names.
## Acceptance criteria
- [ ] `retry:when: [stuck_or_timeout_failure]` triggers a retry when a build fails with any of `stuck_pending_with_matching_runners`, `stuck_pending_no_matching_runners`, `no_updates_running`, or `no_updates_canceling`.
- [ ] `retry:when: [job_execution_timeout]` triggers a retry when a build fails with either `server_timeout_running` or `server_timeout_canceling`.
- [ ] The existing per-reason behavior is preserved — listing a specific new reason still only matches that reason.
- [ ] A non-blocking deprecation warning is shown in CI lint output / pipeline editor when these legacy reasons are used in `retry:when`.
- [ ] A deprecation notice is added under `data/deprecations/`.
- [ ] Docs in `doc/ci/yaml/_index.md` and `doc/ci/jobs/job_troubleshooting.md` (see !237556, !237605) are updated to describe the alias behavior and point users to the new reasons.
- [ ] Test coverage for the alias matching and the deprecation warning.
## References
- Original issue: [#595752](https://gitlab.com/gitlab-org/gitlab/-/work_items/595752)
- Implementation MR: !230787
- Docs MRs: !237556, !237605
- Drew's [to-do comment](https://gitlab.com/gitlab-org/gitlab/-/work_items/595752#note_3377814112) that initially raised bringing this back as a meta-reason for `retry:when`
- Customer-impact discussion: [note_3250739522 on #595752](https://gitlab.com/gitlab-org/gitlab/-/work_items/595752#note_3250739522)
- Relevant code: `lib/gitlab/ci/config/entry/retry.rb`, `app/models/concerns/enums/ci/commit_status.rb`, `app/services/ci/stuck_builds/`, `app/services/ci/timed_out_builds/`