Allow manual jobs to run and access artifacts when their dependencies fail

Summary

GitLab CI/CD currently does not support a common use case: a job that fails hard and blocks the pipeline, while downstream manual jobs stay triggerable for remediation and can still access its artifacts.


Problem Description

When a job fails without allow_failure: true, any downstream jobs that depend on it (via needs) are automatically skipped, even if they are manual jobs. This prevents a natural workflow where:

  1. A validation job fails and blocks the pipeline (preventing merges)
  2. A manual remediation job remains available to fix the issue
  3. The remediation job needs access to the validation job's artifacts
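
A minimal configuration that should reproduce the behavior described above is sketched below. The stage names, job names, and report.txt are hypothetical and exist only for illustration; they are not part of our real pipeline.

```yaml
# Minimal reproduction sketch; all job, stage, and file names are hypothetical.
stages:
  - validate
  - remediate

validate:
  stage: validate
  script:
    - echo "missing keys" > report.txt  # stand-in for a real validation report
    - exit 1                            # fail hard, no allow_failure
  artifacts:
    when: always                        # upload the report even though the job failed
    paths:
      - report.txt

remediate:
  stage: remediate
  when: manual
  needs:
    - job: validate
      artifacts: true
  script:
    - cat report.txt                    # would read the report if the job could run
```

With this configuration, remediate ends up skipped rather than playable once validate fails, even though report.txt was uploaded.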

Real-World Use Case: Translation Validation & AI Remediation

We have a translations pipeline for a mobile app with 31+ locales:

Translation Linter Job:

  • Validates all translations for missing/outdated keys
  • Must fail hard so that merge requests with incomplete translations cannot be merged
  • Produces artifacts containing:
    • List of missing translation keys per locale
    • List of outdated translation keys per locale
    • Validation results in JSON format

AI Translation Job (Manual):

  • Uses AI services to automatically translate missing/outdated keys
  • Should be manually triggered because:
    • Takes 10-15 minutes to process all locales
    • Consumes external API credits
    • Commits the translated files back to the branch
  • Requires the linter's artifacts to know what to translate

Current Workaround (Three Jobs)

We're forced to use a three-job pattern:

lint-translations:
  allow_failure: true # Don't block downstream jobs
  script:
    - run-translation-linter --noMissingTranslations=error
  artifacts:
    when: always
    paths:
      - .temp-translations/*.json

lint-translations-failure-checker:
  needs:
    - job: lint-translations
      artifacts: true
      optional: true
  script:
    # Read artifacts and fail hard if issues found
    - exit 1

ai-translate:
  when: manual
  needs:
    - job: lint-translations
      artifacts: true
      optional: true
  script:
    -  # Use artifacts to translate missing keys

Why this is suboptimal:

  • Adds complexity (third job that just reads artifacts and fails)
  • Less intuitive for developers
  • Duplicates failure logic that already exists in the linter
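
To make the duplication concrete, here is a rough sketch of what the failure checker's script could look like. It assumes each file under .temp-translations/ is a JSON array that is empty when there are no issues, and that jq can be installed in the job image. The schema and the alpine image are assumptions for illustration, not a description of our actual linter output.

```yaml
lint-translations-failure-checker:
  image: alpine:latest            # assumption: any image where jq can be installed
  needs:
    - job: lint-translations
      artifacts: true
      optional: true
  before_script:
    - apk add --no-cache jq
  script:
    # Hypothetical schema: every report is a JSON array, empty when the locale is clean.
    - |
      status=0
      for report in .temp-translations/*.json; do
        # jq -e exits non-zero when the expression evaluates to false,
        # or when the file is missing or not valid JSON (fail closed).
        if ! jq -e 'length == 0' "$report" > /dev/null; then
          echo "Translation issues found in $report"
          status=1
        fi
      done
      exit "$status"
```

Everything in this script re-derives a pass/fail verdict that run-translation-linter has already computed, which is the redundancy the two-job design would remove.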

Desired Behavior (Two Jobs)

lint-translations:
  # Fails hard - no allow_failure
  script:
    - run-translation-linter --noMissingTranslations=error
  artifacts:
    when: always
    paths:
      - .temp-translations/*.json

ai-translate:
  when: manual
  needs:
    - job: lint-translations
      artifacts: true
      allow_failure: true # NEW: Allow dependency to fail
  script:
    -  # Use artifacts to translate missing keys

Or by letting the existing dependencies keyword act as an artifacts-only dependency:

ai-translate:
  when: manual
  dependencies: # Artifacts-only dependency
    - lint-translations
  # No "needs" = no execution dependency


Benefits

This would enable common patterns like:

  • Linting with automatic remediation options
  • Test failures with manual retry/debug jobs
  • Security scans with manual override workflows
  • Any "fail-fast, fix-manually" pattern

Additional Context

This is a frequently requested pattern in the community, and teams currently work around it with either:

  1. Three-job patterns (our approach)
  2. Dynamic child pipelines (overly complex)
  3. Accepting allow_failure: true on the validation job (doesn't block merges)

None of these workarounds are ideal for the use case.


Visual Representation

Current Workaround (Three Jobs)

┌─────────────────────────────────────┐
│  Translation Linter                 │
│  - Detects missing/outdated         │
│  - allow_failure: true              │
│  - Produces artifacts ✓             │
└─────────────┬───────────────────────┘
              │
              ├──────────────────────────┐
              │                          │
              ▼                          ▼
┌─────────────────────────┐  ┌──────────────────────────┐
│  Failure Checker        │  │  AI Translation (manual) │
│  - Reads artifacts      │  │  - Reads artifacts       │
│  - Fails hard ❌        │  │  - Translates with AI    │
│  - Blocks pipeline      │  │  - Commits fixes         │
└─────────────────────────┘  └──────────────────────────┘

How it works:

  • Linter fails but allows pipeline to continue
  • Failure checker fails hard to block merges
  • AI job stays available for manual trigger

Desired Solution (Two Jobs)

┌─────────────────────────────────────┐
│  Translation Linter                 │
│  - Detects missing/outdated         │
│  - Fails hard ❌                    │
│  - Blocks pipeline                  │
│  - Produces artifacts ✓             │
└─────────────┬───────────────────────┘
              │
              │ (artifacts flow but no execution dependency)
              │
              ▼
┌──────────────────────────────────────┐
│  AI Translation (manual)             │
│  - Still runnable despite failure ⚠️ │
│  - Reads artifacts                   │
│  - Translates with AI                │
│  - Commits fixes                     │
└──────────────────────────────────────┘

Why it doesn't work currently:

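  • dependencies only selects which artifacts to download; it does not exempt the job from stage ordering, and when a job uses both keywords, every job listed in dependencies must also appear in needs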
  • If linter fails → AI job gets skipped
  • No way to have "artifacts only" dependency without execution dependency

The gap: GitLab doesn't support accessing artifacts from failed jobs without creating an execution dependency that causes downstream jobs to be skipped.
