Fix job state transition hooks not taken into account when using two-phase commit

Based on the findings in note 2905078452, enabling the allow_runner_job_acknowledgement feature flag causes job timeout metadata to be unavailable, or incorrectly set to 0s, while jobs are in the pending state during Phase 1 of the two-phase commit workflow.

Impact: Kubernetes executor jobs using FF_USE_POD_ACTIVE_DEADLINE_SECONDS fail immediately with runner_system_failure because activeDeadlineSeconds is set to 0s instead of the configured job timeout.

Required Fixes

1. Ensure Complete Job Metadata in Pending State

Problem: Job timeout (and potentially other metadata) is not fully populated in the job payload when the job is assigned to a runner while still in the pending state.

Fix: Modify the job assignment logic so that all critical job metadata is included in the response to POST /api/v4/jobs/request even when the job remains in the pending state.

Files to investigate:

lib/api/ci/runner.rb - Job request endpoint
app/services/ci/register_job_service.rb - Job assignment service
Job serializer used for runner API responses
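
One way to read this fix, assuming (as the title suggests) that the timeout is normally filled in by a hook on the pending to running transition: resolve the timeout when the payload is built instead of waiting for the transition. The sketch below is plain Ruby, not GitLab code; `Job`, `resolve_timeout!`, `payload_for_runner`, and the payload keys are hypothetical stand-ins.

```ruby
# Hypothetical stand-in for a CI job record; not GitLab's actual model.
Job = Struct.new(:id, :status, :configured_timeout, :timeout, keyword_init: true)

# Hypothetical equivalent of what a pending -> running hook would do:
# copy the configured timeout onto the job if it has not been set yet.
def resolve_timeout!(job)
  job.timeout ||= job.configured_timeout
  job
end

# Build the runner payload without relying on a state transition having run.
# With two-phase commit the job is still "pending" here, so the timeout is
# resolved explicitly before it is serialized.
def payload_for_runner(job)
  resolve_timeout!(job)
  { id: job.id, status: job.status, timeout: job.timeout }
end

job = Job.new(id: 1, status: "pending", configured_timeout: 3600)
payload = payload_for_runner(job)
puts payload.inspect
```

The point of the sketch is the ordering: the timeout is resolved before serialization, so the payload carries the configured value while the job status is still pending.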

2. Add Validation for Job Payload Completeness

Add validation to ensure the job payload sent to runners includes:

✅ Job timeout (timeout)
✅ Resource limits
✅ All required variables
✅ Any other metadata runners need during the preparation phase
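
A minimal sketch of such a check, in plain Ruby. The payload shape (`:timeout` in seconds, `:variables` as an array of key/value hashes) and the helper name `payload_errors` are assumptions for illustration. Resource limits are left out because their shape in the real payload is not described here.

```ruby
# Returns a list of human-readable problems with a runner job payload.
# An empty list means the payload passed every check.
def payload_errors(payload)
  errors = []

  timeout = payload[:timeout]
  if timeout.nil?
    errors << "timeout is missing"
  elsif !timeout.is_a?(Integer) || timeout <= 0
    # A zero timeout is what led to activeDeadlineSeconds being 0s in this report.
    errors << "timeout must be a positive integer, got #{timeout.inspect}"
  end

  variables = payload[:variables]
  if !variables.is_a?(Array)
    errors << "variables are missing"
  elsif variables.any? { |v| !v.is_a?(Hash) || v[:key].to_s.empty? }
    errors << "variables contain an entry without a key"
  end

  errors
end

good = { timeout: 3600, variables: [{ key: "CI", value: "true" }] }
bad  = { timeout: 0, variables: nil }

puts payload_errors(good).inspect
puts payload_errors(bad).inspect
```

Returning a list of errors rather than raising on the first one makes it easier to log every missing field for a single job during a staged rollout.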

3. Add Test Coverage

Required tests:

Integration test: Kubernetes executor with FF_USE_POD_ACTIVE_DEADLINE_SECONDS + two-phase commit
Unit test: Job payload includes timeout when job is in pending state
E2E test: Verify activeDeadlineSeconds is set correctly for jobs using two-phase commit

4. Investigate Other Potential Metadata Issues

Review whether other job attributes might have similar issues:

Resource limits (CPU, memory)
Service containers configuration
Cache/artifact settings
Custom variables that might be lazily loaded
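
One hedged way to run this review mechanically: build the payload for the same job once while it is pending and once after it has transitioned to running, then list the fields that differ. The helper below is a plain-Ruby sketch; `fields_changed_by_transition` and the sample payloads are made up for illustration.

```ruby
# Fields that are expected to differ between the two states.
EXPECTED_DIFFERENCES = %i[status].freeze

# Compare two payload hashes and return the keys whose values differ,
# including keys present in only one of them.
def fields_changed_by_transition(pending_payload, running_payload)
  keys = (pending_payload.keys | running_payload.keys) - EXPECTED_DIFFERENCES
  keys.reject { |key| pending_payload[key] == running_payload[key] }
end

pending_payload = { status: "pending", timeout: 0,    cache: [{ key: "gems" }] }
running_payload = { status: "running", timeout: 3600, cache: [{ key: "gems" }] }

puts fields_changed_by_transition(pending_payload, running_payload).inspect
# prints [:timeout]
```

Any field reported here is a candidate for the same class of bug as the timeout, since its value depends on the transition having happened.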

Verification Steps Before Next Rollout

Before re-enabling the feature flag:

Confirm the job timeout is present in the API response for jobs in the pending state
Test with the Kubernetes executor using FF_USE_POD_ACTIVE_DEADLINE_SECONDS
Verify activeDeadlineSeconds is set to the configured job timeout (not 0s)
Monitor for runner_system_failure errors during staged rollout

/copy_metadata #578881

Edited Nov 24, 2025 by 🤖 GitLab Bot 🤖