Make structured "job finished" log line with failure_reason and exit_code (!5885) · Merge requests · GitLab.org / gitlab-runner

What does this MR do?

Unifies the job finished log message and adds metadata that is not currently logged, such as job-status, exit_code, and failure_reason.

Before:

Job succeeded                                       duration_s=35.836495667 gitlab_user_id=1 job=526 namespace_id=0 organization_id=0 project=19 project_full_path=root/gdk-ci-test root_namespace_id=0 runner=Wg8IWvTxZ runner_name=GDK local runner

ERROR: Job failed (system failure): prepare environment: setting up build pod: provided host alias mysql_1 for (...) duration_s=1.337469125 gitlab_user_id=1 job=529 namespace_id=1 organization_id=1 project=20 project_full_path=root/tm-services-compatibility-between-docker-and-k8s root_namespace_id=1 runner=Wg8IWvTxZ runner_name=GDK local runner

After:

Job succeeded                                        duration_s=36.388285542 gitlab_user_id=1 job=539 job-status=success namespace_id=1 organization_id=1 project=19 project_full_path=root/gdk-ci-test root_namespace_id=1 runner=Wg8IWvTxZ

WARNING: Job failed: command terminated with exit code 42  duration_s=9.154000584 error=command terminated with exit code 42 exit_code=42 failure_reason= gitlab_user_id=1 job=553 job-status=failed namespace_id=1 organization_id=1 project=19 project_full_path=root/gdk-ci-test root_namespace_id=1 runner=Wg8IWvTxZ
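To illustrate how the new fields could be consumed, here is a minimal Python sketch that splits the "After" failure line into key/value pairs. It is not part of this MR. The helper name `parse_fields` is hypothetical, and the approach assumes the text format shown above, where values may contain unquoted spaces. It would misparse a value that itself contains a `word=` token.

```python
import re

# A field name followed by "=", at line start or after whitespace.
# Hyphens are allowed so that "job-status" is recognised.
KEY_RE = re.compile(r"(?:^|\s)([A-Za-z_][\w-]*)=")


def parse_fields(line):
    """Return the key=value fields of a runner text log line as a dict.

    Hypothetical helper: each value runs until the next "key=" token,
    so unquoted values containing spaces (like "error") stay intact.
    """
    matches = list(KEY_RE.finditer(line))
    fields = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(line)
        fields[match.group(1)] = line[match.end():end].strip()
    return fields


# The "After" failure line from this description.
line = (
    "WARNING: Job failed: command terminated with exit code 42  "
    "duration_s=9.154000584 error=command terminated with exit code 42 "
    "exit_code=42 failure_reason= gitlab_user_id=1 job=553 job-status=failed "
    "namespace_id=1 organization_id=1 project=19 "
    "project_full_path=root/gdk-ci-test root_namespace_id=1 runner=Wg8IWvTxZ"
)

fields = parse_fields(line)
print(fields["job-status"], fields["exit_code"])
```

With structured fields, the exit code and status can be read directly instead of being extracted from the message text.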

Why was this MR needed?

This allows us to more easily see job success vs failure rates. It also tells us more about why a job failed.

This is particularly useful during incidents, where we may want to see whether system failures are increasing. Currently that requires fuzzy matching on the msg field, which is not very user friendly.

The end goal is for the job finished message to canonically represent all job completions and include all relevant dimensions, so that we can easily assess the overall health of the system and dig into systemic failures.

EDIT: We keep the old messages for backward compatibility, but we can now filter on job-status, which is either success or failed.
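As a rough sketch of the kind of analysis this enables, the Python snippet below computes a failure rate and groups failures by failure_reason. It assumes the log lines have already been parsed into dictionaries. The sample records and the `summarize` helper are hypothetical, and the `runner_system_failure` value is only an illustrative reason, not output taken from this MR.

```python
from collections import Counter

# Hypothetical, already-parsed log records (not real output).
records = [
    {"job": "539", "job-status": "success"},
    {"job": "540", "job-status": "success"},
    {"job": "553", "job-status": "failed", "exit_code": "42", "failure_reason": ""},
    # Illustrative failure_reason value, assumed for this example.
    {"job": "554", "job-status": "failed", "failure_reason": "runner_system_failure"},
    {"msg": "unrelated line without job-status"},
]


def summarize(records):
    """Return (failure_rate, failures_by_reason) for finished jobs."""
    # Only lines that carry job-status count as job completions.
    finished = [r for r in records if r.get("job-status") in ("success", "failed")]
    failed = [r for r in finished if r["job-status"] == "failed"]
    rate = len(failed) / len(finished) if finished else 0.0
    # An empty or missing failure_reason is bucketed as "unknown".
    reasons = Counter(r.get("failure_reason") or "unknown" for r in failed)
    return rate, reasons


rate, reasons = summarize(records)
print(rate, dict(reasons))
```

Filtering on job-status also skips the older messages kept for backward compatibility, assuming those do not carry the field.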

What's the best way to test this MR?

I tested it locally.

What are the relevant issue numbers?

Edited Oct 15, 2025 by Igor
