
Adding runner job failed/success metrics with more details

ella requested to merge ellax/gitlab-runner:master into main

What does this MR do?

This MR adds a new Prometheus metric that counts the jobs a runner completed successfully.

Example of the new success metric:

# HELP gitlab_runner_succeeded_jobs_total Total number of succeeded jobs
# TYPE gitlab_runner_succeeded_jobs_total counter
gitlab_runner_succeeded_jobs_total{job_result="success",runner="7ywyWRnr"} 2

It also modifies the failed job metric to:

# HELP gitlab_runner_failed_jobs_total Total number of failed jobs
# TYPE gitlab_runner_failed_jobs_total counter
gitlab_runner_failed_jobs_total{job_result="script_failure",runner="7ywyWRnr"} 2
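
The two series above could come from a pair of labeled counters. What follows is a minimal stdlib-only sketch, not the code in this MR: labeledCounter, Inc and Expose are hypothetical names standing in for a Prometheus client library counter vector, and the string escaping is simplified. Only the metric names, help texts and label names are taken from this description.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// labeledCounter is a hypothetical, stdlib-only stand-in for a Prometheus
// counter vector. It is not the collector added by this MR.
type labeledCounter struct {
	name   string
	help   string
	labels []string           // label names, in declaration order
	values map[string]float64 // keyed by the rendered label set
}

func newLabeledCounter(name, help string, labels ...string) *labeledCounter {
	return &labeledCounter{name: name, help: help, labels: labels, values: map[string]float64{}}
}

// Inc adds one to the series identified by labelValues, which must be given
// in the same order as the label names.
func (c *labeledCounter) Inc(labelValues ...string) {
	if len(labelValues) != len(c.labels) {
		panic("label value count does not match label names")
	}
	pairs := make([]string, len(c.labels))
	for i, l := range c.labels {
		// %q is a simplification of the exposition format's escaping rules.
		pairs[i] = fmt.Sprintf("%s=%q", l, labelValues[i])
	}
	sort.Strings(pairs) // render labels sorted by name, as in the examples above
	c.values[strings.Join(pairs, ",")]++
}

// Expose renders the counter in the Prometheus text exposition format.
func (c *labeledCounter) Expose() string {
	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n", c.name, c.help)
	fmt.Fprintf(&b, "# TYPE %s counter\n", c.name)
	keys := make([]string, 0, len(c.values))
	for k := range c.values {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output order across runs
	for _, k := range keys {
		fmt.Fprintf(&b, "%s{%s} %g\n", c.name, k, c.values[k])
	}
	return b.String()
}

func main() {
	succeeded := newLabeledCounter("gitlab_runner_succeeded_jobs_total",
		"Total number of succeeded jobs", "runner", "job_result")
	succeeded.Inc("7ywyWRnr", "success")
	succeeded.Inc("7ywyWRnr", "success")
	fmt.Print(succeeded.Expose())
}
```

Because both counters share the same label names, one helper type can serve the succeeded and the failed metric. Only the metric name and the job_result values differ.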

Why was this MR needed?

The existing metric only records failed jobs per runner, for example:

ci_runner_failed_jobs_total{failure_reason="script_failure",runner="9e42ca"} 1

It would also be helpful to have a job success metric, so that the job success/failure rate can be tracked.
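
With both counters available, a success rate can be derived as succeeded / (succeeded + failed). Here is a minimal sketch of that arithmetic, assuming the two counter values have already been read. successRate is a hypothetical helper, not part of this MR, and the sample values are the counter values from the examples in this description.

```go
package main

import "fmt"

// successRate returns succeeded / (succeeded + failed).
// ok is false when no jobs have finished, so callers can tell "no data"
// apart from a rate of zero.
func successRate(succeeded, failed float64) (rate float64, ok bool) {
	total := succeeded + failed
	if total == 0 {
		return 0, false
	}
	return succeeded / total, true
}

func main() {
	// 2 succeeded and 2 failed jobs, as in the example series.
	if rate, ok := successRate(2, 2); ok {
		fmt.Printf("success rate: %.2f\n", rate) // prints "success rate: 0.50"
	}
}
```

Counters only ever increase, so in practice the ratio would usually be computed over the increase within a time window rather than over the raw totals.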

Are there points in the code the reviewer needs to double check?

In helpers/prometheus/job_status_collector.go, line 12: []string{"runner", "job_result"}. The label on gitlab_runner_failed_jobs_total is renamed from failure_reason to job_result so that it matches gitlab_runner_succeeded_jobs_total. We want to point this out because we are not sure whether the change will affect or break anything.
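
One way the rename could matter: anything that selects the failed job series by failure_reason would stop matching once the label is called job_result. This sketch illustrates the idea under simplified assumptions. matches is a hypothetical helper that handles only non-empty equality matchers, and it is not how Prometheus implements selectors.

```go
package main

import "fmt"

// matches reports whether a series' labels satisfy every equality matcher in
// selector. Simplified: a label missing from the series never matches.
func matches(series, selector map[string]string) bool {
	for name, want := range selector {
		if got, ok := series[name]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// Label sets before and after the rename described above.
	before := map[string]string{"runner": "7ywyWRnr", "failure_reason": "script_failure"}
	after := map[string]string{"runner": "7ywyWRnr", "job_result": "script_failure"}

	// A selector written against the old label name.
	oldSelector := map[string]string{"failure_reason": "script_failure"}

	fmt.Println(matches(before, oldSelector)) // true
	fmt.Println(matches(after, oldSelector))  // false
}
```

If any dashboards or alerts select on failure_reason, they would need to be updated to use job_result.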

Does this MR meet the acceptance criteria?

  • Documentation created/updated
  • Tests
    • Added for this feature/bug
    • All builds are passing
  • Branch has no merge conflicts with master (if it does, please rebase it)

What are the relevant issue numbers?

