
Adding runner job failed/success metrics with more details

ella requested to merge (removed):master into master

What does this MR do?

This MR adds new Prometheus metrics for runner job success and failure status, with additional labels for project ID, project name, and job stage. Examples: ci_runner_job_status_failures{job_result="script_failure",project_id="3010",project_name="workday",stage="deploy"} 1 and ci_runner_job_status_successes{job_result="success",project_id="3010",project_name="workday",stage="lint"} 2
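
To illustrate the label set described above, here is a minimal, dependency-free sketch of a labeled counter. It is not the code in this MR and does not use the Prometheus client library; the names `jobLabels`, `labeledCounter`, `Inc`, and `Expose` are hypothetical and exist only for this example. The output line format is modeled on the example series above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// jobLabels mirrors the labels shown in the MR description.
// The struct itself is a hypothetical helper for this sketch.
type jobLabels struct {
	JobResult   string
	ProjectID   string
	ProjectName string
	Stage       string
}

// labeledCounter is a hypothetical stand-in for a labeled counter metric.
// It keeps one running count per distinct label combination.
type labeledCounter struct {
	mu     sync.Mutex
	name   string
	counts map[jobLabels]float64
}

func newLabeledCounter(name string) *labeledCounter {
	return &labeledCounter{name: name, counts: make(map[jobLabels]float64)}
}

// Inc adds one to the series identified by the given labels.
func (c *labeledCounter) Inc(l jobLabels) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[l]++
}

// Expose renders every series as one line, sorted for stable output.
func (c *labeledCounter) Expose() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	lines := make([]string, 0, len(c.counts))
	for l, v := range c.counts {
		lines = append(lines, fmt.Sprintf(
			"%s{job_result=%q,project_id=%q,project_name=%q,stage=%q} %g",
			c.name, l.JobResult, l.ProjectID, l.ProjectName, l.Stage, v))
	}
	sort.Strings(lines)
	return strings.Join(lines, "\n")
}

func main() {
	failures := newLabeledCounter("ci_runner_job_status_failures")
	failures.Inc(jobLabels{
		JobResult:   "script_failure",
		ProjectID:   "3010",
		ProjectName: "workday",
		Stage:       "deploy",
	})
	fmt.Println(failures.Expose())
}
```

Each distinct combination of label values becomes its own series, so labels such as project name increase the number of series the runner exposes.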

Why was this MR needed?

The existing metric only records failed jobs per runner, for example: ci_runner_failed_jobs_total{failure_reason="script_failure",runner="9e42ca"} 1. With the new job status metrics, we can track the overall job failure rate or, more specifically, the failure rate per repository or per stage.
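
As a rough sketch of the failure-rate calculation this enables, the rate for any grouping (overall, per project, or per stage) is failures divided by failures plus successes. The type `jobCounts` and the function `failureRate` below are hypothetical, and the sample values are taken from the example series in this description, not from real measurements.

```go
package main

import "fmt"

// jobCounts holds the totals of the two new counters for one grouping
// (for example one stage). Hypothetical helper type for this sketch.
type jobCounts struct {
	Failures  float64
	Successes float64
}

// failureRate returns failures / (failures + successes).
// The second result is false when no jobs were recorded at all.
func failureRate(c jobCounts) (float64, bool) {
	total := c.Failures + c.Successes
	if total == 0 {
		return 0, false
	}
	return c.Failures / total, true
}

func main() {
	// Counts per stage, using the example series from this description.
	byStage := map[string]jobCounts{
		"deploy": {Failures: 1, Successes: 0},
		"lint":   {Failures: 0, Successes: 2},
	}

	// Sum across stages to get the overall rate.
	var overall jobCounts
	for _, c := range byStage {
		overall.Failures += c.Failures
		overall.Successes += c.Successes
	}

	if rate, ok := failureRate(overall); ok {
		fmt.Printf("overall failure rate: %.2f\n", rate)
	}
}
```

In practice this ratio would more likely be computed in a Prometheus query over a time window than in Go; the code only shows the arithmetic.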

Are there points in the code the reviewer needs to double check?

N/A

Does this MR meet the acceptance criteria?

  • Documentation created/updated
  • Tests
    • Added for this feature/bug
    • All builds are passing
  • Branch has no merge conflicts with master (if you do - rebase it please)

What are the relevant issue numbers?

N/A

Edited by ella
