
Draft: Add more GitLab CI/CD jobs metrics [RUN ALL RSPEC] [RUN AS-IF-FOSS]

Tomasz Maczukin requested to merge add-more-gitlab-ci-job-metrics into master

What does this MR do?

This MR aims to replace the CI job metrics (created, pending, running, plus new ones for jobs finishing in different ways) that are currently provided by GitLab Exporter. GitLab Exporter generates these metrics by querying the database, which is very "heavy" and, on big instances like GitLab.com, no longer reliable. Even worse, these queries cause database problems that affect the rest of the system.

To make the metrics more reliable and less problematic, we've decided to try moving them to Rails, using Gitlab::Metrics, and switching from gauges to counters.

The metrics introduced here mostly carry the same information (read: Prometheus labels) as the metrics exported by GitLab Exporter. The most important changes are:

  • event-based (status transition) counters instead of SQL-query-based gauges (we'll need to update our Grafana dashboard and alerts)
  • no namespace label; it's basically impossible to replicate that behavior with the new approach (we'll need to remove some panels from our Grafana dashboard)
  • a few labels describing the job source were replaced with a single source label, which should be auto-updated in the future (we'll need to update the Grafana dashboard)
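
To illustrate the first change, here is a minimal sketch of event-based counting. It is not the code in this MR (which uses Gitlab::Metrics in Rails). The `LabeledCounter` class, the `on_status_transition` hook and the set of final statuses are all hypothetical. The sketch only shows that a counter is incremented once per status transition, so no database query is needed to produce the value.

```python
from collections import Counter


class LabeledCounter:
    """Hypothetical in-memory counter keyed by a set of labels."""

    def __init__(self, name):
        self.name = name
        self._values = Counter()

    def increment(self, **labels):
        # Sort the labels so the same label set always maps to the same key.
        self._values[tuple(sorted(labels.items()))] += 1

    def get(self, **labels):
        # Counter returns 0 for label sets that were never incremented.
        return self._values[tuple(sorted(labels.items()))]


# Metric names are taken from the example output in this MR.
ENTERED = {
    "pending": LabeledCounter("gitlab_ci_entered_pending_jobs_total"),
    "running": LabeledCounter("gitlab_ci_entered_running_jobs_total"),
}
FINISHED = LabeledCounter("gitlab_ci_finished_jobs_total")

# Assumption: an illustrative list of final statuses, not the full set.
FINAL_STATUSES = {"success", "failed", "canceled"}


def on_status_transition(new_status, source, failure_reason="none"):
    """Hypothetical hook called once each time a job changes status."""
    if new_status in ENTERED:
        ENTERED[new_status].increment(source=source)
    elif new_status in FINAL_STATUSES:
        FINISHED.increment(
            source=source, status=new_status, failure_reason=failure_reason
        )
```

Because the values only ever increase, dashboards built on top of them would typically show rates or increases over a time window rather than the raw value, which is why the existing gauge-based panels and alerts need updating.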

An example of the metrics generated by this MR after a few jobs have been executed:

# HELP gitlab_ci_created_jobs_total Multiprocess metric
# TYPE gitlab_ci_created_jobs_total counter
gitlab_ci_created_jobs_total{source="web",project_replica="source",runner_mode="shared_enabled"} 2
# HELP gitlab_ci_entered_pending_jobs_total Multiprocess metric
# TYPE gitlab_ci_entered_pending_jobs_total counter
gitlab_ci_entered_pending_jobs_total{source="web",project_replica="source",runner_mode="shared_enabled"} 2
# HELP gitlab_ci_entered_running_jobs_total Multiprocess metric
# TYPE gitlab_ci_entered_running_jobs_total counter
gitlab_ci_entered_running_jobs_total{source="web",project_replica="source",runner_type="instance_type"} 2
# HELP gitlab_ci_finished_jobs_total Multiprocess metric
# TYPE gitlab_ci_finished_jobs_total counter
gitlab_ci_finished_jobs_total{source="web",project_replica="source",runner_type="instance_type",status="failed",failure_reason="script_failure"} 1
gitlab_ci_finished_jobs_total{source="web",project_replica="source",runner_type="instance_type",status="success",failure_reason="none"} 1
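
As a rough illustration of how this output can be consumed, the sketch below parses a few of the sample lines above and sums finished jobs per status. The parser is deliberately simplified and is an assumption for this example only: it handles just labeled samples without escaped quotes, and the helper names `parse_samples` and `finished_by_status` are hypothetical.

```python
import re
from collections import defaultdict

# Simplified patterns: labeled samples only, no escaped quotes in label values.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

# A subset of the example output from this MR.
EXAMPLE = """\
# TYPE gitlab_ci_created_jobs_total counter
gitlab_ci_created_jobs_total{source="web",project_replica="source",runner_mode="shared_enabled"} 2
# TYPE gitlab_ci_finished_jobs_total counter
gitlab_ci_finished_jobs_total{source="web",project_replica="source",runner_type="instance_type",status="failed",failure_reason="script_failure"} 1
gitlab_ci_finished_jobs_total{source="web",project_replica="source",runner_type="instance_type",status="success",failure_reason="none"} 1
"""


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a labeled sample this sketch understands
        labels = dict(LABEL_RE.findall(match.group("labels")))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def finished_by_status(samples):
    """Sum gitlab_ci_finished_jobs_total across all labels except status."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == "gitlab_ci_finished_jobs_total":
            totals[labels.get("status", "unknown")] += value
    return dict(totals)
```

Calling `finished_by_status(parse_samples(EXAMPLE))` groups the two finished jobs into one failed and one successful job, which mirrors the kind of per-status aggregation a dashboard query would perform.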

Screenshots

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

References #290751

Edited by Elliot Rushton
