
Add FF to stop emitting Sidekiq histogram metrics

Gregorius Marco requested to merge mg-stop-emitting-some-sidekiq-histograms into master

What does this MR do and why?

When the ops feature flag (FF) emit_sidekiq_histogram_metrics is disabled (it is enabled by default), Sidekiq stops emitting these metrics:

  • sidekiq_jobs_completion_seconds
  • sidekiq_jobs_queue_duration_seconds
  • sidekiq_jobs_failed_total

When the FF is disabled, sidekiq_jobs_completion_seconds_sum is emitted as a standalone counter instead, because this sum is still used in dashboards.gitlab.net. Context: gitlab-com/runbooks!6096 (comment 1498347517)

The sidekiq_jobs_completion_seconds_sum counter is emitted only when the FF is disabled. The histogram sidekiq_jobs_completion_seconds already produces a _sum series implicitly, so emitting the raw counter sidekiq_jobs_completion_seconds_sum alongside it would expose the same metric name from two sources, which could crash the Sidekiq application silently (tested locally).
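The either/or choice described above can be sketched in plain Ruby. This is a minimal illustration, not the code in this MR: SketchRegistry and register_completion_metrics are hypothetical names, and the registry raises on a duplicate series only to make the name clash visible (the real failure was observed as a silent crash).

```ruby
# Hypothetical stand-in for a metrics registry. It is NOT GitLab's metrics API.
# Assumption: it raises on a duplicate series name so the clash is easy to see.
class SketchRegistry
  def initialize
    @series = {}
  end

  # A counter exposes exactly one series, under its own name.
  def counter(name)
    claim(name)
    name
  end

  # A histogram implicitly exposes _bucket, _sum and _count series.
  def histogram(name)
    %w[_bucket _sum _count].each { |suffix| claim("#{name}#{suffix}") }
    name
  end

  def series_names
    @series.keys.sort
  end

  private

  def claim(series)
    raise ArgumentError, "duplicate series: #{series}" if @series.key?(series)

    @series[series] = true
  end
end

# Registers either the histogram or the raw _sum counter, never both.
# histograms_enabled stands in for the state of emit_sidekiq_histogram_metrics.
def register_completion_metrics(registry, histograms_enabled:)
  if histograms_enabled
    registry.histogram('sidekiq_jobs_completion_seconds')
  else
    registry.counter('sidekiq_jobs_completion_seconds_sum')
  end
  registry.series_names
end
```

With the flag enabled, the _sum series comes from the histogram; with it disabled, the same series name comes from the standalone counter, so dashboards that read the sum keep working in both states.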

This change is only meant for GitLab.com for now, as self-managed instances might still use these histograms.

Part of an effort to remove some metrics emitted by Sidekiq: gitlab-com/gl-infra/scalability#2297 (closed)

How to set up and validate locally

  1. Ensure sidekiq_exporter is enabled in gdk.yml:
gitlab:
  rails_background_jobs:
    sidekiq_exporter_enabled: true
  1. With the FF still enabled, we can still see the bucket metrics:
❯ curl -s 'gdk.test:3807/metrics'  | rg sidekiq_jobs_completion_seconds_bucket | head
sidekiq_jobs_completion_seconds_bucket{boundary="",external_dependencies="no",feature_category="ai_abstraction_layer",job_status="done",le="+Inf",queue="default",urgency="low",worker="Llm::CompletionWorker"} 0
sidekiq_jobs_completion_seconds_bucket{boundary="",external_dependencies="no",feature_category="ai_abstraction_layer",job_status="done",le="+Inf",queue="default",urgency="throttled",worker="Llm::TanukiBot::UpdateWorker"} 0
sidekiq_jobs_completion_seconds_bucket{boundary="",external_dependencies="no",feature_category="ai_abstraction_layer",job_status="done",le="10",queue="default",urgency="low",worker="Llm::CompletionWorker"} 0
sidekiq_jobs_completion_seconds_bucket{boundary="",external_dependencies="no",feature_category="ai_abstraction_layer",job_status="done",le="10",queue="default",urgency="throttled",worker="Llm::TanukiBot::UpdateWorker"} 0
sidekiq_jobs_completion_seconds_bucket{boundary="",external_dependencies="no",feature_category="ai_abstraction_layer",job_status="done",le="300",queue="default",urgency="low",worker="Llm::CompletionWorker"} 0
sidekiq_jobs_completion_seconds_bucket{boundary="",external_dependencies="no",feature_category="ai_abstraction_layer",job_status="done",le="300",queue="default",urgency="throttled",worker="Llm::TanukiBot::UpdateWorker"} 0
sidekiq_jobs_completion_seconds_bucket{boundary="",external_dependencies="no",feature_category="ai_abstraction_layer",job_status="fail",le="+Inf",queue="default",urgency="low",worker="Llm::CompletionWorker"} 0
sidekiq_jobs_completion_seconds_bucket{boundary="",external_dependencies="no",feature_category="ai_abstraction_layer",job_status="fail",le="+Inf",queue="default",urgency="throttled",worker="Llm::TanukiBot::UpdateWorker"} 0
sidekiq_jobs_completion_seconds_bucket{boundary="",external_dependencies="no",feature_category="ai_abstraction_layer",job_status="fail",le="10",queue="default",urgency="low",worker="Llm::CompletionWorker"} 0
sidekiq_jobs_completion_seconds_bucket{boundary="",external_dependencies="no",feature_category="ai_abstraction_layer",job_status="fail",le="10",queue="default",urgency="throttled",worker="Llm::TanukiBot::UpdateWorker"} 0
  1. Disable the FF in the Rails console: Feature.disable(:emit_sidekiq_histogram_metrics)
  2. Restart Sidekiq: gdk restart rails-background-jobs
  3. Check that only sidekiq_jobs_completion_seconds_sum exists:
❯ curl -s 'gdk.test:3807/metrics'  | rg sidekiq_jobs_completion_seconds | head
# HELP sidekiq_jobs_completion_seconds_sum Multiprocess metric
# TYPE sidekiq_jobs_completion_seconds_sum counter
sidekiq_jobs_completion_seconds_sum{boundary="",external_dependencies="no",feature_category="build_artifacts",queue="default",urgency="low",worker="Projects::RefreshBuildArtifactsSizeStatisticsWorker"} 0.04362599999876693
sidekiq_jobs_completion_seconds_sum{boundary="",external_dependencies="no",feature_category="build_artifacts",queue="default",urgency="low",worker="Projects::ScheduleRefreshBuildArtifactsSizeStatisticsWorker"} 0.28387600000132807
sidekiq_jobs_completion_seconds_sum{boundary="",external_dependencies="no",feature_category="cell",queue="default",urgency="low",worker="LooseForeignKeys::CleanupWorker"} 0.391459999998915
sidekiq_jobs_completion_seconds_sum{boundary="",external_dependencies="no",feature_category="code_review_workflow",queue="default",urgency="low",worker="ScheduleMergeRequestCleanupRefsWorker"} 0.032480000001669396
sidekiq_jobs_completion_seconds_sum{boundary="",external_dependencies="no",feature_category="database",queue="default",urgency="low",worker="Database::BatchedBackgroundMigration::CiDatabaseWorker"} 0.43211000000155764
sidekiq_jobs_completion_seconds_sum{boundary="",external_dependencies="no",feature_category="database",queue="default",urgency="low",worker="Database::BatchedBackgroundMigrationWorker"} 0.2683130000004894
sidekiq_jobs_completion_seconds_sum{boundary="",external_dependencies="no",feature_category="gitaly",queue="default",urgency="low",worker="BatchedGitRefUpdates::CleanupSchedulerWorker"} 0.020981000001484063
sidekiq_jobs_completion_seconds_sum{boundary="",external_dependencies="no",feature_category="global_search",queue="default",urgency="low",worker="ElasticIndexInitialBulkCronWorker"} 0.00874900000053458
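The final check can also be scripted instead of read by eye. The sketch below assumes the exporter output has been captured as a string; series_names and only_sum_counter? are hypothetical helper names, and SAMPLE reuses the first lines of the output shown above.

```ruby
# First lines of the exporter output shown above, used as sample input.
SAMPLE = <<~'TEXT'
  # HELP sidekiq_jobs_completion_seconds_sum Multiprocess metric
  # TYPE sidekiq_jobs_completion_seconds_sum counter
  sidekiq_jobs_completion_seconds_sum{boundary="",external_dependencies="no",feature_category="build_artifacts",queue="default",urgency="low",worker="Projects::RefreshBuildArtifactsSizeStatisticsWorker"} 0.04362599999876693
TEXT

# Returns the distinct series names in Prometheus text-format output,
# skipping blank lines and comment lines (# HELP / # TYPE).
def series_names(metrics_text)
  metrics_text.each_line.filter_map do |line|
    line = line.strip
    next if line.empty? || line.start_with?('#')

    # The series name is everything before the label set or the value.
    line[/\A[a-zA-Z_:][a-zA-Z0-9_:]*/]
  end.uniq
end

# True when the only sidekiq_jobs_completion_seconds* series present
# is the standalone _sum counter (no _bucket or _count series).
def only_sum_counter?(metrics_text)
  completion = series_names(metrics_text).select do |name|
    name.start_with?('sidekiq_jobs_completion_seconds')
  end
  completion == ['sidekiq_jobs_completion_seconds_sum']
end

puts only_sum_counter?(SAMPLE)
```

Running the same helper against output captured while the FF is still enabled should return false, since the _bucket series are present in that state.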

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Gregorius Marco
