When setting sidekiq.metrics.enabled == false, sidekiq doesn't start up

Summary

When metrics are disabled on the Sidekiq pods, the liveness and readiness probes stop working. This happens because the deployment does not expose the port when the configuration is set to false: https://gitlab.com/gitlab-org/charts/gitlab/-/blob/4fc17f462001da727b9d1a577053dbfcaac3fd72/charts/gitlab/charts/sidekiq/templates/deployment.yaml#L163

Without that port definition, the probes have nothing against which to validate the state of the Pod: https://gitlab.com/gitlab-org/charts/gitlab/-/blob/4fc17f462001da727b9d1a577053dbfcaac3fd72/charts/gitlab/charts/sidekiq/templates/deployment.yaml#L227-230
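
To make the mismatch concrete, here is a rough sketch of what the rendered container spec might look like with metrics disabled. It is illustrative only and not copied from the chart: the container name and the port name are assumptions, while the port number and probe paths are taken from the events under "Relevant logs".

```yaml
# Illustrative fragment, assuming the probes target the metrics port.
containers:
  - name: sidekiq                 # hypothetical container name
    # With metrics.enabled set to false, the port block is not rendered:
    # ports:
    #   - containerPort: 3807
    #     name: metrics           # hypothetical port name
    livenessProbe:                # still rendered, still targets 3807
      httpGet:
        path: /liveness
        port: 3807
    readinessProbe:               # still rendered, still targets 3807
      httpGet:
        path: /readiness
        port: 3807
```

In this sketch the probes remain in place while the port they rely on is gone, which matches the "connection refused" probe failures reported below.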

Steps to reproduce

Disable metrics on the Sidekiq deployment, then watch the Sidekiq pod restart repeatedly without ever becoming Ready.

Configuration used

gitlab:
  sidekiq:
    metrics:
      enabled: false

Current behavior

Sidekiq never settles into a Ready state.

NAMESPACE     NAME                                     READY   STATUS      RESTARTS   AGE
default       a-sidekiq-all-in-1-v1-76846bcf96-sf8gj   0/1     Running     1          14m

Expected behavior

Sidekiq should start and reach a Ready state, even with metrics disabled.

Workaround

  • Don't disable the metrics for sidekiq.
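
Expressed as configuration, the workaround is simply the inverse of the values shown under "Configuration used". This is a minimal sketch, assuming no other overrides touch the Sidekiq metrics settings:

```yaml
gitlab:
  sidekiq:
    metrics:
      # Keep metrics enabled so the port the probes rely on stays exposed.
      enabled: true
```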

Questions

  • Since we allow metrics to be disabled on the Sidekiq deployment, what should we do with the readiness and liveness probes?
  • SHOULD we allow metrics to be disabled for Sidekiq at all?
  • Are there other services where this configuration style is present that we've not yet come across?
  • Does disabling the deployment of Prometheus also disable the metrics endpoint?

Relevant logs

0s          Warning   Unhealthy                      pod/a-sidekiq-all-in-1-v1-649bff8584-xjjv8          Readiness probe failed: Get http://172.17.0.8:3807/readiness: dial tcp 172.17.0.8:3807: connect: connection refused
0s          Warning   Unhealthy                      pod/a-sidekiq-all-in-1-v1-649bff8584-xjjv8          Liveness probe failed: Get http://172.17.0.8:3807/liveness: dial tcp 172.17.0.8:3807: connect: connection refused

Reference:

/cc @ricardofbarros /cc @WarheadsSE

Edited by Jason Plum