Improve observability of the Enqueuer worker

David Fernandez requested to merge 356042-improve-enqueuer-observability into master

Context

We're currently implementing a data migration on the Container Registry. This migration is going to be driven by the Rails backend.

At the core of the Rails part lies the Enqueuer worker. Its responsibility is to find the next eligible image repository to migrate and to call the container registry to start or retry the migration.

The migration orchestration (Rails and container registry) is gated behind a feature flag, and we're currently testing the whole process on staging.

Preliminary tests revealed that the Enqueuer job doesn't log enough information. We need more information on the done message:

  • when a guard is triggered
  • when an error occurs
  • when the picked container repository fails additional checks.

This is issue #356042 (closed).

🤔 What does this MR do and why?

  • Add more logs in the Enqueuer job
  • Push down error handling so that we have a more precise message
  • Update the related specs
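As an illustration of what "pushing down" error handling can look like, here is a minimal standalone Ruby sketch. It is not the code of this MR: the class, the `RegistryUnavailable` error, and the `import_started` and `error_message` keys are all hypothetical names chosen for the example. The idea it shows is that rescuing right next to the failing call lets the message carry the repository context, which a rescue at the top of the job would not have at hand.

```ruby
# Hypothetical sketch: every name below is illustrative, not the worker's real API.
class RegistryUnavailable < StandardError; end

class PushedDownHandlingSketch
  attr_reader :metadata

  def initialize
    @metadata = {}
  end

  # registry_call is any callable standing in for the request to the
  # container registry. Returns true on success, false on failure.
  def start_import(repository_path, registry_call)
    @metadata[:container_repository_path] = repository_path
    registry_call.call(repository_path)
    @metadata[:import_started] = true # hypothetical key
  rescue RegistryUnavailable => e
    # Rescuing at this level keeps the repository path available for the message.
    @metadata[:import_started] = false
    @metadata[:error_message] = "#{repository_path}: #{e.message}" # hypothetical key
    false
  end
end

failing_call = ->(_path) { raise RegistryUnavailable, 'registry timed out' }
sketch = PushedDownHandlingSketch.new
sketch.start_import('gitlab-org/gitlab-test/test_image_11', failing_call)
```

After the call, `sketch.metadata` holds both the repository path and an error message that names it.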

No changelog was added because, as stated in the Context above, this worker is gated behind multiple feature flags and, for now, is only enabled on demand on staging when we test the migration.

🖼 Screenshots or screen recordings

n / a

🎬 How to set up and validate locally

Follow !78613 (merged) and you should see the additional logs in the background jobs logs.

Examples:

  • With feature flag container_registry_migration_phase2_enabled disabled, we get:
      "extra.container_registry_migration_enqueuer_worker.migration_enabled": false
  • With no capacity, we get:
      "extra.container_registry_migration_enqueuer_worker.max_capacity_setting": 0,
      "extra.container_registry_migration_enqueuer_worker.below_capacity": false,
  • When handling the next repository, we get:
      "extra.container_registry_migration_enqueuer_worker.import_type": "next",
      "extra.container_registry_migration_enqueuer_worker.container_repository_id": 18,
      "extra.container_registry_migration_enqueuer_worker.container_repository_path": "gitlab-org/gitlab-test/test_image_11",
  • When an execution is triggered too soon, we get:
      "extra.container_registry_migration_enqueuer_worker.waiting_time_passed": false,
      "extra.container_registry_migration_enqueuer_worker.current_waiting_time_setting": 3600,
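
The guard examples above can be sketched in plain Ruby as follows. This is a simplified, standalone model under stated assumptions, not the actual worker: `EnqueuerGuardSketch`, its keyword arguments, and the `done_payload` helper are hypothetical. Only the logged key names and the `extra.container_registry_migration_enqueuer_worker.` prefix are taken from the examples. Each guard records why it stopped the job, so the done message explains an early return.

```ruby
# Standalone sketch: a hypothetical class, not the real GitLab worker.
class EnqueuerGuardSketch
  WORKER_KEY = 'container_registry_migration_enqueuer_worker'

  attr_reader :metadata

  # All keyword arguments are illustrative stand-ins for feature flags and settings.
  def initialize(migration_enabled:, max_capacity:, ongoing_imports:,
                 waiting_time:, seconds_since_last_import:)
    @migration_enabled = migration_enabled
    @max_capacity = max_capacity
    @ongoing_imports = ongoing_imports
    @waiting_time = waiting_time
    @seconds_since_last_import = seconds_since_last_import
    @metadata = {}
  end

  # True only when every guard passes; the first failing guard logs its reason.
  def runnable?
    migration_enabled? && below_capacity? && waiting_time_passed?
  end

  # Metadata with keys in the flattened form shown in the examples.
  def done_payload
    metadata.transform_keys { |key| "extra.#{WORKER_KEY}.#{key}" }
  end

  private

  def migration_enabled?
    return true if @migration_enabled

    log(:migration_enabled, false)
    false
  end

  def below_capacity?
    return true if @ongoing_imports < @max_capacity

    log(:max_capacity_setting, @max_capacity)
    log(:below_capacity, false)
    false
  end

  def waiting_time_passed?
    return true if @seconds_since_last_import >= @waiting_time

    log(:waiting_time_passed, false)
    log(:current_waiting_time_setting, @waiting_time)
    false
  end

  def log(key, value)
    @metadata[key] = value
  end
end

# No capacity: the capacity guard stops the job and records both values.
sketch = EnqueuerGuardSketch.new(
  migration_enabled: true, max_capacity: 0, ongoing_imports: 0,
  waiting_time: 3600, seconds_since_last_import: 10
)
sketch.runnable?
puts sketch.done_payload
```

With a capacity of 0, the payload contains the same two keys as the "no capacity" example, and the waiting-time guard is never evaluated because the capacity guard fails first.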

💈 MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by David Fernandez