ConcurrencyLimit::ResumeWorker cron is EE-gated but concurrency limit middleware is not — CE instances permanently deadlock workers
### Summary
`ConcurrencyLimit::ResumeWorker` cron registration is EE-gated (`Gitlab.ee do` block in `config/initializers/1_settings.rb`, line 850), but the concurrency limit middleware and `DEFAULT_CONCURRENCY_LIMIT_PERCENTAGE_BY_URGENCY` (introduced in MR !194881, milestone 18.3) are **not** EE-gated. Any CE instance with `GITLAB_SIDEKIQ_MAX_REPLICAS > 0` and `SIDEKIQ_CONCURRENCY > 0` (both set automatically by the Helm chart) permanently deadlocks workers using `deduplicate :until_executed`.
### GitLab version
18.9.0-ce (Helm chart deployment on Kubernetes, `gitlab-sidekiq-ce:v18.9.0`)
### What is the current bug behavior?
Workers with `deduplicate :until_executed` that exceed their computed concurrency limit are deferred into a Redis throttle queue by `ConcurrencyLimit::Server` middleware. Because `ConcurrencyLimit::ResumeWorker` is never registered as a cron job in CE, these deferred jobs are never resumed. The `until_executed` dedup cookie remains in Redis indefinitely (by design per MR !208142), causing all subsequent enqueue attempts for the same idempotency key to be silently dropped as duplicates.
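The interaction above can be reproduced in a small in-memory model. This is only a sketch: `ToyScheduler` and its methods are hypothetical stand-ins, not GitLab's middleware classes, and a plain array and hash replace the Redis throttle queue and dedup cookie.

```ruby
# Toy model of the deadlock. All names are illustrative stand-ins,
# not GitLab's real classes.
class ToyScheduler
  attr_reader :throttled, :dropped, :executed

  def initialize(limit:)
    @limit = limit      # computed concurrency limit for the worker
    @running = 0        # jobs currently occupying a slot
    @cookies = {}       # idempotency key => true (the dedup cookie)
    @throttled = []     # stand-in for the Redis throttle queue
    @dropped = []       # enqueues discarded as duplicates
    @executed = []      # jobs that actually ran
  end

  # Client side: drop the enqueue if a cookie for this key already exists.
  def enqueue(key)
    if @cookies[key]
      @dropped << key
      return :deduplicated
    end
    @cookies[key] = true
    perform(key)
  end

  # Simulate another job holding a concurrency slot.
  def occupy_slot
    @running += 1
  end

  private

  # Server side: defer when the limit is reached. The cookie is cleared
  # only after execution, so a deferred job keeps its cookie.
  def perform(key)
    if @running >= @limit
      @throttled << key
      return :deferred
    end
    @executed << key
    @cookies.delete(key)
    :executed
  end
end

scheduler = ToyScheduler.new(limit: 1)
scheduler.occupy_slot                    # the limit is now reached
first  = scheduler.enqueue('chunk-42')   # deferred, cookie stays in place
second = scheduler.enqueue('chunk-42')   # dropped as a duplicate
p [first, second]                        # => [:deferred, :deduplicated]
```

With nothing draining `throttled`, every later enqueue for `'chunk-42'` lands in `dropped`, which is the behavior seen on CE.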
**Observable symptoms:**
- CI job traces are never flushed from Redis to object storage (`Ci::BuildTraceChunkFlushWorker` deadlocked)
- The CI runner receives HTTP 202 ("accepted, but not yet completed") for 5 minutes on `PUT /api/v4/jobs/:id` until `ACCEPT_TIMEOUT` expires and the trace is discarded
- CI pipelines take ~14 minutes instead of ~3 minutes
- `sidekiq_client.log` shows repeated `job_status: deduplicated` / `deduplication.type: until executed` entries for `Ci::BuildTraceChunkFlushWorker`
- Redis key `sidekiq:concurrency_limit:throttled_jobs:{ci/build_trace_chunk_flush_worker}` accumulates jobs that are never drained
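To gauge how often the deduplication symptom occurs, the log entries can be counted per worker class. The helper below is a hypothetical sketch: it assumes `sidekiq_client.log` is JSON lines and that the worker name is stored under a `class` key, next to the `job_status` field quoted above.

```ruby
require 'json'

# Count deduplicated enqueues per worker class in JSON-lines log output.
# Assumed field names: "class" and "job_status".
def deduplicated_counts(lines)
  counts = Hash.new(0)
  lines.each do |line|
    entry = begin
      JSON.parse(line)
    rescue JSON::ParserError
      next # skip lines that are not valid JSON
    end
    next unless entry.is_a?(Hash)
    next unless entry['job_status'] == 'deduplicated'

    counts[entry['class']] += 1
  end
  counts
end

# Made-up sample lines, for illustration only.
sample = [
  '{"class":"Ci::BuildTraceChunkFlushWorker","job_status":"deduplicated"}',
  '{"class":"Ci::BuildTraceChunkFlushWorker","job_status":"deduplicated"}',
  '{"class":"PipelineProcessWorker","job_status":"enqueued"}',
  'not json at all'
]
p deduplicated_counts(sample)
```

On a real instance, the same helper could be fed `File.foreach(path)` with the path to the log file, which depends on the deployment.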
### What is the expected correct behavior?
`ConcurrencyLimit::ResumeWorker` should be registered as a cron job in CE, matching the EE behavior. The cron runs every minute, checks all workers with jobs in throttle queues, and re-enqueues them with `concurrency_limit_resume: true` so the middleware doesn't re-defer them.
Alternatively, if concurrency limiting is not intended for CE, `DEFAULT_CONCURRENCY_LIMIT_PERCENTAGE_BY_URGENCY` and the `ConcurrencyLimit::Server`/`ConcurrencyLimit::Client` middleware should also be gated behind `Gitlab.ee`.
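The resume step described above can be sketched roughly as follows. The `defer?` and `resume` helpers are hypothetical, and hashes and an array stand in for Sidekiq jobs and the Redis throttle queue; only the `concurrency_limit_resume` flag name comes from this report.

```ruby
# Illustrative only: these helpers are not GitLab's implementation.

# Middleware decision: resumed jobs bypass deferral, others are deferred
# once the limit is reached.
def defer?(job, running:, limit:)
  return false if job[:concurrency_limit_resume]

  running >= limit
end

# Resume step: take up to `capacity` jobs off the throttle queue and flag
# them so the middleware lets them through.
def resume(throttled, capacity:)
  throttled.shift(capacity).map { |job| job.merge(concurrency_limit_resume: true) }
end

throttled = [{ key: 'chunk-1' }, { key: 'chunk-2' }, { key: 'chunk-3' }]
resumed = resume(throttled, capacity: 2)

p resumed.map { |job| job[:key] }                     # => ["chunk-1", "chunk-2"]
p throttled.size                                      # => 1
p defer?(resumed.first, running: 5, limit: 1)         # => false
p defer?({ key: 'chunk-9' }, running: 5, limit: 1)    # => true
```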
### Steps to reproduce
1. Deploy GitLab CE >= 18.3.0 via Helm chart (which sets `GITLAB_SIDEKIQ_MAX_REPLICAS=2` and `SIDEKIQ_CONCURRENCY=20`)
2. Run any CI pipeline that generates sufficient log output (e.g., phpstan static analysis)
3. Observe that `Ci::BuildTraceChunkFlushWorker` jobs are deferred by `ConcurrencyLimit::Server` into the throttle queue
4. Observe that no `ConcurrencyLimit::ResumeWorker` cron exists:
```ruby
Sidekiq::Cron::Job.all.select { |j| j.name.include?('resume') }
# => [] (empty on CE)
```
5. Observe the runner receiving HTTP 202 for ~5 minutes per job until `ACCEPT_TIMEOUT` fires
### Root cause analysis
In `config/initializers/1_settings.rb`, both `concurrency_limit_resume_worker` (line 953) and `pause_control_resume_worker` (line 950) are inside a `Gitlab.ee do` block (lines 850–1178). On CE instances, `Gitlab.ee?` returns `false`, so these cron jobs are never added to `Settings.cron_jobs` and never registered by `Gitlab::SidekiqConfig::CronJobInitializer.execute`.
However, the following components are NOT EE-gated:
- `DEFAULT_CONCURRENCY_LIMIT_PERCENTAGE_BY_URGENCY` in `app/workers/concerns/worker_attributes.rb` (MR !194881, milestone 18.3)
- `ConcurrencyLimit::Server` and `ConcurrencyLimit::Client` middleware in `lib/gitlab/sidekiq_middleware.rb`
- `get_concurrency_limit` / `calculate_default_limit_from_max_percentage` which compute non-zero limits when `GITLAB_SIDEKIQ_MAX_REPLICAS > 0`
The Helm chart sets `GITLAB_SIDEKIQ_MAX_REPLICAS` to a non-zero value for both CE and EE deployments, activating concurrency limiting on CE without the corresponding drain mechanism.
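As a rough illustration of why the Helm defaults activate limiting, the sketch below derives a per-worker limit from the total thread count. Both the formula (percentage of `max_replicas * concurrency`) and the percentages are assumptions for illustration; the real logic lives in `calculate_default_limit_from_max_percentage` and may differ.

```ruby
# Placeholder percentages per urgency. The real values are defined in
# DEFAULT_CONCURRENCY_LIMIT_PERCENTAGE_BY_URGENCY and may differ.
ASSUMED_PERCENTAGE_BY_URGENCY = { high: 35, low: 25, throttled: 15 }.freeze

# Hypothetical helper: returns 0 (treated here as "no limit") when either
# input is zero, otherwise a percentage of all Sidekiq threads.
def default_limit(urgency, max_replicas:, concurrency:)
  total_threads = max_replicas * concurrency
  return 0 unless total_threads.positive?

  percentage = ASSUMED_PERCENTAGE_BY_URGENCY.fetch(urgency)
  (percentage * total_threads + 99) / 100 # integer ceiling division
end

puts default_limit(:high, max_replicas: 2, concurrency: 20) # prints 14
puts default_limit(:high, max_replicas: 0, concurrency: 20) # prints 0
```

Under these assumptions, the values from this report (`GITLAB_SIDEKIQ_MAX_REPLICAS=2`, `SIDEKIQ_CONCURRENCY=20`) produce a non-zero limit for every urgency, so deferral can trigger on CE.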
### Impact
This affects **all 70+ workers** using `deduplicate :until_executed` on any CE instance deployed via Helm chart (or any CE instance where `GITLAB_SIDEKIQ_MAX_REPLICAS > 0`). The most critical affected workers are:
| Worker | Urgency | Impact when deadlocked |
|---|---|---|
| `Ci::BuildTraceChunkFlushWorker` | high | CI traces lost, pipelines slowed by ~5 min per job |
| `PipelineProcessWorker` | high | Pipelines hang permanently |
| `MergeWorker` | high | Merges silently dropped |
| `Ci::CancelPipelineWorker` | high | Cancel button does nothing |
| `Issues::CloseWorker` | high | Issues stay open after MR merge |
| `FlushCounterIncrementsWorker` | low (explicit 50% cap) | Project statistics corruption |
| `Import::ReassignPlaceholderUserRecordsWorker` | low (hardcoded limit=4) | Import migration stalls forever |
| `RunPipelineScheduleWorker` | low | Scheduled pipelines never trigger |
Workers with explicit `concurrency_limit` declarations (e.g., `Import::ReassignPlaceholderUserRecordsWorker` with `concurrency_limit -> { 4 }`) are affected regardless of `GITLAB_SIDEKIQ_MAX_REPLICAS`.
### Workaround
Register the cron jobs manually via `rails runner` on a webservice pod:
```ruby
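# Drains jobs deferred by the concurrency limit middleware (every minute).
Sidekiq::Cron::Job.new(
  name: 'concurrency_limit_resume_worker',
  cron: '*/1 * * * *',
  class: 'ConcurrencyLimit::ResumeWorker'
).save

# Registers the pause control resume worker (every 5 minutes).
Sidekiq::Cron::Job.new(
  name: 'pause_control_resume_worker',
  cron: '*/5 * * * *',
  class: 'PauseControl::ResumeWorker'
).save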
```
These persist across pod restarts because `sidekiq-cron`'s `destroy_removed_jobs` only removes jobs with `source: "schedule"`, and manually created jobs get `source: "dynamic"`.
Alternatively, set `GITLAB_SIDEKIQ_MAX_REPLICAS=0` on the Sidekiq pod to disable default concurrency limiting entirely. This does not help workers that declare an explicit `concurrency_limit`, which can still be deferred and never resumed.
### Proposed fix
Move the `concurrency_limit_resume_worker` and `pause_control_resume_worker` cron registrations out of the `Gitlab.ee do` block in `config/initializers/1_settings.rb`, placing them alongside the other non-EE cron jobs.
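A minimal sketch of what the unconditional registration could look like. Here `cron_jobs` is a plain hash standing in for `Settings.cron_jobs`, `register_cron` is a hypothetical helper, and the `'cron'` / `'job_class'` key layout is an assumption. The schedules are the ones used in the workaround above.

```ruby
# `cron_jobs` stands in for GitLab's `Settings.cron_jobs`; the key layout
# ('cron', 'job_class') is an assumption, not verified against the file.
cron_jobs = Hash.new { |hash, key| hash[key] = {} }

# Hypothetical helper: set defaults but keep any value already configured.
def register_cron(cron_jobs, name, cron:, job_class:)
  cron_jobs[name]['cron'] ||= cron
  cron_jobs[name]['job_class'] ||= job_class
end

# Registered unconditionally, that is, outside any `Gitlab.ee do` block.
register_cron(cron_jobs, 'concurrency_limit_resume_worker',
              cron: '*/1 * * * *', job_class: 'ConcurrencyLimit::ResumeWorker')
register_cron(cron_jobs, 'pause_control_resume_worker',
              cron: '*/5 * * * *', job_class: 'PauseControl::ResumeWorker')

p cron_jobs.keys
```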
### Relevant merge requests
- MR !194881 (milestone 18.3) — introduced `DEFAULT_CONCURRENCY_LIMIT_PERCENTAGE_BY_URGENCY`, activating default concurrency limits for all workers
- MR !208142 (milestone 18.6) — reordered middleware, making `ResumeWorker` essential for draining deferred jobs
- MR !211908 (milestone 18.6) — removed env var gate, hardcoded the middleware ordering
- MR !174929 (milestone 17.7) — original middleware reorder fix (superseded by !208142)
### Environment
- GitLab CE 18.9.0 via Helm chart on DigitalOcean Kubernetes
- `GITLAB_SIDEKIQ_MAX_REPLICAS=2`, `SIDEKIQ_CONCURRENCY=20`
- Sidekiq: 1 replica, concurrency=20
- Redis: single instance (`gitlab-redis-master-0`)
/cc @marcogreg @schin1