SidekiqStatus should not report a failed Sidekiq job as running
Summary
When an exception is raised in a Sidekiq job, the job's SidekiqStatus entry is left in place, so the job is still reported as running until the entry expires (30 minutes by default).
Steps to reproduce
- Modify a job to raise an exception.
- Enqueue the job and note its job ID.
- Wait a few seconds for the job to run and fail.
- Observe that `SidekiqStatus.running?(job_id)` returns `true`, which is incorrect because the job has already died.
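The failure mode can be simulated without Redis or Sidekiq. The sketch below is hypothetical: `FakeStatusStore` is an in-memory stand-in for the Redis-backed SidekiqStatus keys (TTL expiry is not modeled), and `CurrentMiddleware` assumes, as this issue implies, that the real server middleware only calls `unset` after the job returns normally. Neither class exists in GitLab.

```ruby
require 'set'

# Hypothetical in-memory stand-in for the Redis-backed status keys.
# Real keys expire after a TTL; expiry is not modeled here.
class FakeStatusStore
  def initialize
    @running = Set.new
  end

  def set(jid)
    @running.add(jid)
  end

  def unset(jid)
    @running.delete(jid)
  end

  def running?(jid)
    @running.include?(jid)
  end
end

# Assumed current behavior: unset is only reached if the job returns.
class CurrentMiddleware
  def initialize(store)
    @store = store
  end

  def call(job)
    ret = yield
    @store.unset(job['jid']) # skipped when the block raises
    ret
  end
end

store = FakeStatusStore.new
job = { 'jid' => 'abc123' }
store.set(job['jid']) # assumed: status is set when the job is enqueued

begin
  CurrentMiddleware.new(store).call(job) { raise 'boom' }
rescue RuntimeError
  # Sidekiq would record the failure here and possibly schedule a retry.
end

puts store.running?(job['jid']) # prints "true" even though the job failed
```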
What is the current bug behavior?
After a job raises an exception, its status key is never unset, so `SidekiqStatus.running?(job_id)` keeps returning `true` until the key expires (30 minutes by default).
This behavior exacerbated another problem: https://gitlab.com/gitlab-com/gl-infra/gitlab-dedicated/team/-/issues/8002#note_2386385834
What is the expected correct behavior?
When an exception is raised in a Sidekiq job, the job's SidekiqStatus entry is unset immediately (where possible), so `SidekiqStatus.running?(job_id)` returns `false` once the job has died.
Possible fixes
When an exception is raised in a Sidekiq job, we should still attempt `SidekiqStatus.unset` to avoid orphaning the key in Redis.
The status is normally unset here: https://gitlab.com/gitlab-org/gitlab/-/blob/v17.9.0-ee/lib/gitlab/sidekiq_status/server_middleware.rb#L9
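One possible shape for the fix, sketched under assumptions: moving the `unset` call into an `ensure` clause makes it run whether the job returns or raises, while the exception still propagates to Sidekiq's own retry handling and the job's return value is preserved. `FakeStatusStore` and `FixedMiddleware` are hypothetical in-memory stand-ins, not GitLab's actual classes or API.

```ruby
require 'set'

# Hypothetical in-memory stand-in for the Redis-backed status keys.
class FakeStatusStore
  def initialize
    @running = Set.new
  end

  def set(jid)
    @running.add(jid)
  end

  def unset(jid)
    @running.delete(jid)
  end

  def running?(jid)
    @running.include?(jid)
  end
end

# Hypothetical fixed middleware: `ensure` runs on both success and failure,
# and does not change the method's return value or swallow the exception.
class FixedMiddleware
  def initialize(store)
    @store = store
  end

  def call(job)
    yield
  ensure
    @store.unset(job['jid'])
  end
end

store = FakeStatusStore.new
ok_job = { 'jid' => 'ok1' }
failed_job = { 'jid' => 'fail1' }
[ok_job, failed_job].each { |j| store.set(j['jid']) }

result = FixedMiddleware.new(store).call(ok_job) { :done }

begin
  FixedMiddleware.new(store).call(failed_job) { raise 'boom' }
rescue RuntimeError => e
  error = e # the exception still reaches the caller
end

puts result                            # prints "done"
puts store.running?(failed_job['jid']) # prints "false"
```

One question the sketch does not settle: if Sidekiq will retry the failed job, clearing the key at once means `running?` returns `false` while a retry is pending; whether that is acceptable depends on how callers use SidekiqStatus.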