Sidekiq deduplication can leave behind orphaned idempotency keys
When `Sidekiq::Client` pushes jobs, it runs the client middleware and then pushes to Redis. If something goes wrong in between, the idempotency key has already been set, but since the job was never pushed, the server middleware never clears it.
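The failure window can be sketched with a minimal model of the middleware (all names here are illustrative, using an in-memory hash as a stand-in for Redis, not GitLab's actual code):

```ruby
# In-memory stand-in for the Redis keys the dedup middleware writes.
DEDUP_KEYS = {}

def push_with_dedup(worker, args, fail_before_push: false)
  key = "dedup:#{worker}:#{args.inspect}"
  return :deduplicated if DEDUP_KEYS.key?(key)

  DEDUP_KEYS[key] = true                  # client middleware sets the key...
  raise "connection lost" if fail_before_push
  :pushed                                 # ...then the job payload is pushed
end

# A failure between setting the key and pushing orphans the key:
begin
  push_with_dedup("FooWorker", [1], fail_before_push: true)
rescue RuntimeError
end

# The job never ran, so the server middleware never cleared the key,
# and later pushes are wrongly deduplicated:
push_with_dedup("FooWorker", [1])  # => :deduplicated
```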
This is more likely to happen when using `push_bulk`, because it runs the client middleware for all of the jobs and then pushes them to Redis with a single `LPUSH`. Running the middleware for many jobs can take a while, so the chance of a failure in that window is higher.
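To illustrate why the bulk path is worse, here is a sketch (again with in-memory stand-ins, not the real `push_bulk` implementation): the middleware sets a key per job, and a single failure part-way through orphans every key set so far.

```ruby
# Illustrative model of push_bulk: client middleware runs per job,
# then everything is enqueued with one batched push at the end.
BULK_KEYS = {}
QUEUE = []

def push_bulk_with_dedup(args_list, fail_at: nil)
  payloads = args_list.each_with_index.map do |args, i|
    raise "middleware failed" if i == fail_at
    BULK_KEYS["dedup:FooWorker:#{args.inspect}"] = true  # key set per job
    { "class" => "FooWorker", "args" => args }
  end
  QUEUE.concat(payloads)  # single LPUSH, only after all middleware has run
end

begin
  push_bulk_with_dedup((1..100).map { |i| [i] }, fail_at: 50)
rescue RuntimeError
  # Keys for the first 50 jobs were set, but nothing was pushed:
  # all 50 keys are orphaned at once.
end
```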
This results in jobs being incorrectly deduplicated until the key reaches its TTL. The default TTL is 6 hours, but it is now configurable.
There were several approaches discussed that could mitigate this:
- **Set the idempotency key after the job is pushed**

  If we check for the presence of the key before we push, then set the key after pushing, there could be a race condition where we end up with duplicate jobs.
  I think this may be fine, though, since some duplication should be acceptable. However, we may have some jobs right now that expect deduplication to guarantee uniqueness (like `IssueRebalancingWorker`). Those would have to be rewritten to use something like `Gitlab::ExclusiveLease` within the worker, which we already do for several workers. As the Sidekiq Pro docs say:

  > Design your jobs so that uniqueness is considered best effort, not a 100% guarantee

  This can be tricky to implement, though, since there's no hook that runs after push.
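The race under this approach can be simulated with an explicit interleaving (illustrative names, in-memory stand-in for Redis): two clients both pass the presence check before either has set the key, so both push. The trade-off is duplication instead of orphaned keys.

```ruby
SEEN = {}
PUSHED = []

def may_push?(key)
  !SEEN.key?(key)          # presence check happens first...
end

def push_then_set(key, job)
  PUSHED << job            # ...the push happens next...
  SEEN[key] = true         # ...and the key is only set afterwards
end

key = "dedup:FooWorker:[1]"
a_allowed = may_push?(key)  # client A checks: no key yet
b_allowed = may_push?(key)  # client B checks before A has set the key
push_then_set(key, "A's job") if a_allowed
push_then_set(key, "B's job") if b_allowed
PUSHED.size  # => 2: duplicate jobs, but an orphaned key is no longer possible
```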
- **Lower the default TTL**

  This is similar to the above in that uniqueness isn't guaranteed, but instead of a race condition, duplication happens once the TTL is reached. This can happen now anyway, but 6 hours is long enough that it is very unlikely.
  We could lower it to something like 5 or 10 minutes, since most jobs should complete by then. Long-running jobs could set a higher TTL, since this is configurable per worker.

  This one is much easier to implement: we just need to change the value of the constant.
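The trade-off can be sketched with a toy model of TTL expiry (an in-memory stand-in for Redis `SET` with `EX`, not the real middleware): a lower TTL bounds how long an orphaned key can keep suppressing pushes.

```ruby
# Keys map to their expiry time, mimicking a Redis key set with a TTL.
TTL_KEYS = {}

def set_key(key, ttl:, now:)
  TTL_KEYS[key] = now + ttl
end

def duplicate?(key, now:)
  expiry = TTL_KEYS[key]
  !expiry.nil? && now < expiry
end

t0 = Time.now
set_key("dedup:FooWorker:[1]", ttl: 600, now: t0)  # 10-minute TTL
duplicate?("dedup:FooWorker:[1]", now: t0 + 300)   # => true: still suppressed
duplicate?("dedup:FooWorker:[1]", now: t0 + 601)   # => false: orphan expired
```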
- **Run a reaper process that cleans up orphaned keys**

  This is the approach `sidekiq-unique-jobs` takes, but I think this could be expensive on the Redis side.
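A reaper could look roughly like this (a sketch with an in-memory hash; the key-to-jid mapping and helper names are assumptions, not `sidekiq-unique-jobs`' actual implementation): walk the dedup keys and delete any whose job is no longer enqueued. A real version would have to `SCAN` the Redis keyspace and inspect the queues, which is the expensive part.

```ruby
# Dedup keys mapping to the job ID they were set for.
REAPER_KEYS = {
  "dedup:FooWorker:[1]" => "jid-1",  # push failed: key is orphaned
  "dedup:BarWorker:[2]" => "jid-2",  # job still enqueued
}
LIVE_JIDS = ["jid-2"]  # jids currently present in the queues

def reap!(keys, live_jids)
  # Drop every key whose job is no longer enqueued anywhere.
  keys.delete_if { |_key, jid| !live_jids.include?(jid) }
end

reap!(REAPER_KEYS, LIVE_JIDS)
REAPER_KEYS.keys  # => ["dedup:BarWorker:[2]"]
```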