Detect duplicate jobs in Sidekiq queues
As a first step in #42 (closed) we should detect duplicate jobs in queues when they are scheduled.
To do this, we'd need to add middleware to keep track of a job and its arguments to avoid having to enumerate the entire queue each time.
- On client-side job creation: (client-middleware)
- Middleware checks whether a job is marked as idempotent; if not, it passes through as normal
- The middleware calculates the "idempotency string" for the job. This contains the worker class and the arguments for the job.
- The middleware calculates the "idempotency hash" for the job. This is simply a hash of the "idempotency string" using SHA256 or other.
- The middleware will use this hash as an address of a Redis hash. Using a hash instead of the full idempotency string keeps the keys short, at the cost of an (incredibly unlikely, but possible) chance of hash collisions.
- The middleware queries the Redis hash, setting it only if it does not exist yet:
HSETNX gitlab:sidekiq:duplicate:<queue_name>:<idempotency hash> 1
- If the hash already existed, the middleware marks the job as a duplicate
job['duplicate'] = true
so this becomes visible in the logs
- If the hash did not exist yet, it sets a TTL of one day on the hash (this is a safety clean-up mechanism which would only be required during incidents):
EXPIRE gitlab:sidekiq:duplicate:<queue_name>:<idempotency hash> 86400
- The middleware now continues as normal
- On the server-side: (server-middleware)
- The first three steps are the same as on the client side:
- Middleware checks whether a job is marked as idempotent; if not, it passes through as normal
- The middleware calculates the "idempotency string" for the job.
- The middleware calculates the "idempotency hash" for the job.
- The middleware removes the Redis hash:
DEL gitlab:sidekiq:duplicate:<queue_name>:<idempotency hash>
- The middleware now continues as normal
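The client-side and server-side steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual middleware: FakeRedis is an in-memory stand-in so the sketch runs without a server, the helper names (duplicate_key, client_check, server_check), the 'idempotent' job key, the 'enqueued' field name and the exact format of the idempotency string are all hypothetical, and the wiring into Sidekiq's middleware chain is left out.

```ruby
require 'digest'
require 'json'

# In-memory stand-in for Redis (an assumption so the sketch runs without a
# server). It only mimics the semantics of HSETNX, EXPIRE and DEL.
class FakeRedis
  attr_reader :ttls

  def initialize
    @hashes = {}
    @ttls = {}
  end

  # Writes the field only if it is absent; returns true when it was written.
  def hsetnx(key, field, value)
    hash = (@hashes[key] ||= {})
    return false if hash.key?(field)

    hash[field] = value
    true
  end

  # Records a TTL for an existing key (no real expiry happens here).
  def expire(key, seconds)
    @ttls[key] = seconds if @hashes.key?(key)
  end

  def del(key)
    @ttls.delete(key)
    @hashes.delete(key) ? 1 : 0
  end

  def exists?(key)
    @hashes.key?(key)
  end
end

DUPLICATE_TTL = 86_400 # one day, matching the EXPIRE call above

# Hypothetical helper: idempotency string (worker class plus arguments),
# hashed with SHA256 and embedded in the Redis key.
def duplicate_key(queue, job)
  idempotency_string = "#{job['class']}:#{job['args'].to_json}"
  "gitlab:sidekiq:duplicate:#{queue}:#{Digest::SHA256.hexdigest(idempotency_string)}"
end

# Client side: flag the job if its key already exists, otherwise set the TTL.
def client_check(redis, queue, job)
  return job unless job['idempotent'] # hypothetical marker for idempotent jobs

  key = duplicate_key(queue, job)
  if redis.hsetnx(key, 'enqueued', 1) # field name 'enqueued' is an assumption
    redis.expire(key, DUPLICATE_TTL)
  else
    job['duplicate'] = true
  end
  job
end

# Server side: remove the key once the job is picked up.
def server_check(redis, queue, job)
  redis.del(duplicate_key(queue, job)) if job['idempotent']
  job
end
```

In this sketch, scheduling the same worker class and arguments twice before the job runs flags the second copy with job['duplicate'] = true; once the server side has deleted the key, the next job with the same arguments is treated as new again.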
This is a slightly modified version of the original proposal in #42 (closed). The changes:
- Use HSETNX to set the hash, as it only requires 1 Redis call instead of checking existence and setting in 2 calls: #42 (comment 230272560)
- Don't store the "idempotency string" in the hash, since that string would contain the arguments in their entirety, which could mean more data travelling back and forth between Redis and the middleware, and we have the same information available in the middleware anyway.
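To make the difference in Redis calls concrete, here is a small sketch that counts round trips for both approaches. CountingRedis is an in-memory stand-in written for this illustration (each method call stands for one call to Redis), and the method names duplicate_two_calls? and duplicate_one_call? as well as the 'enqueued' field name are hypothetical.

```ruby
# In-memory stand-in for Redis that counts calls (an assumption for this
# sketch; every method call represents one round trip to the server).
class CountingRedis
  attr_reader :calls

  def initialize
    @hashes = {}
    @calls = 0
  end

  def hexists(key, field)
    @calls += 1
    @hashes.key?(key) && @hashes[key].key?(field)
  end

  def hset(key, field, value)
    @calls += 1
    (@hashes[key] ||= {})[field] = value
  end

  # Writes the field only if it is absent; returns true when it was written.
  def hsetnx(key, field, value)
    @calls += 1
    hash = (@hashes[key] ||= {})
    return false if hash.key?(field)

    hash[field] = value
    true
  end
end

# Two calls for a new job: check existence first, then set.
# Returns true when the job is a duplicate.
def duplicate_two_calls?(redis, key)
  return true if redis.hexists(key, 'enqueued')

  redis.hset(key, 'enqueued', 1)
  false
end

# One call in every case: HSETNX itself reports whether the field was new.
def duplicate_one_call?(redis, key)
  !redis.hsetnx(key, 'enqueued', 1)
end
```

Besides saving a call, the single HSETNX is atomic, whereas a separate check followed by a set leaves a window in which two clients scheduling the same job could both see the key as missing.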
Edited by Bob Van Landuyt