Job deduplication fails when arguments don't round-trip via JSON
I spent some time with @alexkalderimis today debugging an issue where jobs were constantly being marked as duplicates, even though the original job had finished processing.
After many dead ends, we tracked this down to the job's idempotency string: https://gitlab.com/gitlab-org/gitlab/-/blob/0fe97b2a9b0b696512a4e3c4689a410614bd514d/lib/gitlab/sidekiq_middleware/duplicate_jobs/duplicate_job.rb#L132 This takes the arguments (which, as we're in a middleware, are Ruby objects) and encodes them into a string, which is then hashed. This is performed in client middleware (to set the idempotency key) and in the server middleware (to delete it once the job is finished).
The problem here was with the values in the arguments. Our arguments looked like this: [29, 1, [], { execute_hooks: true }]
. The last argument is a hash with symbol keys. Sidekiq will process the job perfectly fine - it will make the keys strings, but as long as the worker can handle it that's not a problem. But it does mean that the client and server middleware will see different idempotency keys:
# Client - generates key A
[29, 1, [], { execute_hooks: true }].join('-')
#=> "29-1--{:execute_hooks=>true}"
# Server - generates key B
[29, 1, [], Sidekiq.load_json(Sidekiq.dump_json({ execute_hooks: true }))].join('-')
#=> "29-1--{\"execute_hooks\"=>true}"
This means that the client will be writing key A, but the server will try to delete key B. Key A will never be deleted and all subsequent jobs will be incorrectly marked as duplicates.
I think the simplest fix would be to do:
def idempotency_string
"#{worker_class_name}:#{Sidekiq.dump_json(arguments)}"
end
As this would be consistent on both sides.
We need to be careful with the rollout:
- The existing format will still be in Redis, so we need to handle that.
- Canary will schedule jobs that are handled by the main stage, which is running a different version.
Status update 2021-06-02
The first step (gitlab-org/gitlab!62842 (merged)) has been merged. This will be adding the idempotency string used to recognize duplicate jobs to the job payload, ensuring the value is the same both on the server-side and the client-side.
For good measure, we can change the idempotency_string
to use Sidekiq.dump_json
instead of it's own String#join
implementation in 14.1.