Remove `ElasticIndexerWorker` changed_fields argument
Summary
When Elasticsearch integration is enabled, every create, update, or delete action on an indexed resource (an issue or a note, for example) enqueues an `ElasticIndexerWorker` Sidekiq job. One of the arguments to that job is the list of all changed fields for the resource. This is serialized as a JSON hash and can be quite long. It also gets in the way of deduplication, because it makes it harder to tell whether a job is identical to a previous job for the same resource: gitlab-com/gl-infra/scalability#42 (comment 280297040)
Improvements
We should be able to remove the `changed_fields` attribute entirely.
When we push a new document to elasticsearch, we always generate and push a complete document - we never perform a partial update.
The `changed_fields` key is used for two things:
- Determine if an elasticsearch-tracked field has changed
- Determine whether subresources also need to be updated
The first can be calculated prior to pushing the job to sidekiq. The second can be a simple boolean instead, again calculated upfront.
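A minimal sketch of the upfront calculation, not the real implementation. The field lists, the `update_subresources` option name, the `indexer_job_args` helper and the simplified argument list are all hypothetical and only illustrate the idea: decide before enqueueing whether a job is needed, and pass a boolean instead of the full field list.

```ruby
# Hypothetical: fields that end up in the Elasticsearch document.
TRACKED_FIELDS = %w[title description state confidential].freeze
# Hypothetical: fields whose change also requires updating subresources.
SUBRESOURCE_FIELDS = %w[confidential].freeze

# Returns nil when no job needs to be enqueued, otherwise the job arguments.
# The argument list here is simplified and is an assumption.
def indexer_job_args(operation, class_name, id, changed_fields)
  changed = changed_fields.map(&:to_s)

  # For updates, skip the job entirely if nothing tracked has changed.
  if operation.to_s == 'update' && (changed & TRACKED_FIELDS).empty?
    return nil
  end

  # Replace the field list with a single boolean, only set when needed.
  opts = {}
  opts['update_subresources'] = true unless (changed & SUBRESOURCE_FIELDS).empty?

  [operation.to_s, class_name, id, opts]
end
```

With this shape, two updates to the same resource that touch different tracked fields produce identical arguments unless one of them affects subresources.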
Naively, this could halve the size of each `ElasticIndexerWorker` job held in Redis, which is valuable because we generate so many of them at present. It also means we push fewer jobs overall, since we determine upfront whether an Elasticsearch update is needed at all.
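To illustrate the size and deduplication points, here is a rough sketch that compares serialized payloads with and without the field list. The payload shape, the `job_payload` helper and the example arguments are assumptions for illustration. A real Sidekiq payload carries more keys (job ID, timestamps and so on), and the actual saving depends on how many fields changed.

```ruby
require 'json'

# Hypothetical, simplified job payload: just the worker class and its arguments.
def job_payload(args)
  JSON.generate('class' => 'ElasticIndexerWorker', 'args' => args)
end

# Two updates to the same issue that touched different fields.
with_fields_a = job_payload(['update', 'Issue', 42, { 'changed_fields' => %w[title updated_at] }])
with_fields_b = job_payload(['update', 'Issue', 42, { 'changed_fields' => %w[description updated_at] }])

# The same two updates once the field list is no longer passed.
slim_a = job_payload(['update', 'Issue', 42, {}])
slim_b = job_payload(['update', 'Issue', 42, {}])

puts with_fields_a == with_fields_b           # false: cannot be treated as duplicates
puts slim_a == slim_b                         # true: trivially comparable
puts slim_a.bytesize < with_fields_a.bytesize # true: smaller payload in Redis
```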
Risks
Changing the arguments of a Sidekiq job can be tricky. Fortunately, `changed_fields` is a member of the `opts` hash, which is passed through as serialised JSON, and its position is not changing. So this should be a safe change.
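One way the worker could stay compatible with jobs that were enqueued before the change is sketched below. This is an assumption about how a rollout might look, not a description of the existing code: the `update_subresources` key, the `update_subresources?` helper and the legacy field list are hypothetical.

```ruby
# Hypothetical: fields that used to trigger a subresource update.
LEGACY_SUBRESOURCE_FIELDS = %w[confidential visibility_level].freeze

# Accepts both the new opts shape (a boolean) and the old one (changed_fields).
def update_subresources?(opts)
  opts = (opts || {}).transform_keys(&:to_s)

  # New-style jobs carry the precomputed boolean.
  return opts['update_subresources'] == true if opts.key?('update_subresources')

  # Old-style jobs still in the queue carry the field list, as an array or a hash.
  legacy = opts['changed_fields']
  names = legacy.is_a?(Hash) ? legacy.keys : Array(legacy)
  names.any? { |field| LEGACY_SUBRESOURCE_FIELDS.include?(field.to_s) }
end
```

Because the decision is read from keys inside `opts` and not from a positional argument, old and new jobs can be processed by the same worker version during a deploy.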
We move a tiny amount of work from sidekiq to unicorn/puma - nothing to be worried about.
Open question: What else is in the `opts` hash? Is this everything we need to get to deduplicate-able jobs, or just part of it?