Prevent duplicate issues when ImportIssuesCsvWorker is retried
When `ImportIssuesCsvWorker` is retried, it reads the whole CSV again and starts from the beginning, so issues that were already inserted in the previous run are duplicated.
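For context, a minimal sketch of the failure mode, assuming a typical Sidekiq worker shape; the `perform` signature and the `fetch_csv` helper are assumptions for illustration, not GitLab's actual implementation:

```ruby
require 'csv'
require 'sidekiq'

# Simplified sketch of the current behavior (not the real worker code).
class ImportIssuesCsvWorker
  include Sidekiq::Worker

  def perform(user_id, project_id, upload_id)
    # On retry, parsing starts again at row 1: rows already inserted by the
    # interrupted run get inserted a second time.
    CSV.parse(fetch_csv(upload_id), headers: true).each do |row|
      Issue.create!(project_id: project_id,
                    title: row['title'],
                    description: row['description'])
    end
  end
end
```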
The worker does not run inside a single DB transaction because a large import could take a long time, and we want to avoid long-running transactions.
Kubernetes auto-scaling also aggravates the problem because long-running Sidekiq jobs are prone to getting interrupted when auto-scaling events happen.
Proposal
We currently have a `csv_issue_imports` table to track imports made by a user. We could add `upload_id` and `max_processed_row_number` columns to record how far an import has progressed and continue where we left off, so that each DB transaction only covers inserting one issue and updating the max processed row number.
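A minimal sketch of this proposal, assuming a Rails migration and a `CsvIssueImport` model backed by `csv_issue_imports`; apart from that table and the two new columns, the names and signatures here are assumptions:

```ruby
require 'csv'
require 'sidekiq'

# Hypothetical migration adding the two tracking columns.
class AddResumeTrackingToCsvIssueImports < ActiveRecord::Migration[6.0]
  def change
    add_column :csv_issue_imports, :upload_id, :bigint
    add_column :csv_issue_imports, :max_processed_row_number, :integer,
               default: 0, null: false
  end
end

# Sketch of the resumable loop: skip rows at or below the recorded
# high-water mark, then advance the mark in the same small transaction
# that inserts the issue.
class ImportIssuesCsvWorker
  include Sidekiq::Worker

  def perform(user_id, project_id, upload_id)
    import = CsvIssueImport.find_by!(user_id: user_id,
                                     project_id: project_id,
                                     upload_id: upload_id)

    CSV.parse(fetch_csv(upload_id), headers: true).each.with_index(1) do |row, row_number|
      # Already inserted by a previous, interrupted run.
      next if row_number <= import.max_processed_row_number

      ActiveRecord::Base.transaction do
        Issue.create!(project_id: project_id,
                      title: row['title'],
                      description: row['description'])
        import.update!(max_processed_row_number: row_number)
      end
    end
  end
end
```

Because the issue insert and the high-water-mark update commit atomically, an interruption at any point leaves `max_processed_row_number` consistent with what was actually inserted, so a retried job continues exactly where the last run stopped instead of duplicating issues.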