Expose `ImportFailure`s and `correlation_id`s in REST API & consume in performance jobs
Overview
A while ago we introduced `ImportFailure`, which we record in the database when a single relation fails without halting the entire import. We also log queries here that were retried and may have succeeded on a second or third attempt. However, since none of these failures fail the import as a whole, many individual relations can fail to import without us knowing about it. This matters especially now that we track timing data in import-export-performance: we don't know to what extent an import or export actually succeeded ("fast" could simply mean that all relations were skipped).
Proposal
I suggest we:
- extend the import status endpoint to return:
  - the number of relation "hard failures" (i.e. `ImportFailure`s that were not recovered from)
  - the `correlation_id` of the Sidekiq job
- consume the new fields in the import performance pipeline and include them in log output & Slack reports
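As a rough illustration of the pipeline bullet, a performance job could fold the new fields into a single summary line used for both log output and Slack. This is a minimal sketch, not an implementation: it assumes the status endpoint returns JSON with `import_status` and `correlation_id`, plus a hypothetical `failed_relations_count` field. The actual field names are not settled by this proposal.

```python
import json

# Hypothetical field names: the proposal asks for these values but does not fix their names.
STATUS_FIELD = "import_status"
FAILURES_FIELD = "failed_relations_count"
CORRELATION_FIELD = "correlation_id"


def summarize_import(raw_response: str, duration_s: float) -> str:
    """Build a one-line summary of an import for log output and Slack reports."""
    data = json.loads(raw_response)
    status = data.get(STATUS_FIELD, "unknown")
    failures = data.get(FAILURES_FIELD)
    # Report "unknown" rather than 0 when the field is missing, so a server that
    # does not expose failures yet is not mistaken for a clean import.
    failures_text = "unknown" if failures is None else str(int(failures))
    correlation_id = data.get(CORRELATION_FIELD) or "n/a"
    return (
        f"import {status} in {duration_s:.1f}s, "
        f"{failures_text} failed relation(s), correlation_id={correlation_id}"
    )
```

For example, a response of `{"import_status": "finished", "failed_relations_count": 3, "correlation_id": "abc123"}` with a 42-second run yields `import finished in 42.0s, 3 failed relation(s), correlation_id=abc123`. Including the correlation ID lets whoever reads the Slack report jump straight to the matching Sidekiq logs.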
Note that we cannot currently do this for exports, since exports have no metadata stored (there is no `ProjectExportState` and no `ProjectExportFailure` either).
We need to include this data even if `status == finished`, because, as mentioned above, an import can finish successfully but still encounter many failed relations along the way.
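To make the point concrete, a performance job could treat "finished" as a clean success only when no relations hard-failed. The following is a minimal Python sketch under stated assumptions: the status strings `finished` and `failed` and the `ImportOutcome` names are illustrative, not taken from the endpoint's contract.

```python
from enum import Enum


class ImportOutcome(Enum):
    """Hypothetical outcome categories for reporting an import run."""
    SUCCESS = "success"          # finished with no hard-failed relations
    PARTIAL = "partial"          # finished, but some relations hard-failed
    FAILED = "failed"            # the import as a whole failed
    NOT_FINISHED = "not_finished"  # any other status (assumed: scheduled, started, ...)


def classify(status: str, failed_relations: int) -> ImportOutcome:
    if status == "failed":
        return ImportOutcome.FAILED
    if status != "finished":
        return ImportOutcome.NOT_FINISHED
    # A finished import only counts as a clean success when nothing hard-failed.
    return ImportOutcome.PARTIAL if failed_relations > 0 else ImportOutcome.SUCCESS
```

Keeping `PARTIAL` separate from `SUCCESS` means a fast run that skipped many relations can no longer be reported as a fast successful one.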