
Expose `ImportFailure`s and `correlation_id`s in REST API & consume in performance jobs

Overview

A while ago we introduced ImportFailure, which we record in the database when a single relation fails to import without halting the entire import. We also record queries that were retried here, some of which may have succeeded on a second or third attempt. However, since none of these failures abort the import, many individual relations can fail to be imported without us noticing. This matters especially now that we track timing data in import-export-performance: we cannot tell to what extent an import/export actually succeeded ("fast" could simply mean that all relations were skipped).

Proposal

I suggest that we:

  • extend the import status endpoint to return:
    • the number of relation "hard failures" (i.e. ImportFailures that were not recovered from)
    • the correlation_id of the sidekiq job
  • consume the new fields in the import performance pipeline and include them in log output & Slack reports

Note that we cannot currently do this for exports, since we store no metadata for them (there is no ProjectExportState and no ProjectExportFailures either).

We need to include this data even if status == finished because, as mentioned above, an import can finish successfully while still encountering many failed relations.
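A minimal sketch of how the performance pipeline might consume the new fields, written in Python for illustration. The payload keys (`import_status`, `failed_relations_count`, `correlation_id`) and the helpers `parse_import_status` and `report_line` are hypothetical; the actual field names would be settled when the endpoint is extended. The point it shows is that a `finished` import with a non-zero hard-failure count gets reported as a partial success, not as a clean run, and that the correlation_id is carried into the log/Slack line so the Sidekiq job can be traced.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class ImportResult:
    status: str
    hard_failures: int  # ImportFailures that were not recovered from
    correlation_id: Optional[str]  # correlation_id of the Sidekiq job

    @property
    def fully_succeeded(self) -> bool:
        # "finished" alone is not enough: relations may have been skipped.
        return self.status == "finished" and self.hard_failures == 0


def parse_import_status(payload: Mapping[str, Any]) -> ImportResult:
    # Hypothetical payload keys; adjust to whatever the endpoint ends up returning.
    return ImportResult(
        status=str(payload["import_status"]),
        hard_failures=int(payload.get("failed_relations_count", 0)),
        correlation_id=payload.get("correlation_id"),
    )


def report_line(project: str, duration_s: float, result: ImportResult) -> str:
    # Build one line suitable for log output or a Slack report.
    if result.fully_succeeded:
        outcome = "OK"
    elif result.status == "finished":
        outcome = "PARTIAL"  # finished, but some relations hard-failed
    else:
        outcome = result.status.upper()
    return (
        f"{project}: {outcome} in {duration_s:.1f}s, "
        f"{result.hard_failures} failed relation(s), "
        f"correlation_id={result.correlation_id or 'n/a'}"
    )
```

With this, a fast timing paired with `PARTIAL` immediately signals that the number may not reflect a complete import, and the correlation_id gives a direct pointer into the job's logs.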
