When a direct transfer completes but not all records were imported, the user needs to identify the records that failed and understand why they failed to be imported. This is not available yet.
Proposal
Create a new failures API endpoint that returns enough information for the user to understand which records failed to import and why.
The frontend can use this API endpoint to show the information in the UI (issue).
Any API caller (with permissions) can get info from this endpoint as well.
Save, for each failed record:
What kind of record it is (e.g. merge request, issue, etc.)
The ID of the record on the source (e.g. the merge request ID, issue ID, etc.) - we don't export this, see comment
A "human readable" identifier of the record, if present (e.g. the merge request title, the issue title, etc.)
What kind of error happened and caused the record to not be imported
The correlation ID for debugging purposes
(Nice to have) URL of the record - should be possible to recreate if a record has an iid. Not all URLs can be recreated (e.g. protected branches).
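To make the list above concrete, here is a minimal sketch of what one saved failure could look like. The class name `ImportFailure` and all field names are assumptions for illustration, not the actual schema. The source ID and URL are optional because, as noted, the ID is not exported and not every URL can be recreated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImportFailure:
    """One record that failed to import (hypothetical shape, not the real schema)."""

    relation: str                      # kind of record, e.g. "merge_request" or "issue"
    error: str                         # what went wrong during import
    correlation_id: str                # used to find the matching logs when debugging
    source_id: Optional[int] = None    # ID on the source; may not be available
    title: Optional[str] = None        # human-readable identifier, if present
    source_url: Optional[str] = None   # nice to have; not always reconstructible


# Example: an issue that failed without a known source ID or URL.
failure = ImportFailure(
    relation="issue",
    error="Record invalid",
    correlation_id="abc123",
    title="Fix login",
)
```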
@m_frankiewicz the API endpoint for this already exists: it's the failures endpoint that lists which records failed to be imported. We just need to extend it.
For the GitHub import details page, we make a request to `` and this is what the response looks like (mock data):
{
  type: 'pull_request',
  title: 'Add one cool feature',
  provider_url: 'https://github.com/USER/REPO/pull/2',
  details: {
    exception_class: 'ActiveRecord::RecordInvalid',
    exception_message: 'Record invalid',
    source: 'Gitlab::GithubImport::Importer::PullRequestImporter',
    github_identifiers: {
      iid: 2,
      issuable_type: 'MergeRequest',
      object_type: 'pull_request',
    },
  },
}
Based on this, we are able to show a UI that looks like this:
Ideally we should have a similar response, where details can be any sort of object and contain all other information.
@georgekoltsov I think the question is about the presentation of the data.
As @justin_ho wrote, the end result we want is a separate page that contains information about the objects that didn't get migrated. At the moment we show errors when the user clicks the Details button on the import history page, but they are shown on the same page. We want a separate page so that we can link to it from the import history page, and also from the page that lists groups available for import.
The layout of the page will more or less be as Justin showed:
I think the columns in the table can be a little different. Mapping of information from the endpoint to table headers:
What kind of record it is (e.g. merge request, issue, etc.) - maps to Type
The ID of the record on the source (e.g. the merge request ID, issue ID, etc.) - ID on source
A "human readable" identifier of the record, if present (e.g. the merge request title, the issue title, etc.) - Title
What kind of error happened and caused the record to not be imported - Error
The correlation ID for debugging purposes - Correlation ID
(Nice to have) URL of the record - should be possible to recreate if a record has an iid. Not all URLs can be recreated (e.g. protected branches). - Maybe it doesn't need a column? If a link to the item is available, the Title field could contain a link (is that possible, Justin?). Unless there's a reason to have Title and URL separately.
When it comes to order of columns I think:
Type, Title, ID on source, Error, Correlation ID.
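A rough sketch of that mapping, assuming the endpoint returns a flat object per failure. The keys `relation`, `title`, `source_id`, `error`, `correlation_id` and `source_url` are hypothetical names, not confirmed response fields. The Title cell carries an optional link instead of having a separate URL column.

```python
from typing import Any, Dict, Optional

# Proposed column order from the comment above.
COLUMNS = ("Type", "Title", "ID on source", "Error", "Correlation ID")


def failure_to_row(failure: Dict[str, Any]) -> Dict[str, Any]:
    """Map one failure object (hypothetical keys) to the proposed table columns."""
    url: Optional[str] = failure.get("source_url")
    return {
        "Type": failure.get("relation"),
        # The title doubles as the link when a URL could be constructed.
        "Title": {"text": failure.get("title") or "", "href": url},
        "ID on source": failure.get("source_id"),
        "Error": failure.get("error"),
        "Correlation ID": failure.get("correlation_id"),
    }


row = failure_to_row(
    {
        "relation": "merge_request",
        "title": "Add one cool feature",
        "error": "Record invalid",
        "correlation_id": "abc123",
    }
)
```

Here `row["Title"]["href"]` is `None`, so the UI would render the title as plain text rather than a link.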
I'm wondering whether we need a line of text at the top of the page, saying something like: "Listed below are items that didn't get imported from the source instance. You can identify them by the Title and ID on source fields. Where possible, the Title is linked to the item on the source instance. Correlation IDs are needed if you want to request support from GitLab."
@eread could you review proposed column titles and the text? Do you think the text is necessary/helpful?
@justin_ho WDYT about the proposal? It would be a bit different from the GH import results, but I don't see that as a problem. And we can iterate to get them to look the same.
Ok, understood. Right now the list of failures is included under the entity API endpoint (e.g. /api/v4/bulk_imports/123/entities/456)
So you're suggesting to have a new, separate endpoint /api/v4/bulk_imports/123/entities/456/failures that returns just the errors, without the entity information?
@georgekoltsov yes, I think that a separate endpoint would be best. One that can be consumed by FE to show information in UI, and also used by any other API caller (with permissions).
This endpoint should return all records that failed to be imported for a particular group or project that is listed on the https://gitlab.com/import/bulk_imports/history page.
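For reference, the proposed path can be built like this. It is only a sketch of the URL shape discussed above; the helper name `failures_path` is made up, and the IDs 123 and 456 are the example values from the earlier comment.

```python
def failures_path(bulk_import_id: int, entity_id: int) -> str:
    """Build the proposed failures path for one entity of a bulk import."""
    if bulk_import_id <= 0 or entity_id <= 0:
        raise ValueError("IDs must be positive integers")
    return f"/api/v4/bulk_imports/{bulk_import_id}/entities/{entity_id}/failures"


path = failures_path(123, 456)
```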
Now I have a question - when a group is listed on the https://gitlab.com/import/bulk_imports/history page as partially imported, does the Details button contain errors for the group only, or for the group with all its projects?
I wrote before that after clicking the Details button next to a partially imported or failed group or project listed on the https://gitlab.com/import/bulk_imports/history page, the user should see a new page listing all the records that didn't get imported. And that the user should also be able to reach this page from the page that lists groups available for import, https://gitlab.com/import/bulk_imports/status. However, now I'm unsure: does https://gitlab.com/import/bulk_imports/status show the Details button for partially imported groups?
Apologies, I didn't notice these problems/questions before. Should we sync on Monday with @justin_ho as well to clarify all the questions here?
@georgekoltsov @justin_ho I'm scheduling a call for us for tomorrow. Justin, please review this and add your comments/questions.
In the meantime, I'm taking a step back on what I wrote:
This endpoint should return all records that failed to be imported for a particular group or project that is listed on the https://gitlab.com/import/bulk_imports/history page.
I think the endpoint should return errors for a particular import process, which can include multiple groups and projects. The API response can be nested.
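If the response ends up nested per entity, the frontend would likely flatten it before rendering one table. The sketch below assumes a purely hypothetical shape (an `entities` list, each with `source_full_path` and a `failures` list); the real response may look different.

```python
from typing import Any, Dict, List


def flatten_failures(import_response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a nested per-entity response (hypothetical shape) into one list."""
    rows: List[Dict[str, Any]] = []
    for entity in import_response.get("entities", []):
        for failure in entity.get("failures", []):
            # Keep track of which group or project each failure belongs to.
            rows.append({"entity": entity.get("source_full_path"), **failure})
    return rows


example_response = {
    "entities": [
        {
            "source_full_path": "my-group/my-project",
            "failures": [{"relation": "issues", "error": "Record invalid"}],
        },
        {"source_full_path": "my-group", "failures": []},
    ]
}
flat = flatten_failures(example_response)
```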
When it comes to order of columns I think: Type, Title, ID on source, Error, Correlation ID.
@eread could you review proposed column titles and the text? Do you think the text is necessary/helpful?
Thanks for the ping! The titles look pretty good! The only two that make me think twice are:
ID on source: Given the record failed to import, there's no "destination ID" in this context. I wonder if ID would be enough here? I'd read that with the other columns as "Record of type Type with ID ID failed to import with error Error". That said, the long version won't hurt.
Correlation ID: I wonder if that would be more simply put as Debugging ID or Log ID? Ideally, it'd be self-explanatory what to use that for.
For transparency, here is the link (internal) to the raw meeting notes from today's meeting. Overall we are on the same page in terms of what needs to be done and how the frontend and backend will interact.
@georgekoltsov is there a reason for not exporting object IDs, or has it just not been done?
The source object ID was supposed to help identify which object was not imported, especially in cases where no URL can be recreated. So without it some objects would not be "recognizable" - do I understand that correctly?
@m_frankiewicz it's mainly for security reasons, and historically it has been this way in order to not allow potential ID overrides when importing. Apart from not exporting them, we also filter them out on import.
@justin_ho we have source_full_path of the entity we import (group/project) & iid of the record (issue/mr/epic) which is enough to construct the URL. So it's constructed when iid is present.
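Roughly, the construction could look like the sketch below. The function name, the `URL_TEMPLATES` mapping and the example host are assumptions for illustration; the path patterns follow the usual GitLab URL layout but are not taken from the actual implementation. When there is no iid or no known pattern (e.g. protected branches), no URL is returned.

```python
from typing import Optional

# Assumed URL patterns per top-level relation (illustrative, not exhaustive).
URL_TEMPLATES = {
    "issues": "/{path}/-/issues/{iid}",
    "merge_requests": "/{path}/-/merge_requests/{iid}",
    "epics": "/groups/{path}/-/epics/{iid}",
}


def build_source_url(
    base_url: str, source_full_path: str, relation: str, iid: Optional[int]
) -> Optional[str]:
    """Return the record URL on the source instance, or None if it can't be built."""
    template = URL_TEMPLATES.get(relation)
    if iid is None or template is None:
        return None
    path = source_full_path.strip("/")
    return base_url.rstrip("/") + template.format(path=path, iid=iid)


issue_url = build_source_url(
    "https://gitlab.example.com/", "my-group/my-project", "issues", 7
)
no_url = build_source_url(
    "https://gitlab.example.com", "my-group/my-project", "protected_branches", None
)
```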
@georgekoltsov Awesome work on getting the MR for this merged so quickly! I have started working on integrating the frontend with the backend and have a small question. Do you happen to have a list of possible values for relation? I have collated the basic ones like issues, merge_requests from the examples but want to make sure we cover all scenarios.
@justin_ho The list of relations is usually what is listed here for groups and here for projects, but only the top-level entries, like issues, ci_pipelines, epics, etc. Anything under those entries is a "sub-relation". Does that make sense?