GithubImporter: Refactor the Representation layer

Problem to solve

If we compare the GitHub importer with an ETL architecture, like GitLab Migration (BulkImports), the representation layer would be the T: the layer responsible for transforming data from the source so it can be used by the Loaders (the layer that saves the data). Currently, however, transformation leaks into the Loaders layer (example). This happens because the representation objects receive too little context about the import. For instance, they don't have access to the project being imported, which is required to build some of the data.

Representations

Gitlab::GithubImport::Representation::DiffNote
Gitlab::GithubImport::Representation::Issue
Gitlab::GithubImport::Representation::LfsObject
Gitlab::GithubImport::Representation::Note
Gitlab::GithubImport::Representation::PullRequest
Gitlab::GithubImport::Representation::PullRequestReview
Gitlab::GithubImport::Representation::User

Proposal

- (Optional) Rename the Representation classes/namespace to Transformers, to better express their intent and to use vocabulary similar to GitLab Migration (BulkImports);
- Pass more context to the Transformers, like the project being imported and the client being used. Similar to BulkImports, a Context class could be created to hold this information; (!72429 (closed))
- Remove some of the duplication among the Transformers by adding either a superclass or a mixin with the shared behavior;
- Define a clear public API for all Transformers, something like #transform (again following what's being used in BulkImports).
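
To make the proposal more concrete, here is a minimal sketch of how a Context object, a shared superclass, and a single `#transform` entry point could fit together. Every name in it (`Context`, `Project`, `BaseTransformer`, `NoteTransformer`, and the hash keys) is a hypothetical stand-in chosen for illustration. None of it is taken from the GitLab codebase or from BulkImports.

```ruby
# Hypothetical sketch: these classes do not exist in GitLab under these names.

# Holds importer-wide state that every transformer may need.
Context = Struct.new(:project, :client, keyword_init: true)

# Stand-in for the project being imported (an assumption for this example).
Project = Struct.new(:id, :full_path, keyword_init: true)

# Shared behavior for all transformers: one public entry point, #transform.
class BaseTransformer
  def initialize(context)
    @context = context
  end

  # Public API: takes raw source data and returns a hash ready for a loader.
  def transform(raw)
    raise NotImplementedError, "#{self.class} must implement #transform"
  end

  private

  attr_reader :context
end

# Example transformer: it can build project-scoped data itself,
# so the loader no longer needs to do any transformation.
class NoteTransformer < BaseTransformer
  def transform(raw)
    {
      project_id: context.project.id,
      note: raw.fetch(:body),
      author_login: raw.dig(:user, :login)
    }
  end
end

# The client is left as nil because this sketch never calls the API.
context = Context.new(
  project: Project.new(id: 42, full_path: "group/app"),
  client: nil
)

result = NoteTransformer.new(context).transform(
  { body: "LGTM", user: { login: "alice" } }
)
puts result.inspect
```

With this shape, a loader would only persist the returned hash. Anything that depends on the project or the client is resolved inside the transformer through the context.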

Expected results

Better maintainability due to:
- Simplified Representation/Transformation layer
- Simplified Loader/Saving layer

Edited Nov 04, 2021 by Kassio Borges
