GithubImporter: Refactor the Representation layer
Problem to solve
If we compare the GitHub importer with an ETL architecture, like GitLab Migration (BulkImports), the representation layer would be the T: the layer responsible for transforming data from the source so it can be used by the Loaders (the layer that saves the data). Currently, however, the transformation is leaking into the Loaders layer (example). That is happening because the representation objects receive too little context from the importer. For instance, they don't have access to the project being imported, which is required to build some of the data.
Representations
- Gitlab::GithubImport::Representation::DiffNote
- Gitlab::GithubImport::Representation::Issue
- Gitlab::GithubImport::Representation::LfsObject
- Gitlab::GithubImport::Representation::Note
- Gitlab::GithubImport::Representation::PullRequest
- Gitlab::GithubImport::Representation::PullRequestReview
- Gitlab::GithubImport::Representation::User
Proposal
- (optional) Rename the Representation classes/namespace to Transformers to better express their intent and to use a vocabulary similar to the GitLab Migration (BulkImports);
- Pass more context to the Transformers, like the project being imported and the client being used. Similar to BulkImports, a Context class could be created to hold this information (!72429 (closed));
- Remove some of the duplication among the Transformers by adding either a superclass or a mixin with the shared behavior;
- Define a clear public API for all Transformers, something like #transform (again following what's being used in BulkImports).
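To make the proposal more concrete, here is a minimal sketch of how a Context object, a shared superclass, and a single #transform entry point could fit together. Everything in it is hypothetical: the GithubImportSketch namespace, the Project struct standing in for the real project model, the attribute names in the returned hash, and the shape of the raw input are assumptions for illustration, not the existing importer code or the BulkImports API.

```ruby
# Hypothetical sketch: these names do not exist in GitLab.
module GithubImportSketch
  # Stand-in for the real project model (assumption for illustration).
  Project = Struct.new(:id)

  # Holds what every transformer needs to know about the running import.
  Context = Struct.new(:project, :client, keyword_init: true)

  # Shared behavior that could live in a superclass (or a mixin).
  class BaseTransformer
    attr_reader :context

    def initialize(context)
      @context = context
    end

    # Single public entry point: raw source data in, loader-ready hash out.
    def transform(_raw)
      raise NotImplementedError, "#{self.class} must implement #transform"
    end

    private

    # Project-dependent data is now available inside the transformer,
    # so loaders would not need to compute it themselves.
    def project_id
      context.project.id
    end
  end

  class IssueTransformer < BaseTransformer
    def transform(raw)
      {
        project_id: project_id,
        iid: raw.fetch(:number),
        title: raw.fetch(:title),
        description: raw[:body].to_s
      }
    end
  end
end

# Usage: the client is left as nil because this sketch never calls it.
context = GithubImportSketch::Context.new(
  project: GithubImportSketch::Project.new(42),
  client: nil
)
issue = GithubImportSketch::IssueTransformer.new(context).transform(
  { number: 7, title: "Bug", body: nil }
)
```

With this shape, a loader would receive a hash that already contains the project-dependent fields, so it could focus on saving the data.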
Expected results
- Better maintainability due to:
  - Simplified Representation/Transformation layer
  - Simplified Loader/Saving layer
Edited by Kassio Borges