
Draft: GithubImporter: Refactoring representation layer (DiffNote)

What does this MR do and why?

This is the first part of a larger plan to improve the GitHub importer as a whole, starting by moving all object transformation and formatting into the Representation layer.

Context

The GitHub importer uses a variation of the ETL architecture, where:

  • Extraction happens in Importers (with pluralized resource names);
  • Transformation happens in the Representation layer;
  • Loading happens in Importers (with singular resource names).
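
As a rough illustration of the split described above, here is a minimal, self-contained sketch. All class and method names (NotesImporter, NoteRepresentation, NoteImporter, FakeClient, fetch_notes, to_attributes) are hypothetical and chosen only for this example; they are not the importer's actual classes.

```ruby
# Hypothetical sketch of the three ETL stages; these names are assumptions,
# not the importer's real classes.

# Extract: a pluralized importer fetches the raw collection.
class NotesImporter
  def initialize(client)
    @client = client
  end

  def each_raw_note(&block)
    @client.fetch_notes.each(&block)
  end
end

# Transform: the representation maps a raw payload to the attributes to persist.
class NoteRepresentation
  def initialize(raw)
    @raw = raw
  end

  def to_attributes
    { note: @raw[:body].strip, author_login: @raw.dig(:user, :login) }
  end
end

# Load: a singular importer persists one transformed object.
class NoteImporter
  def initialize(store)
    @store = store
  end

  def execute(attributes)
    @store << attributes
  end
end

# Stand-in for the GitHub API client (assumption, for illustration only).
FakeClient = Struct.new(:notes) do
  def fetch_notes
    notes
  end
end

client = FakeClient.new([{ body: " Looks good ", user: { login: "alice" } }])
store = [] # stand-in for the database

NotesImporter.new(client).each_raw_note do |raw|
  NoteImporter.new(store).execute(NoteRepresentation.new(raw).to_attributes)
end
```

In this sketch the singular importer receives attributes that are ready to persist, so it has no transformation logic of its own.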

Currently some transformations are happening in multiple places:

  • Representation.from_api_response;
  • Other Representation methods;
  • Within the Importers;
  • Within the finders;

Reasons behind this change

The end goal is to have all importers use the same architecture, which improves maintainability and unifies observability. To that end, a new importer is being built for GitLab Migration, with ETL architecture at its core. Bringing the GitHub importer code closer to an ETL format therefore has the short-term advantage of improving maintainability and the long-term value of easing the migration to the BulkImports architecture/namespace.

Work in this commit

With the iteration 👣 value in mind, this MVC refactors only ::Gitlab::GithubImport::Representation::DiffNote, as an example of what can be achieved with this refactoring.

  • As previously mentioned, the goal here is to move as much transformation work to the Representation layer as possible. For this, the Representation object requires more context from the current importer, such as the Project being imported and the GitHub client.
  • Make it more explicit which attributes are used from the GitHub response.
  • Make it more explicit which attributes are used to create the DiffNote in GitLab.
  • Simplify the tests: now that the object has a single entry point, we don't need to replicate the tests for from_api_response and from_json_hash.
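
The points above could look roughly like the following sketch. It is not the code from this MR: the class name DiffNoteSketch, the GITHUB_FIELDS list, the to_attributes method, and the client's gitlab_user_id_for lookup are all assumptions made for illustration, and the field list is only an illustrative subset of a review-comment payload.

```ruby
# Hypothetical names throughout; not the code from this MR.
class DiffNoteSketch
  # Fields read from the GitHub payload (illustrative subset).
  GITHUB_FIELDS = %i[body path commit_id user].freeze

  # Single entry point: the raw payload plus the importer context.
  def initialize(raw, project:, client:)
    @raw = raw.slice(*GITHUB_FIELDS)
    @project = project
    @client = client
  end

  # Attributes used to build the note on the GitLab side.
  def to_attributes
    {
      project_id: @project.id,
      note: @raw[:body],
      file_path: @raw[:path],
      commit_id: @raw[:commit_id],
      author_id: @client.gitlab_user_id_for(@raw.dig(:user, :login))
    }
  end
end

# Stand-ins for the Project and the GitHub client (assumptions).
ProjectStub = Struct.new(:id)
ClientStub = Struct.new(:users) do
  def gitlab_user_id_for(login)
    users[login]
  end
end

raw = { id: 1, body: "nit", path: "a.rb", commit_id: "abc123",
        user: { login: "alice" }, url: "ignored" }
attrs = DiffNoteSketch.new(
  raw,
  project: ProjectStub.new(42),
  client: ClientStub.new({ "alice" => 7 })
).to_attributes
```

Listing the consumed fields in one constant and the produced attributes in one method keeps both sides of the transformation visible in a single place, and a test only needs to exercise the one constructor.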

Related to: #330331

Screenshots or screen recordings

Current architecture overview
sequenceDiagram
    participant GithubAPI
    participant Stage
    participant Representation
    participant ObjectImporter

    Stage ->> GithubAPI: Fetch Collection
    activate GithubAPI
    GithubAPI ->> Stage: Collection of objects
    deactivate GithubAPI

    loop every object
        Stage ->> Representation: from_api_response (serialize)
        activate Representation
        Representation ->> Stage: serialized object
        deactivate Representation

        Stage ->> ObjectImporter: execute (serialized object)
        
        ObjectImporter ->> Representation: from_json_hash
        activate Representation
        Representation ->> ObjectImporter: deserialized object
        deactivate Representation
        
        Note right of ObjectImporter: At this point<br>the ObjectImporter<br>uses the deserialized object and some<br>transformations from the Representation<br>to build the attributes (more transformations) to<br>save the object in GitLab
    end
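
The serialize/deserialize round trip in the diagram above can be sketched as follows. Only the method names from_api_response and from_json_hash come from this description; the class name, the to_hash method, and the attribute names are assumptions for illustration.

```ruby
require "json"

# Hypothetical representation with the two entry points named in the diagram.
class RoundTripNote
  attr_reader :attributes

  # Entry point 1: build from the raw API payload (Stage side).
  def self.from_api_response(raw)
    new(github_id: raw[:id], note: raw[:body])
  end

  # Entry point 2: rebuild from the parsed JSON hash (ObjectImporter side).
  def self.from_json_hash(hash)
    new(hash.transform_keys(&:to_sym))
  end

  def initialize(attributes)
    @attributes = attributes
  end

  def to_hash
    attributes
  end
end

# Stage: serialize the representation before handing it to the importer.
payload = JSON.generate(RoundTripNote.from_api_response({ id: 1, body: "nit" }).to_hash)

# ObjectImporter: deserialize it again before building the final attributes.
restored = RoundTripNote.from_json_hash(JSON.parse(payload))
```

With two entry points, each one needs its own tests even though both are expected to yield the same object.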

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Kassio Borges
