Skip to content

Draft: POC - Use Packwerk to create `components/ci`

Fabio Pitino requested to merge packwerk-poc-3 into master

What does this MR do and why?

Related to Related to #365293 (closed)

This MR provides a more iterative approach than !88899 (closed).

In this MR we attempt to move only Gitlab::Ci::Config code into components/ci and run Packwerk to see the violations.

Overview

This MR attempts to prove that GitLab codebase could be better organized into domain components by following the Domain-Driven Design principles.

Rather than having the codebase organized primarily as a layered architecture (Model-View-Controller), the large monolith could be organized into components representing the functional domains of the application. Within each component (or bounded context) we retain the directory structure of a typical Rails application (models, services, workers, etc.) with some tweaks:

  1. Each component contains a **/app/lib folder where domain logic that doesn't fit the MVC pattern can go. This stops us from putting unorganized domain code into the top-level lib/ folder.
  2. Controller, Frontend and API code remains in the main app/ for now. We could decide to move those into the respective component. Example: each component defines the own REST endpoints and GraphQL types which are ultimately mounted all together.
  3. Database migrations remain in the main db/ directory. Again, we could decide to have each component hosting its own migrations (components/ci/core/db/migrate/*).

This PoC also uses Packwerk to enforce boundaries between components, otherwise there wouldn't be much benefit. Packwerk doesn't give a proper isolation since all Ruby code eventually is loaded in memory. It doesn't stop a component from referencing a constant from another component at runtime. However, it's a powerful static analysis tool that can be used in CI to enforce boundaries between components, helping us to design more decoupled domains.

The directory structure looks like the example below. Some points to highlight:

  1. each component components/<name> directory contains a package.yml containing Packwerk package configs.
  2. Each component directory has a core and ee directory. FOSS code will be located under components/*/core and EE extensions undercomponents/*/ee.
    • core is always autoloaded.
    • ee is loaded conditionally if running EE.
    • This PoC doesn't include jh in the scope but we can use the identical approach as ee.
    • Why this choice? Code inside core and ee that is part of the same package (e.g. Ci::) should be colocated under the same package directory. With Packwerk it's not possible (thankfully) to have packages scattered around multiple directories. This makes all code related to a domain under the same directory.
  3. Each component directory includes the relative specs.
  4. Packwerk provides a way to make certain constants public for other packages/components to use. For example Ci::CreatePipelineService could be used by other domains to trigger a pipeline. Packwerk requires files declaring these constants to be placed into a special public directory: components/ci/public/core/services/ci/create_pipeline_service.rb. Anything outside the public directory is considered private to the component.
components
└── ci
    ├── package.yml
    ├── public
    │   ├── core
    │   │   └── services
    │   │       └── ci/create_pipeline_service.rb
    │   └── ee
    ├── core    # autoload components/*/core by default
    │   ├── app
    │   │   ├── models
    │   │   ├── lib
    │   │   ├── services
    │   │   └── ...
    │   └── spec
    │       ├── models
    │       ├── lib
    │       ├── services
    │       └── ...
    └── ee      # autoload components/*/ee if running EE
        ├── app
        │   ├── models
        │   │   ├── ci/minutes/usage.rb  # class defined in EE
        │   │   └── ee/ci/pipeline.rb    # EE extension for a class in Core
        │   ├── lib
        │   ├── services
        │   └── ...
        └── spec
            ├── models
            ├── lib
            ├── services
            └── ...

Changes made in this MR:

  • I simply moved part of CI files into the directory structure above. In this MR we attempt to move only Gitlab::Ci::Config code into components/ci/{core|ee}/app/lib/gitlab/ci/ and run Packwerk to see the violations.
  • Other Ci:: files are left in the main/outer component (.)
  • Generated packerk.yml and package.yml config files with some tuning.
  • Generated the deprecated_references.yml files which are the equivalent of Rubocop TODO files and record existing violations so that Packwerk don't warn about these again.
  • Changed config/application.rb to autoload files from inside the component and conditionally load EE files.
  • Changed lib/gitlab/sidekiq_config.rb to recognize the new location of the CI workers and any other new components.

Not done:

  • add components/ci/public/core/* and components/ci/public/ee/* to the autoload paths.
  • fix issues with knapsack not running some specs
  • fix Rubocop todo files (violations) since we moved a lot of files around and those are now flagged. Running rails rubocop:todo:generate however creates too many (including unrelated) changes. So this is why rubocop job fails.
  • the static analysis CI job complains about the need to run rails gettext:regenerate to update locale/gitlab.pot. I think there is a configuration somewhere that doesn't recognize the moved files.

Advantages

Code architecture

  • The whole Ci:: namespace is a single package which we can control how it is used.
  • Package dependencies will be explicitly defined in components/ci/package.yml.
  • The package.yml defines also who are the teams primarily responsible for the stewardship of the package.
  • Production code and specs are both located into the same package. This will give developers a much smaller context to work on while also benefiting from the privacy and dependency constraints that Packwerk provides.
  • This way all classes inside Ci:: component can depend on each other. Eventually sub-packages could be created to better define internal boundaries.
  • It's possible to put into the same component files that don't belong to the same Ruby module (e.g. CommitStatus can be moved into components/ci). While this is not ideal, it provides a flexible way to group files that should belong to the same domain. Then we can ensure they both are nested under the same Ruby module.

Iteration

  • This can be done in several iterations. Create an empty component and gradually move all files in it. Then use Packwerk to enforce privacy and dependencies.
  • Code that is clearly defined into a namespace can be converted to a package and moved to the respective folder.
  • We can start seeing the advantages immediately by recording the existing violations for us to resolve later but also be alerted about new violations as we make changes going forward.

Challenges

  • We need to understand how this would play with the proposal of separating out the web_engine. Put controllers/GraphQL/REST/serializers in a web_engine vs in dedicated component/* folder.
  • New directory structure for developers to get used to.
  • Could mess Git history by moving many files - We already experienced that a few years ago when we consolidated Core and EE codebase into a single repo.
  • Many iterations to get to a consistent state across the codebase.
  • Requires adoption from all engineering teams.
  • CODEOWNERS, Danger bot and other tools will need to be updated. Although I believe that rules for ownership will be simplified in a component-based monolith than hard-coding paths as we do today in CODEOWNERS.

Dependency graph

In this MR we are only extracting components/ci and in order to do it incrementally we need to have a cyclical dependency with the main/parent component that is the Rails root directory. This is because Ci:: needs to reference external constants like Project, MergeRequest, etc.

flowchart LR

root[Parent component] -- depends on --> CI
CI -- depends on --> root

Overtime we can end up with:

flowchart LR

CI --> w[Project]
w --> Platform
CI --> Platform
MergeRequests --> CI
MergeRequests --> w
w --> Repository --> Platform
MergeRequests --> Platform
Edited by Fabio Pitino

Merge request reports