
Lessons learned from CI decomposition

Identify lessons learned during the CI decomposition work and how we can influence the architecture design and engineering practices going forward.

Discuss in this issue and collect ideas below.

Lessons learned

  1. Interface Segregation Principle - create single-purpose tables instead of adding columns to existing large models
    • Example: the projects table could contain only 10 of its 83 columns. The remaining columns belong to settings from other domains (merge request settings, mirroring, CI/CD settings, etc.)
    • Prior to the CI decomposition work we started moving the CI minutes usage tracking from namespace_statistics and project_statistics into ci_namespace_monthly_usages and ci_project_monthly_usages tables respectively. This allowed us to query and update these tables in isolation.
  2. Foreign keys vs application code - the more foreign keys (FKs) we have, the more we need to convert to loose foreign keys (LFKs) or rethink how we maintain data integrity. FKs are a good fit for tightly coupled records. In some scenarios we don't need FKs at all.
    • In the case of the ci_namespace_monthly_usages and ci_project_monthly_usages we purposely did not add a (loose) foreign key via namespace_id or project_id. This allowed us to maintain historical tracking of CI minutes usage even after a project is deleted. We are able to show that a deleted project consumed X minutes, rather than having the records deleted via foreign keys.
    • I think we need to define guidelines on when to use an FK, when to use an LFK, and when not to use foreign keys at all. FKs are a simple way to maintain data integrity, but in complex scenarios, like a project removal, we would instead want to control record deletion via application logic (e.g. trigger async artifacts removal from object storage when records are deleted).
  3. Two-way AR relations - when analyzing cross-database table usage we noticed that two-way ActiveRecord relations (belongs_to <-> has_many) made the analysis harder, since we needed to check more access patterns. We define two-way AR relations almost by default today, but do we need that? Can we define relationships based on the access pattern we actually need?
  4. Managing side-effects - some of the changes we needed to make were related to decoupling side-effects (e.g. updates in a different database) from the main transaction.
    • Decoupling side-effects from a business transaction (not necessarily database transaction) is a good Domain-Driven Design practice which helps with decoupling components.
    • Gitlab::EventStore could be one of the tools we use in this case. For example: when a ProjectDeletedEvent is published, other domains (such as Ci::) could react asynchronously.
  5. more...
  6. more...
  7. more...
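
To illustrate lesson 4, here is a minimal sketch of decoupling a side-effect from the main transaction through published events. It is plain Ruby and deliberately does not use the real Gitlab::EventStore API: SketchEventStore, its subscribe/publish/drain methods, and the project_id payload are all hypothetical names chosen for the example. Only the ProjectDeletedEvent name comes from the discussion above. The drain call stands in for whatever background worker would deliver events asynchronously.

```ruby
# Hypothetical event payload; the real event schema is not defined in this issue.
ProjectDeletedEvent = Struct.new(:project_id, keyword_init: true)

# Hypothetical in-memory stand-in, NOT the Gitlab::EventStore API.
class SketchEventStore
  def initialize
    @subscribers = Hash.new { |hash, key| hash[key] = [] }
    @pending = []
  end

  # Register a handler (anything responding to #call) for an event class.
  def subscribe(event_class, handler)
    @subscribers[event_class] << handler
  end

  # Queue the event for each subscriber; the publisher does not wait for handlers.
  def publish(event)
    @subscribers[event.class].each { |handler| @pending << [handler, event] }
  end

  # Deliver queued events. This stands in for an asynchronous background worker.
  def drain
    until @pending.empty?
      handler, event = @pending.shift
      handler.call(event)
    end
  end
end

store = SketchEventStore.new
cleaned_up_project_ids = []

# CI-side subscriber: reacts to the deletion without the Projects code knowing about it.
store.subscribe(ProjectDeletedEvent, ->(event) { cleaned_up_project_ids << event.project_id })

# Projects-side code: finish the main transaction first, then announce what happened.
store.publish(ProjectDeletedEvent.new(project_id: 42))

# Nothing has run yet; the side-effect only happens once the worker delivers the event.
store.drain
```

The point of the design is that the code deleting the project only knows about the event, not about CI. Adding another reaction (for example, in a different domain) means adding a subscriber rather than changing the deletion code.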

Action items

  • action item 1
  • action item 2
Edited by Fabio Pitino