Lessons learned from CI decomposition
Identify lessons learned during the CI decomposition work and how we can influence the architecture design and engineering practices going forward.
Discuss in this issue and collect ideas below.
Lessons learned
- Interface Segregation Principle - create single-purpose tables instead of adding columns to existing large models.
  - Example: the projects table could contain only 10 of its 83 columns. The remaining columns belong to the settings of other domains (merge request settings, mirroring, CI/CD settings, etc.).
  - Prior to the CI decomposition work we started moving the CI minutes usage tracking from namespace_statistics and project_statistics into the ci_namespace_monthly_usages and ci_project_monthly_usages tables respectively. This allowed us to query and update these tables in isolation.
- Foreign keys vs application code - the more FKs we have, the more we need to convert to loose foreign keys (LFKs) or rethink data integrity. FKs are a good fit for tightly coupled records. In some scenarios we don't need FKs.
  - In the case of ci_namespace_monthly_usages and ci_project_monthly_usages we purposely did not add a (loose) foreign key via namespace_id or project_id. This allowed us to keep the historical tracking of CI minutes usage even after a project is deleted. We are able to show that a deleted project consumed X minutes, rather than having the records deleted via foreign keys.
  - I think we need to define guidelines on when to use FKs, when to use LFKs, and when not to use foreign keys at all. FKs are a simple way to maintain data integrity, but in complex scenarios, like a project removal, we would instead want to control record deletion via application logic (e.g. trigger async removal of artifacts from object storage when records are deleted).
- 2-way AR relations - when analyzing cross-database table usages we noticed that having 2-way ActiveRecord (AR) relations (belongs_to <-> has_many) increased the difficulty of the analysis, since we needed to check more access patterns. We define 2-way AR relations almost by default today, but do we need that? Can we define relationships based on the access pattern we actually need?
- Managing side-effects - some of the changes we needed to make were related to decoupling side-effects (e.g. updates in a different database) from the main transaction.
  - Decoupling side-effects from a business transaction (not necessarily a database transaction) is a good Domain-Driven Design practice which helps with decoupling components.
  - Gitlab::EventStore could be one of the tools we could use in this case. For example: when a ProjectDeletedEvent is published, other domains (such as Ci::) could react async.
- more...
- more...
- more...
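
To make the "managing side-effects" lesson above more concrete, here is a minimal plain-Ruby sketch of the publish/react-later idea. It is not the Gitlab::EventStore API: SketchEventStore, CiProjectCleanup, and the drain method are hypothetical names invented for illustration, and the in-memory queue only stands in for whatever background processing a real implementation would use.

```ruby
# Hypothetical in-memory publish/subscribe sketch.
# This is NOT the Gitlab::EventStore API; all names below are illustrative.
class SketchEventStore
  def initialize
    @subscribers = Hash.new { |hash, key| hash[key] = [] }
    @queue = []
  end

  # Register a handler (any object responding to #call) for an event class.
  def subscribe(event_class, handler)
    @subscribers[event_class] << handler
  end

  # Only record the event; handlers do not run as part of the publisher's work.
  def publish(event)
    @queue << event
  end

  # Stand-in for background workers: deliver each queued event to its handlers.
  def drain
    until @queue.empty?
      event = @queue.shift
      @subscribers[event.class].each { |handler| handler.call(event) }
    end
  end
end

# Event carrying only the data other domains need in order to react.
ProjectDeletedEvent = Struct.new(:project_id)

# Hypothetical CI-side handler that cleans up data owned by its own domain.
class CiProjectCleanup
  attr_reader :cleaned_project_ids

  def initialize
    @cleaned_project_ids = []
  end

  def call(event)
    @cleaned_project_ids << event.project_id
  end
end

store = SketchEventStore.new
ci_cleanup = CiProjectCleanup.new
store.subscribe(ProjectDeletedEvent, ci_cleanup)

# The main business transaction ends once the event is published.
store.publish(ProjectDeletedEvent.new(42))
pending_before_drain = ci_cleanup.cleaned_project_ids.dup

# The CI side effect runs later, outside the publisher's transaction.
store.drain
```

The point of the sketch is that the publisher knows nothing about CI: the deletion path only emits an event, and each interested domain decides how and when to react.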
Action items
- action item 1
- action item 2
Edited by Fabio Pitino