Rework GitLab architecture diagrams

Problem to solve

Improve our architecture workflow and documentation at GitLab.

Further details

https://docs.gitlab.com/ee/development/architecture.html depicts the main components of GitLab, along with a few others such as GitLab Runner, but leaves out many other components, especially the most recent ones. This gap between "legacy" and newer components stems from the fact that the diagrams are already bloated with details, which makes adding anything new difficult. As a result, the page is not always up to date, and sometimes contains too much detail, sometimes not enough.

This problem is well described in the book Software Systems Architecture, where the authors explain the need for different views and viewpoints for the different personas consuming architecture documentation. https://docs.gitlab.com/ee/development/architecture.html is clearly read by different personas, such as developers, solution architects, and SREs, who each want a different level of detail.

This lack of consistency also creates confusion around our reference architectures, because only part of GitLab is covered. It is hard for customers to figure out which components would be missing, depending on the features they want to use.

Our GitLab Architecture Workflow describes the process for writing design documents well, but it prescribes nothing about how to represent the details of our architectures. We use different formats (PNG, SVG, TXT, ...), tools (draw.io, Google Drawings, Excalidraw, ...), and styles to describe them. On top of that, we rely heavily on design documents (previously called "Blueprints" at GitLab), which also describe the architecture of some systems, for example: https://docs.gitlab.com/ee/architecture/blueprints/cloud_native_gitlab_pages/. This can be hard for readers to follow, because the information is split between that page and https://docs.gitlab.com/ee/user/project/pages/.

If we were more consistent with our diagrams and architecture docs, they would be both easier to maintain and easier to read.

Proposal

To solve this problem, we started working on a POC (see links above) to experiment with splitting GitLab into smaller, more maintainable parts: GitLab itself is decomposed into small chunks called "Software Systems". These Software Systems define boundaries across our codebase and are generally tied to dedicated features. Apart from gitlab-core, which is the minimum set of services required to run a GitLab instance, every other system is considered optional. Even GitLab Runner is one of these optional systems, because it is not strictly required. Other examples include the container registry, the security scanners, GitLab Pages, and many more.

By using diagrams as code (see https://c4model.com/), we can define a single source of truth (SSOT) for all components and generate different views based on our needs. This approach has other advantages too, such as the ability to export network connectivity tables, which customers often request.
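To illustrate how a single model can feed both diagrams and connectivity tables, here is a minimal sketch in Python. It is not the POC's tooling: C4 tools such as Structurizr use their own DSL, and every class, method, component name, protocol, and port below is a hypothetical, illustrative value. The sketch assumes that each component belongs to exactly one Software System and that gitlab-core is always included, as described above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Container:
    name: str
    system: str  # the Software System this component belongs to


@dataclass(frozen=True)
class Relationship:
    source: str
    destination: str
    protocol: str
    port: int  # illustrative value only


@dataclass
class Model:
    """Hypothetical single source of truth for components and their connections."""

    containers: dict[str, Container] = field(default_factory=dict)
    relationships: list[Relationship] = field(default_factory=list)

    def add(self, name: str, system: str) -> None:
        self.containers[name] = Container(name, system)

    def uses(self, source: str, destination: str, protocol: str, port: int) -> None:
        self.relationships.append(Relationship(source, destination, protocol, port))

    def view(self, systems: list[str]) -> list[Relationship]:
        """Keep only relationships whose both ends belong to the selected systems."""
        selected = set(systems) | {"gitlab-core"}  # gitlab-core is always required
        return [
            r
            for r in self.relationships
            if self.containers[r.source].system in selected
            and self.containers[r.destination].system in selected
        ]

    def connectivity_table(self, systems: list[str]) -> str:
        """Export the selected view as a CSV network connectivity table."""
        rows = ["source,destination,protocol,port"]
        for r in sorted(self.view(systems), key=lambda r: (r.source, r.destination)):
            rows.append(f"{r.source},{r.destination},{r.protocol},{r.port}")
        return "\n".join(rows)


# Illustrative model: names, protocols, and ports are assumptions for this sketch.
model = Model()
model.add("rails", "gitlab-core")
model.add("postgresql", "gitlab-core")
model.add("redis", "gitlab-core")
model.add("gitaly", "gitlab-core")
model.add("runner", "gitlab-runner")
model.add("registry", "container-registry")

model.uses("rails", "postgresql", "tcp", 5432)
model.uses("rails", "redis", "tcp", 6379)
model.uses("rails", "gitaly", "grpc", 8075)
model.uses("runner", "rails", "https", 443)
model.uses("rails", "registry", "https", 5000)

# A customer using only gitlab-core plus GitLab Runner gets a table without the registry.
print(model.connectivity_table(["gitlab-runner"]))
```

Because views and tables are derived from the same model, adding a new optional system means declaring it once rather than redrawing every diagram that should show it.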

Who can address the issue

We (Security) would like to discuss this with the Technical Writing team to gather early feedback and gauge the appetite for continuing this POC. It is becoming stable enough for a first iteration, at least to update https://docs.gitlab.com/ee/development/architecture.html.
