Commit f66a17d1 authored by Lorena Ciutacu

Merge branch 'cells-move-impacted-features' into 'master'

Cells: Move impacted features docs into subfolder

See merge request !129604



Merged-by: Lorena Ciutacu <lciutacu@gitlab.com>
Reviewed-by: Kamil Trzciński <ayufan@ayufan.eu>
Reviewed-by: Lorena Ciutacu <lciutacu@gitlab.com>
Co-authored-by: Kamil Trzciński <ayufan@ayufan.eu>
parents 2828a16c 0adefdbe
---
stage: enablement
group: Tenant Scale
description: 'Cells: Admin Area'
redirect_to: 'impacted_features/admin-area.md'
remove_date: '2023-11-17'
---
<!-- vale gitlab.FutureTense = NO -->
This document is a work-in-progress and represents a very early state of the
Cells design. Significant aspects are not documented, though we expect to add
them in the future. This is one possible architecture for Cells, and we intend to
contrast this with alternatives before deciding which approach to implement.
This documentation will be kept even if we decide not to implement this so that
we can document the reasons for not choosing this approach.
# Cells: Admin Area
In our Cells architecture proposal we plan to share all admin-related tables in GitLab.
This allows for simpler management of all Cells in one interface and reduces the risk of settings diverging between Cells.
This introduces challenges with Admin Area pages that allow you to manage data that will be spread across all Cells.
## 1. Definition
There are consequences for Admin Area pages that contain data spanning "the whole instance", as these pages may be served by any Cell or possibly just one Cell.
Many parts of the Admin Area already contain data that spans many Cells.
For example, lists of all Groups, Projects, Topics, Jobs, Analytics, Applications and more.
There are also administrative monitoring capabilities in the Admin Area that will span many Cells, such as the "Background Jobs" and "Background Migrations" pages.
## 2. Data flow
## 3. Proposal
We will need to decide how to handle these exceptions with a few possible
options:
1. Move all these pages out into a dedicated per-Cell admin section. The URL will
probably need to be routable to a single Cell, like `/cells/<cell_id>/admin`,
so that we can display this data per Cell. These pages will be distinct from
other Admin Area pages which control settings that are shared across all Cells. We
will also need to consider how this impacts self-managed customers and
whether or not this should be visible for single-Cell instances of GitLab.
1. Build some aggregation interfaces for this data so that it can be fetched
from all Cells and presented in a single UI. This may be beneficial to an
administrator who needs to see and filter all data at a glance, especially
when they don't know which Cell the data is on. The downside, however, is
that building this kind of aggregation is very tricky when all Cells are
designed to be totally independent, and it also enforces stricter
requirements on compatibility between Cells.
The following overview describes at what level each feature contained in the current Admin Area will be managed:
| Feature | Cluster | Cell | Organization |
| --- | --- | --- | --- |
| Abuse reports | | | |
| Analytics | | | |
| Applications | | | |
| Deploy keys | | | |
| Labels | | | |
| Messages | ✓ | | |
| Monitoring | | ✓ | |
| Subscription | | | |
| System hooks | | | |
| Overview | | | |
| Settings - General | ✓ | | |
| Settings - Integrations | ✓ | | |
| Settings - Repository | ✓ | | |
| Settings - CI/CD (1) | ✓ | ✓ | |
| Settings - Reporting | ✓ | | |
| Settings - Metrics | ✓ | | |
| Settings - Service usage data | | ✓ | |
| Settings - Network | ✓ | | |
| Settings - Appearance | ✓ | | |
| Settings - Preferences | ✓ | | |
(1) Depending on the specific setting, some will be managed at the cluster-level, and some at the Cell-level.
## 4. Evaluation
## 4.1. Pros
## 4.2. Cons
This document was moved to [another location](impacted_features/admin-area.md).
---
stage: enablement
group: Tenant Scale
description: 'Cells: Agent for Kubernetes'
redirect_to: 'impacted_features/agent-for-kubernetes.md'
remove_date: '2023-11-17'
---
<!-- vale gitlab.FutureTense = NO -->
This document is a work-in-progress and represents a very early state of the
Cells design. Significant aspects are not documented, though we expect to add
them in the future. This is one possible architecture for Cells, and we intend to
contrast this with alternatives before deciding which approach to implement.
This documentation will be kept even if we decide not to implement this so that
we can document the reasons for not choosing this approach.
# Cells: Agent for Kubernetes
> TL;DR
## 1. Definition
## 2. Data flow
## 3. Proposal
## 4. Evaluation
## 4.1. Pros
## 4.2. Cons
This document was moved to [another location](impacted_features/agent-for-kubernetes.md).
---
stage: enablement
group: Tenant Scale
description: 'Cells: Backups'
redirect_to: 'impacted_features/backups.md'
remove_date: '2023-11-17'
---
<!-- vale gitlab.FutureTense = NO -->
This document is a work-in-progress and represents a very early state of the
Cells design. Significant aspects are not documented, though we expect to add
them in the future. This is one possible architecture for Cells, and we intend to
contrast this with alternatives before deciding which approach to implement.
This documentation will be kept even if we decide not to implement this so that
we can document the reasons for not choosing this approach.
# Cells: Backups
Each Cell will take its own backups, and consequently have its own isolated backup/restore procedure.
## 1. Definition
GitLab backup takes a backup of the PostgreSQL database used by the application, and also Git repository data.
## 2. Data flow
Each Cell has a number of application databases to back up (for example, `main`, and `ci`).
Additionally, there may be cluster-wide metadata tables (for example, the `users` table) which are directly accessible via PostgreSQL.
## 3. Proposal
### 3.1. Cluster-wide metadata
It is currently unknown how cluster-wide metadata tables will be accessible.
We may choose to have cluster-wide metadata tables backed up separately, or have each Cell back up its copy of cluster-wide metadata tables.
### 3.2 Consistency
#### 3.2.1 Take backups independently
As Cells will communicate with each other via API, and there will be no joins to the `users` table, it should be acceptable for each Cell to take a backup independently of the others.
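As a minimal sketch, assuming each Cell keeps using the existing backup tooling against only its local data stores, independent backups could amount to running the current backup task on every Cell on its own schedule:
```shell
# Illustrative only: each Cell runs the existing backup task independently,
# against its own local databases and Git data; no cross-Cell coordination is assumed.
sudo gitlab-backup create
```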
#### 3.2.2 Enforce snapshots
We can require that each Cell takes a snapshot of its PostgreSQL databases at around the same time to allow for a consistent enough backup.
## 4. Evaluation
As the number of Cells increases, it will likely not be feasible to take a snapshot at the same time for all Cells.
Hence taking backups independently is the better option.
## 4.1. Pros
## 4.2. Cons
This document was moved to [another location](impacted_features/backups.md).
---
stage: enablement
group: Tenant Scale
description: 'Cells: CI Runners'
redirect_to: 'impacted_features/ci-runners.md'
remove_date: '2023-11-17'
---
<!-- vale gitlab.FutureTense = NO -->
This document is a work-in-progress and represents a very early state of the
Cells design. Significant aspects are not documented, though we expect to add
them in the future. This is one possible architecture for Cells, and we intend to
contrast this with alternatives before deciding which approach to implement.
This documentation will be kept even if we decide not to implement this so that
we can document the reasons for not choosing this approach.
# Cells: CI Runners
GitLab executes CI jobs via [GitLab Runner](https://gitlab.com/gitlab-org/gitlab-runner/), very often managed by customers in their infrastructure.
All CI jobs created as part of the CI pipeline are run in the context of a Project.
This poses a challenge as to how to manage GitLab Runners.
## 1. Definition
There are 3 different types of runners:
- Instance-wide: Runners that are registered globally with specific tags (selection criteria)
- Group runners: Runners that execute jobs from a given top-level Group or Projects in that Group
- Project runners: Runners that execute jobs from one or many Projects: some runners might
have Projects assigned from Projects in different top-level Groups.
This, together with the existing data structure where `ci_runners` is a table describing all types of runners, poses a challenge as to how `ci_runners` should be managed in a Cells environment.
## 2. Data flow
GitLab runners use a set of globally scoped endpoints to:
- Register a new runner via registration token `https://gitlab.com/api/v4/runners`
([subject for removal](../runner_tokens/index.md)) (`registration token`)
- Create a new runner in the context of a user `https://gitlab.com/api/v4/user/runners` (`runner token`)
- Request jobs via an authenticated `https://gitlab.com/api/v4/jobs/request` endpoint (`runner token`)
- Upload job status via `https://gitlab.com/api/v4/jobs/:job_id` (`build token`)
- Upload trace via `https://gitlab.com/api/v4/jobs/:job_id/trace` (`build token`)
- Download and upload artifacts via `https://gitlab.com/api/v4/jobs/:job_id/artifacts` (`build token`)
Currently three types of authentication tokens are used:
- Runner registration token ([subject for removal](../runner_tokens/index.md))
- Runner token representing a registered runner in a system with specific configuration (`tags`, `locked`, etc.)
- Build token representing an ephemeral token giving limited access to updating a specific job, uploading artifacts, downloading dependent artifacts, downloading and uploading container registry images
Each of those endpoints receives an authentication token via header (`JOB-TOKEN` for `/trace`) or body parameter (`token` for all other endpoints).
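For reference, a sketch of how the two token styles are passed today; the exact payload fields are abbreviated and illustrative:
```shell
# Request a job: the runner token travels in the request body.
curl --request POST "https://gitlab.com/api/v4/jobs/request" \
  --header "Content-Type: application/json" \
  --data '{"token": "<runner token>", "info": {"features": {}}}'

# Append to the job log: the build token travels in the JOB-TOKEN header
# (offsets in Content-Range are illustrative).
curl --request PATCH "https://gitlab.com/api/v4/jobs/12345/trace" \
  --header "JOB-TOKEN: <build token>" \
  --header "Content-Range: 0-10" \
  --data "Running..."
```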
Since the CI pipeline would be created in the context of a specific Cell, the picking of a build would have to be processed by that particular Cell.
Depending on the solution, this requires build picking to either:
- Be routed to the correct Cell the first time
- Be two-phased: request a build from the global pool, then claim the build on a specific Cell using a Cell-specific URL
## 3. Proposal
### 3.1. Authentication tokens
Even though the paths for CI runners are not routable, they can be made routable with these two possible solutions:
- The `https://gitlab.com/api/v4/jobs/request` endpoint uses a long polling mechanism with
a ticketing mechanism (based on the `X-GitLab-Last-Update` header). When the runner first
starts, it sends a request to GitLab, to which GitLab responds either with a build to pick
by the runner or with a new ticket value to poll with. This value is completely controlled by GitLab. This allows GitLab
to use JWT or any other means to encode a `cell` identifier that could be easily
decodable by the Router.
- The majority of communication (in terms of volume) is using `build token`, making it
the easiest target to change since GitLab is the sole owner of the token that the runner later
uses for a specific job. There were prior discussions about not storing the `build token`
but rather using a `JWT` token with defined scopes. Such a token could encode the `cell`
to which the Router could route all requests, as sketched below.
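A minimal sketch of such a routable token, assuming a JWT whose payload carries a hypothetical `cell` claim that the Router can read without a database lookup (claim names are illustrative, not an agreed format):
```shell
# The Router only needs to base64-decode the middle segment of the JWT to find the Cell.
token="<header>.<payload>.<signature>"
echo "${token}" | cut -d. -f2 | base64 --decode
# => {"job_id":12345,"cell":"cell-3","scopes":["update_job","upload_artifacts"],"exp":1700000000}
```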
### 3.2. Request body
- The most used endpoints pass the authentication token in the request body. It might be desirable
to use HTTP headers instead, as an easier way for the Router to access this information without
a need to proxy requests.
### 3.3. Instance-wide are Cell-local
We can pick a design where all runners are always registered and local to a given Cell:
- Each Cell has its own set of instance-wide runners that are updated at its own pace
- The Project runners can only be linked to Projects from the same Organization, creating strong isolation.
- In this model the `ci_runners` table is local to the Cell.
- In this model we would require the above endpoints to be scoped to a Cell in some way, or be made routable. It might be via prefixing them, adding additional Cell parameters, or providing much more robust ways to decode runner tokens and match them to a Cell.
- If a routable token is used, we could move away from a cryptographically random value stored in the database and instead use JWT tokens.
- The Admin Area showing registered runners would have to be scoped to a Cell.
This model might be desired because it provides strong isolation guarantees.
This model does significantly increase maintenance overhead because each Cell is managed separately.
This model may require adjustments to the runner tags feature so that Projects have a consistent runner experience across Cells.
### 3.4. Instance-wide are cluster-wide
Contrary to the proposal where all runners are Cell-local, we can consider that all runners
are global, or that just instance-wide runners are global.
However, this requires significant overhaul of the system and we would have to change the following aspects:
- The `ci_runners` table would likely have to be decomposed into `ci_instance_runners`, ...
- All interfaces would have to be adapted to use the correct table.
- Build queuing would have to be reworked to be two-phased where each Cell would know of all pending and running builds, but the actual claim of a build would happen against a Cell containing data.
- It is likely that `ci_pending_builds` and `ci_running_builds` would have to be made `cluster-wide` tables, increasing the likelihood of creating hotspots in a system related to CI queueing.
This model is complex to implement from an engineering perspective.
Some data are shared between Cells.
It creates hotspots/scalability issues in a system that might impact the experience of Organizations on other Cells, for instance during abuse.
### 3.5. GitLab CI Daemon
Another potential solution to explore is to have a dedicated service responsible for builds queueing, owning its database and working in a model of either sharded or Cell-ed service.
There were prior discussions about [CI/CD Daemon](https://gitlab.com/gitlab-org/gitlab/-/issues/19435).
If the service is sharded:
- Depending on the model, if runners are cluster-wide or Cell-local, this service would have to fetch data from all Cells.
- If the sharded service is used we could adopt a model of sharing a database containing `ci_pending_builds/ci_running_builds` with the service.
- If the sharded service is used we could consider a push model where each Cell pushes to CI/CD Daemon builds that should be picked by runner.
- The sharded service would be aware which Cell is responsible for processing the given build and could route processing requests to the designated Cell.
If the service is Cell-ed:
- All expectations of routable endpoints are still valid.
In general, usage of a CI Daemon does not help significantly with the stated problem.
However, it offers a few upsides related to more efficient processing and a decoupled model: it enables a push model, and it opens a way to offer stateful communication with GitLab Runners (for example gRPC or WebSockets).
## 4. Evaluation
Considering all options it appears that the most promising solution is to:
- Use [Instance-wide are Cell-local](#33-instance-wide-are-cell-local)
- Refine endpoints to have routable identities (either via specific paths, or better tokens)
Another potential upside is to get rid of `ci_builds.token` and instead use a JWT token that can encode a wider set of scopes allowed by the CI runner much more easily.
## 4.1. Pros
## 4.2. Cons
This document was moved to [another location](impacted_features/ci-runners.md).
---
stage: enablement
group: Tenant Scale
description: 'Cells: Container Registry'
redirect_to: 'impacted_features/container-registry.md'
remove_date: '2023-11-17'
---
<!-- vale gitlab.FutureTense = NO -->
This document is a work-in-progress and represents a very early state of the
Cells design. Significant aspects are not documented, though we expect to add
them in the future. This is one possible architecture for Cells, and we intend to
contrast this with alternatives before deciding which approach to implement.
This documentation will be kept even if we decide not to implement this so that
we can document the reasons for not choosing this approach.
# Cells: Container Registry
GitLab [Container Registry](../../../user/packages/container_registry/index.md) is a feature that allows storing Docker container images in GitLab.
## 1. Definition
GitLab Container Registry is a complex service requiring usage of PostgreSQL, Redis and Object Storage dependencies.
Work is currently underway to introduce [Container Registry Metadata](../container_registry_metadata_database/index.md) to optimize data storage and image retention policies of the Container Registry.
The GitLab Container Registry serves as a container for stored data, but on its own does not authenticate `docker login`.
The `docker login` is executed with user credentials (which can be a `personal access token`) or CI build credentials (the ephemeral `ci_builds.token`).
Container Registry uses data deduplication.
It means that the same blob (image layer) that is shared between many Projects is stored only once.
Each layer is hashed by `sha256`.
The `docker login` requests a time-limited JWT authentication token that is signed by GitLab but validated by the Container Registry service.
The JWT token stores all authorized scopes (`container repository images`) and operation types (`push` or `pull`).
A single JWT authentication token can have many authorized scopes.
This allows Container Registry and client to mount existing blobs from other scopes.
GitLab responds only with authorized scopes.
Then it is up to GitLab Container Registry to validate if the given operation can be performed.
The GitLab.com registry paths are always scoped to a Project.
Each Project can have many container registry images attached.
Currently, on GitLab.com the actual registry service is served via `https://registry.gitlab.com`.
The main identifiable problems are:
- The authentication request (`https://gitlab.com/jwt/auth`) that is processed by GitLab.com.
- The `https://registry.gitlab.com` that is run by an external service and uses its own data store.
- Data deduplication. The Cells architecture, with the registry run in a Cell, would reduce the efficiency of data storage.
## 2. Data flow
### 2.1. Authorization request that is sent by `docker login`
```shell
curl \
--user "username:password" \
"https://gitlab/jwt/auth?client_id=docker&offline_token=true&service=container_registry&scope=repository:gitlab-org/gitlab-build-images:push,pull"
```
The result is an encoded and signed JWT token. The second base64-encoded string (split by `.`) contains JSON with the authorized scopes.
```json
{"auth_type":"none","access":[{"type":"repository","name":"gitlab-org/gitlab-build-images","actions":["pull"]}],"jti":"61ca2459-091c-4496-a3cf-01bac51d4dc8","aud":"container_registry","iss":"omnibus-gitlab-issuer","iat":1669309469,"nbf":166}
```
### 2.2. Docker client fetching tags
```shell
curl \
-H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
-H "Authorization: Bearer token" \
https://registry.gitlab.com/v2/gitlab-org/gitlab-build-images/tags/list
curl \
-H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
-H "Authorization: Bearer token" \
https://registry.gitlab.com/v2/gitlab-org/gitlab-build-images/manifests/danger-ruby-2.6.6
```
### 2.3. Docker client fetching blobs and manifests
```shell
curl \
-H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
-H "Authorization: Bearer token" \
https://registry.gitlab.com/v2/gitlab-org/gitlab-build-images/blobs/sha256:a3f2e1afa377d20897e08a85cae089393daa0ec019feab3851d592248674b416
```
## 3. Proposal
### 3.1. Shard Container Registry separately to Cells architecture
Due to its extensive and generally highly scalable horizontal architecture, it should be evaluated whether the GitLab Container Registry should be run not in a Cell, but in a cluster, and be scaled independently.
This might be easier, but would definitely not offer the same amount of data isolation.
### 3.2. Run Container Registry within a Cell
It appears that, except for `/jwt/auth` (which would likely have to be processed by the Router to decode `scope`), the Container Registry could be run as a local service of a Cell.
The actual data, at least in the case of GitLab.com, is not forwarded via the registry, but rather served directly from Object Storage / CDN.
Its design encodes the container repository image in a URL that is easily routable.
It appears that we could re-use the same stateless Router service in front of the Container Registry to serve manifest and blob redirects.
The only downside is the increased complexity of managing a standalone registry for each Cell, but this might be the desired approach.
## 4. Evaluation
There do not seem to be any theoretical problems with running GitLab Container Registry in a Cell.
It seems that the service can be easily made routable to work well.
The practical complexities are around managing a complex service from an infrastructure side.
## 4.1. Pros
## 4.2. Cons
This document was moved to [another location](impacted_features/container-registry.md).
---
stage: enablement
group: Tenant Scale
description: 'Cells: Contributions: Forks'
redirect_to: 'impacted_features/contributions-forks.md'
remove_date: '2023-11-17'
---
<!-- vale gitlab.FutureTense = NO -->
This document is a work-in-progress and represents a very early state of the
Cells design. Significant aspects are not documented, though we expect to add
them in the future. This is one possible architecture for Cells, and we intend to
contrast this with alternatives before deciding which approach to implement.
This documentation will be kept even if we decide not to implement this so that
we can document the reasons for not choosing this approach.
# Cells: Contributions: Forks
The [Forking workflow](../../../user/project/repository/forking_workflow.md) allows users to copy existing Project sources into their own namespace of choice (Personal or Group).
## 1. Definition
The [Forking workflow](../../../user/project/repository/forking_workflow.md) is a common workflow with various usage patterns:
- It allows users to contribute back to the upstream Project.
- It persists repositories into their Personal Namespace.
- Users can copy a Project to make changes and release it as a modified Project.
Forks allow users who do not have write access to a parent Project to make changes.
The forking workflow is especially important for the open source community to contribute back to public Projects.
However, it is equally important in some companies that prefer a strong split of responsibilities and tighter access control.
The access to a Project is restricted to a designated list of developers.
Forks enable:
- Tighter control of who can modify the upstream Project.
- Split of responsibilities: Parent Project might use CI configuration connecting to production systems.
- To run CI pipelines in the context of a fork in a much more restrictive environment.
- To consider all forks to be unvetted which reduces risks of leaking secrets, or any other information tied to the Project.
The forking model is problematic in a Cells architecture for the following reasons:
- Forks are clones of existing repositories. Forks could be created across different Organizations, Cells and Gitaly shards.
- Users can create merge requests and contribute back to an upstream Project. This upstream Project might be in a different Organization and Cell.
- The merge request CI pipeline is executed in the context of the source Project, but presented in the context of the target Project.
## 2. Data flow
## 3. Proposals
### 3.1. Intra-Cluster forks
This proposal implements forks as intra-Cluster forks where communication is done via API between all trusted Cells of a cluster:
- Forks are created always in the context of a user's choice of Group.
- Forks are isolated from the Organization.
- Organization or Group owner could disable forking across Organizations, or forking in general.
- A merge request is created in the context of the target Project, referencing the external Project on another Cell.
- The merge reference is transferred to the target Project and is used to present information in the context of the target Project.
- The CI pipeline is executed in the context of the source Project as it is today; the result is fetched into the merge request of the target Project.
- The Cell holding the target Project internally uses GraphQL to fetch the status of the source Project and includes it in the context of the merge request information.
Upsides:
- All existing forks continue to work as they are, as they are treated as intra-Cluster forks.
Downsides:
- The purpose of Organizations is to provide strong isolation between Organizations. Allowing users to fork across Organizations does break security boundaries.
- However, this is no different to the ability of users today to clone a repository to a local computer and push it to any repository of choice.
- Access control of the source Project can be lower than that of the target Project. Today, the system requires that in order to contribute back, the access level needs to be the same for fork and upstream.
### 3.2. Forks are created in a Personal Namespace of the current Organization
Instead of creating Projects across Organizations, forks are created in a user's Personal Namespace tied to the Organization. Example:
- Each user that is part of an Organization receives their Personal Namespace. For example for `GitLab Inc.` it could be `gitlab.com/organization/gitlab-inc/@ayufan`.
- The user has to fork into their own Personal Namespace of the Organization.
- The user has as many Personal Namespaces as Organizations they belong to.
- The Personal Namespace behaves similar to the currently offered Personal Namespace.
- The user can manage and create Projects within a Personal Namespace.
- The Organization can prevent or disable usage of Personal Namespaces, disallowing forks.
- All current forks are migrated into the Personal Namespace of the user in an Organization.
- All forks are part of the Organization.
- Forks are not federated features.
- The Personal Namespace and forked Project do not share configuration with the parent Project.
### 3.3. Forks are created as internal Projects under current Projects
Instead of creating Projects across Organizations, forks are attachments to existing Projects.
Each user forking a Project receives their unique Project. Example:
- For Project: `gitlab.com/gitlab-org/gitlab`, forks would be created in `gitlab.com/gitlab-org/gitlab/@kamil-gitlab`.
- Forks are created in the context of the current Organization, they do not cross Organization boundaries and are managed by the Organization.
- Tied to the user (or any other user-provided name of the fork).
- Forks are not federated features.
Downsides:
- Does not answer how to handle and migrate all existing forks.
- Might share current Group/Project settings, which could be breaking some security boundaries.
## 4. Evaluation
## 4.1. Pros
## 4.2. Cons
This document was moved to [another location](impacted_features/contributions-forks.md).
---
stage: enablement
group: Tenant Scale
description: 'Cells: Data migration'
redirect_to: 'impacted_features/data-migration.md'
remove_date: '2023-11-17'
---
<!-- vale gitlab.FutureTense = NO -->
This document is a work-in-progress and represents a very early state of the
Cells design. Significant aspects are not documented, though we expect to add
them in the future. This is one possible architecture for Cells, and we intend to
contrast this with alternatives before deciding which approach to implement.
This documentation will be kept even if we decide not to implement this so that
we can document the reasons for not choosing this approach.
# Cells: Data migration
It is essential for a Cells architecture to provide a way to migrate data out of big Cells into smaller ones.
This document describes various approaches to provide this type of split.
We also need to handle cases where data is already violating the expected isolation constraints of Cells, for example references cannot span multiple Organizations.
We know that existing features like linked issues allowed users to link issues across any Projects regardless of their hierarchy.
There are many similar features.
All of this data will need to be migrated in some way before it can be split across different Cells.
This may mean some data needs to be deleted, or the feature needs to be changed and modelled slightly differently before we can properly split or migrate Organizations between Cells.
Having schema deviations across different Cells, which is a necessary consequence of different databases, will also impact our ability to migrate data between Cells.
Different schemas impact our ability to reliably replicate data across Cells and especially impact our ability to validate that the data is correctly replicated.
It might force us to only be able to move data between Cells when the schemas are all in sync (slowing down deployments and the rebalancing process) or possibly only migrate from newer to older schemas which would be complex.
## 1. Definition
## 2. Data flow
## 3. Proposal
### 3.1. Split large Cells
A single Cell can only be divided into many Cells.
This is based on the principle that it is easier to create an exact clone of an existing Cell in many replicas out of which some will be made authoritative once migrated.
Keeping those replicas up-to-date with Cell 0 is also much easier due to pre-existing replication solutions that can replicate the whole systems: Geo, PostgreSQL physical replication, etc.
1. All data of an Organization must not be divided across many Cells.
1. Split should be doable online.
1. New Cells cannot contain pre-existing data.
1. N Cells contain exact replica of Cell 0.
1. The data of Cell 0 is live-replicated to as many Cells as it needs to be split into.
1. Once consensus is achieved between Cell 0 and N-Cells, the Organizations to be migrated away are marked as read-only cluster-wide.
1. The `routes` table is updated for all Organizations to be split, to indicate the authoritative Cell holding the most recent data, like `gitlab-org` on `cell-100`.
1. The data for `gitlab-org` on Cell 0, and on other non-authoritative N-Cells are dormant and will be removed in the future.
1. All accesses to `gitlab-org` on a given Cell are validated against the `cell_id` of `routes` to ensure that the given Cell is authoritative to handle the data (see the sketch after this list).
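A minimal sketch of such a validation, assuming the cluster-wide `routes` table is extended with a hypothetical `cell_id` column:
```shell
# Illustrative only: before serving a request for `gitlab-org`, a Cell checks whether
# it is the authoritative Cell according to the cluster-wide routes table.
# The database name and the cell_id column are hypothetical.
psql -d gitlab_clusterwide -c \
  "SELECT cell_id FROM routes WHERE path = 'gitlab-org';"
```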
#### More challenges of this proposal
1. There is no streaming replication capability for Elasticsearch, but you could
snapshot the whole Elasticsearch index and recreate it, though this takes hours.
It could be handled by pausing Elasticsearch indexing on the initial Cell during
the migration as indexing downtime is not a big issue, but this still needs
to be coordinated with the migration process.
1. Syncing Redis, Gitaly, CI Postgres, Main Postgres, registry Postgres, other
new data stores snapshots in an online system would likely lead to gaps
without a long downtime. You need to choose a sync point and at the sync
point you need to stop writes to perform the migration. The more data stores
there are to migrate at the same time the longer the write downtime for the
failover. We would also need to find a reliable place in the application to
actually block updates to all these systems with a high degree of
confidence. In the past we've only been confident by shutting down all Rails
services because any Rails process could write directly to any of these at
any time due to async workloads or other surprising code paths.
1. How to efficiently delete all the orphaned data. Locating all `ci_builds`
associated with half the Organizations would be very expensive if we have to
do joins. We haven't yet determined if we'd want to store an `organization_id`
column on every table, but this is the kind of thing it would be helpful for.
### 3.2. Migrate Organization from an existing Cell
This is different to split, as we intend to perform logical and selective replication of data belonging to a single Organization.
Today this type of selective replication is only implemented by Gitaly where we can migrate Git repository from a single Gitaly node to another with minimal downtime.
In this model we would require identifying all resources belonging to a given Organization: database rows, object storage files, Git repositories, etc. and selectively copy them over to another (likely) existing Cell importing data into it.
Ideally we would ensure that we can perform live logical replication of all changed data and then, similarly to the split approach, change which Cell is authoritative for this Organization.
1. It is hard to identify all resources belonging to an Organization.
1. It requires either downtime for the Organization or a robust system to identify live changes made.
1. It likely will require a full database structure analysis (more robust than Project import/export) to perform selective PostgreSQL logical replication.
#### More challenges of this proposal
1. Logical replication is still not performant enough to keep up with our
scale. Even if we could use logical replication we still don't have an
efficient way to filter data related to a single Organization without
joining all the way to the `organizations` table which will slow down
logical replication dramatically.
## 4. Evaluation
## 4.1. Pros
## 4.2. Cons
This document was moved to [another location](impacted_features/data-migration.md).
---
stage: enablement
group: Tenant Scale
description: 'Cells: Database Sequences'
redirect_to: 'impacted_features/database-sequences.md'
remove_date: '2023-11-17'
---
<!-- vale gitlab.FutureTense = NO -->
This document is a work-in-progress and represents a very early state of the
Cells design. Significant aspects are not documented, though we expect to add
them in the future. This is one possible architecture for Cells, and we intend to
contrast this with alternatives before deciding which approach to implement.
This documentation will be kept even if we decide not to implement this so that
we can document the reasons for not choosing this approach.
# Cells: Database Sequences
GitLab today ensures that every database row created has a unique ID, allowing access to a merge request, CI job or Project by a known global ID.
Cells will use many distinct and unconnected databases, each of them having a separate ID for most entities.
At a minimum, any ID referenced between a Cell and the shared schema will need to be unique across the cluster to avoid ambiguous references.
Further to required global IDs, it might also be desirable to retain globally unique IDs for all database rows to allow migrating resources between Cells in the future.
## 1. Definition
## 2. Data flow
## 3. Proposal
These are some preliminary ideas for how we can retain unique IDs across the system.
### 3.1. UUID
Instead of using incremental sequences, use UUID (128 bit) that is stored in the database.
- This might break existing IDs and requires adding a UUID column to all existing tables.
- This makes all indexes larger, as it requires storing 128 bits instead of 32/64 bits in the index.
### 3.2. Use Cell index encoded in ID
Because a significant number of tables already use 64-bit IDs, we could use the most significant bits (MSB) to encode the Cell ID, as sketched after this list:
- This might limit the number of Cells that can be enabled in a system, as we might decide to only allocate 1024 possible Cell numbers.
- This would make it possible to migrate IDs between Cells, because even if an entity from Cell 1 is migrated to Cell 100 this ID would still be unique.
- If resources are migrated the ID itself will not be enough to decode the Cell number and we would need a lookup table.
- This requires updating all IDs to 64 bits.
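A minimal sketch of the encoding, assuming purely for illustration that the top 12 bits of a 64-bit ID carry the Cell number and the remaining 52 bits carry the Cell-local value:
```shell
# Illustrative only: the 12/52 bit split and the Cell number are assumptions, not a decision.
cell_id=5
local_id=123456789
global_id=$(( (cell_id << 52) | local_id ))
echo "global id:    ${global_id}"
echo "decoded cell: $(( global_id >> 52 ))"
echo "decoded row:  $(( global_id & ((1 << 52) - 1) ))"
```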
### 3.3. Allocate sequence ranges from central place
Each Cell might receive its own range of sequences as they are consumed from a centrally managed place.
Once a Cell consumes all IDs assigned for a given table it would be replenished and a next range would be allocated.
Ranges would be tracked to provide a faster lookup table if a random access pattern is required (see the sketch after this list).
- This might make IDs migratable between Cells, because even if an entity from Cell 1 is migrated to Cell 100 this ID would still be unique.
- If resources are migrated the ID itself will not be enough to decode the Cell number and we would need a much more robust lookup table as we could be breaking previously assigned sequence ranges.
- This does not require updating all IDs to 64 bits.
- This adds some performance penalty to all `INSERT` statements in Postgres or at least from Rails as we need to check for the sequence number and potentially wait for our range to be refreshed from the ID server.
- The available range will need to be stored and incremented in a centralized place so that concurrent transactions cannot possibly get the same value.
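A minimal sketch of consuming such a range, assuming a central allocator has granted this Cell the range 42,000,000 to 42,999,999 for the `projects` table (the range and the allocator are hypothetical; the sequence name exists today):
```shell
# Constrain the local sequence to the granted range; once the range is exhausted,
# nextval() fails and the Cell must request a new range from the central allocator.
psql -d gitlabhq_production -c \
  "ALTER SEQUENCE projects_id_seq MAXVALUE 42999999 RESTART WITH 42000000;"
```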
### 3.4. Define only some tables to require unique IDs
Maybe it is acceptable for only some tables to have globally unique IDs. These could be Projects, Groups and other top-level entities.
All other tables like `merge_requests` would only offer a Cell-local ID, but when referenced outside they would rather use an IID (an ID that is monotonic in the context of a given resource, like a Project).
- This means the ID 10000 for `merge_requests` could be present on all Cells, which might sometimes be confusing regarding the uniqueness of the resource.
- This might make random access by ID (if ever needed) impossible without using a composite key, like: `project_id+merge_request_id`.
- This would require us to implement a transformation/generation of new ID if we need to migrate records to another Cell. This can lead to very difficult migration processes when these IDs are also used as foreign keys for other records being migrated.
- If IDs need to change when moving between Cells this means that any links to records by ID would no longer work even if those links included the `project_id`.
- If we plan to allow these IDs to not be unique and change the unique constraint to be based on a composite key then we'd need to update all foreign key references to be based on the composite key.
## 4. Evaluation
## 4.1. Pros
## 4.2. Cons
This document was moved to [another location](impacted_features/database-sequences.md).
---
stage: enablement
group: Tenant Scale
description: 'Cells: Explore'
redirect_to: 'impacted_features/explore.md'
remove_date: '2023-11-17'
---
<!-- vale gitlab.FutureTense = NO -->
This document is a work-in-progress and represents a very early state of the
Cells design. Significant aspects are not documented, though we expect to add
them in the future. This is one possible architecture for Cells, and we intend to
contrast this with alternatives before deciding which approach to implement.
This documentation will be kept even if we decide not to implement this so that
we can document the reasons for not choosing this approach.
# Cells: Explore
Explore may not play a critical role in GitLab as it functions today, but GitLab today is not isolated. It is the isolation that makes Explore or some viable replacement necessary.
The existing Group and Project Explore will initially be scoped to an Organization. However, there is a need for a global Explore that spans across Organizations to support the discoverability of public Groups and Projects, in particular in the context of discovering open source Projects. See user feedback [here](https://gitlab.com/gitlab-org/gitlab/-/issues/21582#note_1458298192) and [here](https://gitlab.com/gitlab-org/gitlab/-/issues/418228#note_1470045468).
## 1. Definition
The Explore functionality helps users in discovering Groups and Projects. Unauthenticated Users are only able to explore public Groups and Projects, authenticated Users can see all the Groups and Projects that they have access to, including private and internal Groups and Projects.
## 2. Data flow
## 3. Proposal
The Explore feature problem falls under the broader umbrella of solving inter-Cell communication. [This topic warrants deeper research](index.md#can-different-cells-communicate-with-each-other).
Below are possible directions for further investigation.
### 3.1. Read only table mirror
- Create a `shared_projects` table in the shared cluster-wide database.
- The model for this table is read-only. No inserts/updates/deletes are allowed.
- The table is filled with data (or a subset of data) from the Projects Cell-local table.
- The write model Project (which is Cell-local) writes to the local database. We will primarily use this model for anything Cell-local.
- This data is synchronized with `shared_projects` via a background job any time something changes.
- The data in `shared_projects` is stored denormalized, so that all the information necessary to display the Project Explore is there.
- The Project Explore (as of today) is part of an instance-wide functionality, since it's not namespaced to any organizations/groups.
- This section will read data using the read model for `shared_projects`.
- Once the user clicks on a Project, they are redirected to the Cell containing the Organization.
Downsides:
- Need to have an explicit pattern to access instance-wide data. This however may be useful for admin functionalities too.
- The Project Explore may not be as rich in features as it is today (various filtering options, role you have on that Project, etc.).
- Extra complexity in managing CQRS.
### 3.2 Explore scoped to an Organization
The Project Explore and Group Explore are scoped to an Organization.
Downsides:
- No global discoverability of Groups and Projects.
## 4. Evaluation
The existing Group and Project Explore will initially be scoped to an Organization. Considering the [current usage of the Explore feature](https://gitlab.com/gitlab-data/product-analytics/-/issues/1302#note_1491215521), we deem this acceptable. Since all existing Users, Groups and Projects will initially be part of the default Organization, Groups and Projects will remain explorable and accessible as they are today. Only once existing Groups and Projects are moved out of the default Organization into different Organizations will this become a noticeable problem. Solutions to mitigate this are discussed in [issue #418228](https://gitlab.com/gitlab-org/gitlab/-/issues/418228). Ultimately, Explore could be replaced with a better search experience altogether.
## 4.1. Pros
- Initially the lack of discoverability will not be a problem.
- Only around [1.5% of all existing Users are using the Explore functionality on a monthly basis](https://gitlab.com/gitlab-data/product-analytics/-/issues/1302#note_1491215521).
## 4.2. Cons
- The GitLab owned top-level Groups would be some of the first to be moved into their own Organization and thus be detached from the explorability of the default Organization.
This document was moved to [another location](impacted_features/explore.md).
---
stage: enablement
group: Tenant Scale
description: 'Cells: Git Access'
redirect_to: 'impacted_features/git-access.md'
remove_date: '2023-11-17'
---
<!-- vale gitlab.FutureTense = NO -->
This document is a work-in-progress and represents a very early state of the
Cells design. Significant aspects are not documented, though we expect to add
them in the future. This is one possible architecture for Cells, and we intend to
contrast this with alternatives before deciding which approach to implement.
This documentation will be kept even if we decide not to implement this so that
we can document the reasons for not choosing this approach.
# Cells: Git Access
This document describes the impact of the Cells architecture on all Git access patterns (over HTTPS and SSH), and explains how those features might need to change to work well with Cells.
## 1. Definition
Git access is done throughout the application.
It can be an operation performed by the system (read Git repository) or by a user (create a new file via Web IDE, `git clone` or `git push` via command line).
The Cells architecture defines that all Git repositories will be local to the Cell, so no repository could be shared with another Cell.
The Cells architecture will require that any Git operation can only be handled by a Cell holding the data.
It means that any operation either via Web interface, API, or GraphQL needs to be routed to the correct Cell.
It means that any `git clone` or `git push` operation can only be performed in the context of a Cell.
## 2. Data flow
There are various operations performed today by GitLab on a Git repository.
This section describes the data flow of how they behave today, to better represent the impact.
It appears that Git access requires changes only to a few endpoints that are scoped to a Project.
There appear to be different types of repositories:
- Project: assigned to Group
- Wiki: additional repository assigned to Project
- Design: similar to Wiki, additional repository assigned to Project
- Snippet: creates a virtual Project to hold repository, likely tied to the User
### 2.1. Git clone over HTTPS
Execution of: `git clone` over HTTPS
```mermaid
sequenceDiagram
User ->> Workhorse: GET /gitlab-org/gitlab.git/info/refs?service=git-upload-pack
Workhorse ->> Rails: GET /gitlab-org/gitlab.git/info/refs?service=git-upload-pack
Rails ->> Workhorse: 200 OK
Workhorse ->> Gitaly: RPC InfoRefsUploadPack
Gitaly ->> User: Response
User ->> Workhorse: POST /gitlab-org/gitlab.git/git-upload-pack
Workhorse ->> Gitaly: RPC PostUploadPackWithSidechannel
Gitaly ->> User: Response
```
### 2.2. Git clone over SSH
Execution of: `git clone` over SSH
```mermaid
sequenceDiagram
User ->> Git SSHD: ssh git@gitlab.com
Git SSHD ->> Rails: GET /api/v4/internal/authorized_keys
Rails ->> Git SSHD: 200 OK (list of accepted SSH keys)
Git SSHD ->> User: Accept SSH
User ->> Git SSHD: git clone over SSH
Git SSHD ->> Rails: POST /api/v4/internal/allowed?project=/gitlab-org/gitlab.git&service=git-upload-pack
Rails ->> Git SSHD: 200 OK
Git SSHD ->> Gitaly: RPC SSHUploadPackWithSidechannel
Gitaly ->> User: Response
```
### 2.3. Git push over HTTPS
Execution of: `git push` over HTTPS
```mermaid
sequenceDiagram
User ->> Workhorse: GET /gitlab-org/gitlab.git/info/refs?service=git-receive-pack
Workhorse ->> Rails: GET /gitlab-org/gitlab.git/info/refs?service=git-receive-pack
Rails ->> Workhorse: 200 OK
Workhorse ->> Gitaly: RPC PostReceivePack
Gitaly ->> Rails: POST /api/v4/internal/allowed?gl_repository=project-111&service=git-receive-pack
Gitaly ->> Rails: POST /api/v4/internal/pre_receive?gl_repository=project-111
Gitaly ->> Rails: POST /api/v4/internal/post_receive?gl_repository=project-111
Gitaly ->> User: Response
```
### 2.4. Git push over SSH
Execution of: `git push` over SSH
```mermaid
sequenceDiagram
User ->> Git SSHD: ssh git@gitlab.com
Git SSHD ->> Rails: GET /api/v4/internal/authorized_keys
Rails ->> Git SSHD: 200 OK (list of accepted SSH keys)
Git SSHD ->> User: Accept SSH
User ->> Git SSHD: git push over SSH
Git SSHD ->> Rails: POST /api/v4/internal/allowed?project=/gitlab-org/gitlab.git&service=git-receive-pack
Rails ->> Git SSHD: 200 OK
Git SSHD ->> Gitaly: RPC ReceivePack
Gitaly ->> Rails: POST /api/v4/internal/allowed?gl_repository=project-111
Gitaly ->> Rails: POST /api/v4/internal/pre_receive?gl_repository=project-111
Gitaly ->> Rails: POST /api/v4/internal/post_receive?gl_repository=project-111
Gitaly ->> User: Response
```
### 2.5. Create commit via Web
Execution of `Add CHANGELOG` to repository:
```mermaid
sequenceDiagram
Web ->> Puma: POST /gitlab-org/gitlab/-/create/main
Puma ->> Gitaly: RPC TreeEntry
Gitaly ->> Rails: POST /api/v4/internal/allowed?gl_repository=project-111
Gitaly ->> Rails: POST /api/v4/internal/pre_receive?gl_repository=project-111
Gitaly ->> Rails: POST /api/v4/internal/post_receive?gl_repository=project-111
Gitaly ->> Puma: Response
Puma ->> Web: See CHANGELOG
```
## 3. Proposal
The Cells stateless router proposal requires that any ambiguous path (that is not routable) be made routable.
It means that at least the following paths will have to be updated to introduce a routable entity (Project, Group, or Organization); an example request follows the lists below.
Change:
- `/api/v4/internal/allowed` => `/api/v4/internal/projects/<gl_repository>/allowed`
- `/api/v4/internal/pre_receive` => `/api/v4/internal/projects/<gl_repository>/pre_receive`
- `/api/v4/internal/post_receive` => `/api/v4/internal/projects/<gl_repository>/post_receive`
- `/api/v4/internal/lfs_authenticate` => `/api/v4/internal/projects/<gl_repository>/lfs_authenticate`
Where:
- `gl_repository` can be `project-1111` (`Gitlab::GlRepository`)
- `gl_repository` in some cases might be a full path to the repository as executed by GitLab Shell (`/gitlab-org/gitlab.git`)
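To illustrate the shape of the change, a hypothetical re-scoped call could look as follows (authentication omitted, parameters abbreviated):
```shell
# Before: the repository is only identifiable from the query/body parameters.
curl --request POST \
  "https://gitlab.example.com/api/v4/internal/allowed?gl_repository=project-1111&service=git-upload-pack"

# After: the `gl_repository` identifier is part of the path, so the Router can
# classify the request without inspecting the body.
curl --request POST \
  "https://gitlab.example.com/api/v4/internal/projects/project-1111/allowed?service=git-upload-pack"
```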
## 4. Evaluation
Supporting Git repositories if a Cell can access only its own repositories does not appear to be complex.
The one major complication is supporting snippets, but this likely falls in the same category as for the approach to support a user's Personal Namespace.
## 4.1. Pros
1. The APIs used for supporting HTTPS/SSH and Hooks are well defined and can easily be made routable.
## 4.2. Cons
1. The sharing of repository objects is limited to the given Cell and Gitaly node.
1. Cross-Cell forks are likely impossible to support (to discover: how this works today across different Gitaly nodes).
This document was moved to [another location](impacted_features/git-access.md).
---
stage: enablement
group: Tenant Scale
description: 'Cells: GitLab Pages'
redirect_to: 'impacted_features/gitlab-pages.md'
remove_date: '2023-11-17'
---
<!-- vale gitlab.FutureTense = NO -->
This document is a work-in-progress and represents a very early state of the
Cells design. Significant aspects are not documented, though we expect to add
them in the future. This is one possible architecture for Cells, and we intend to
contrast this with alternatives before deciding which approach to implement.
This documentation will be kept even if we decide not to implement this so that
we can document the reasons for not choosing this approach.
# Cells: GitLab Pages
> TL;DR
## 1. Definition
## 2. Data flow
## 3. Proposal
## 4. Evaluation
## 4.1. Pros
## 4.2. Cons
This document was moved to [another location](impacted_features/gitlab-pages.md).
---
stage: enablement
group: Tenant Scale
description: 'Cells: Global search'
redirect_to: 'impacted_features/global-search.md'
remove_date: '2023-11-17'
---
<!-- vale gitlab.FutureTense = NO -->
This document is a work-in-progress and represents a very early state of the
Cells design. Significant aspects are not documented, though we expect to add
them in the future. This is one possible architecture for Cells, and we intend to
contrast this with alternatives before deciding which approach to implement.
This documentation will be kept even if we decide not to implement this so that
we can document the reasons for not choosing this approach.
# Cells: Global search
When we introduce multiple Cells we intend to isolate all services related to those Cells.
This will include Elasticsearch which means our current global search functionality will not work.
It may be possible to implement aggregated search across all Cells, but it is unlikely to be performant to do fan-out searches across all Cells, especially once you start to do pagination, which requires setting the correct offset and page number for each search.
## 1. Definition
## 2. Data flow
## 3. Proposal
Likely the first versions of Cells will not support global searches.
Later, we may consider if building global searches to support popular use cases is worthwhile.
## 4. Evaluation
## 4.1. Pros
## 4.2. Cons
This document was moved to [another location](impacted_features/global-search.md).
---
stage: enablement
group: Tenant Scale
description: 'Cells: GraphQL'
redirect_to: 'impacted_features/graphql.md'
remove_date: '2023-11-17'
---
<!-- vale gitlab.FutureTense = NO -->
This document is a work-in-progress and represents a very early state of the
Cells design. Significant aspects are not documented, though we expect to add
them in the future. This is one possible architecture for Cells, and we intend to
contrast this with alternatives before deciding which approach to implement.
This documentation will be kept even if we decide not to implement this so that
we can document the reasons for not choosing this approach.
# Cells: GraphQL
GitLab extensively uses GraphQL to perform efficient data query operations.
GraphQL, due to its nature, is not directly routable.
The way GitLab uses it, all calls go to the `/api/graphql` endpoint, and only the query or mutation in the request body might define where the data can be accessed.
## 1. Definition
## 2. Data flow
## 3. Proposal
There are at least two main ways to implement GraphQL in a Cells architecture.
### 3.1. GraphQL routable by endpoint
Change `/api/graphql` to `/api/organization/<organization>/graphql`.
- This breaks all existing usages of the `/api/graphql` endpoint because the API URI is changed.
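For illustration, the same query issued against the current endpoint and against the proposed Organization-scoped endpoint (the Organization path segment is hypothetical):
```shell
# Today: every query goes to the ambiguous /api/graphql endpoint.
curl --request POST "https://gitlab.com/api/graphql" \
  --header "Content-Type: application/json" \
  --data '{"query": "{ project(fullPath: \"gitlab-org/gitlab\") { id description } }"}'

# Proposed: the Organization is visible to the Router from the path alone.
curl --request POST "https://gitlab.com/api/organization/gitlab-inc/graphql" \
  --header "Content-Type: application/json" \
  --data '{"query": "{ project(fullPath: \"gitlab-org/gitlab\") { id description } }"}'
```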
### 3.2. GraphQL routable by body
As part of the router, parse the GraphQL body to find a routable entity, like `project`.
- This still makes the GraphQL query be executed only in the context of a given Cell, not allowing the data to be merged.
```json
# Good example
{
project(fullPath:"gitlab-org/gitlab") {
id
description
}
}
# Bad example, since Merge Request is not routable
{
mergeRequest(id: 1111) {
iid
description
}
}
```
### 3.3. Merging GraphQL Proxy
Implement, as part of the router, a GraphQL Proxy which can parse the body and merge results from many Cells.
- This might make pagination hard to achieve, or we might assume that we execute many queries whose results are merged across all Cells.
```json
{
project(fullPath:"gitlab-org/gitlab"){
id, description
}
group(fullPath:"gitlab-com") {
id, description
}
}
```
## 4. Evaluation
## 4.1. Pros
## 4.2. Cons
This document was moved to [another location](impacted_features/graphql.md).
---
stage: enablement
group: Tenant Scale
description: 'Cells: Organizations'
redirect_to: 'impacted_features/organizations.md'
remove_date: '2023-11-17'
---
<!-- vale gitlab.FutureTense = NO -->
This document is a work-in-progress and represents a very early state of the
Cells design. Significant aspects are not documented, though we expect to add
them in the future. This is one possible architecture for Cells, and we intend to
contrast this with alternatives before deciding which approach to implement.
This documentation will be kept even if we decide not to implement this so that
we can document the reasons for not choosing this approach.
# Cells: Organizations
One of the major designs of a Cells architecture is strong isolation between Groups.
Organizations, as described by the [Organization blueprint](../organization/index.md), provide a way to have a plausible UX for joining together many Groups that are isolated from the rest of the system.
## 1. Definition
Cells do require that all Groups and Projects of a single Organization can only be stored on a single Cell because a Cell can only access data that it holds locally and has very limited capabilities to read information from other Cells.
Cells with Organizations do require strong isolation between Organizations.
It will have significant implications on various user-facing features, like Todos, dropdowns allowing to select Projects, references to other issues or Projects, or any other social functions present at GitLab.
Today those functions are able to reference anything in the whole system.
With the introduction of Organizations this will be forbidden.
This problem definition aims to assess the effort and implications required to add strong isolation between Organizations to the system, including the features affected and their data processing flows.
The purpose is to ensure that our solution, when implemented, consistently avoids data leakage between Organizations residing on a single Cell.
## 2. Proposal
See the [Organization blueprint](../organization/index.md).
This document was moved to [another location](impacted_features/organizations.md).
---
stage: enablement
group: Tenant Scale
description: 'Cells: Personal Access Tokens'
redirect_to: 'impacted_features/personal-access-tokens.md'
remove_date: '2023-11-17'
---
<!-- vale gitlab.FutureTense = NO -->
This document is a work-in-progress and represents a very early state of the
Cells design. Significant aspects are not documented, though we expect to add
them in the future. This is one possible architecture for Cells, and we intend to
contrast this with alternatives before deciding which approach to implement.
This documentation will be kept even if we decide not to implement this so that
we can document the reasons for not choosing this approach.
# Cells: Personal Access Tokens
## 1. Definition
Personal Access Tokens are associated with a User and allow Users to interact with the GitLab API to perform operations.
Today, Personal Access Tokens are scoped to the User and can access all Groups that the User has access to.
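For illustration only, a single token today authenticates the User across every Group they can access, regardless of which Cell would eventually hold that Group's data. The instance URL and token below are placeholders:
```python
# Illustration only: a personal access token is tied to the User, not to a
# Group or Cell, so one token can list every Group the User can access.
import requests

GITLAB_URL = "https://gitlab.example.com"  # placeholder instance URL
TOKEN = "glpat-xxxxxxxxxxxxxxxxxxxx"       # placeholder token value

response = requests.get(
    f"{GITLAB_URL}/api/v4/groups",
    headers={"PRIVATE-TOKEN": TOKEN},
)
for group in response.json():
    print(group["full_path"])
```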
## 2. Data flow
## 3. Proposal
## 4. Evaluation
## 4.1. Pros
## 4.2. Cons
This document was moved to [another location](impacted_features/personal-access-tokens.md).
---
stage: enablement
group: Tenant Scale
description: 'Cells: Personal Namespaces'
redirect_to: 'impacted_features/personal-namespaces.md'
remove_date: '2023-11-17'
---
<!-- vale gitlab.FutureTense = NO -->
This document is a work-in-progress and represents a very early state of the
Cells design. Significant aspects are not documented, though we expect to add
them in the future. This is one possible architecture for Cells, and we intend to
contrast this with alternatives before deciding which approach to implement.
This documentation will be kept even if we decide not to implement this so that
we can document the reasons for not choosing this approach.
# Cells: Personal Namespaces
> TL;DR
## 1. Definition
## 2. Data flow
## 3. Proposal
## 4. Evaluation
## 4.1. Pros
## 4.2. Cons
This document was moved to [another location](impacted_features/personal-namespaces.md).
---
stage: enablement
group: Tenant Scale
description: 'Cells: Router Endpoints Classification'
redirect_to: 'impacted_features/router-endpoints-classification.md'
remove_date: '2023-11-17'
---
<!-- vale gitlab.FutureTense = NO -->
This document is a work-in-progress and represents a very early state of the
Cells design. Significant aspects are not documented, though we expect to add
them in the future. This is one possible architecture for Cells, and we intend to
contrast this with alternatives before deciding which approach to implement.
This documentation will be kept even if we decide not to implement this so that
we can document the reasons for not choosing this approach.
# Cells: Router Endpoints Classification
Classification of all endpoints is essential to properly route requests hitting the load balancer of a GitLab installation to a Cell that can serve them.
Each Cell should be able to decode each request and classify which Cell it belongs to.
GitLab currently implements hundreds of endpoints.
This document tries to describe various techniques that can be implemented to allow the Rails application to provide this information efficiently.
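As a rough sketch of what such a classification could look like, a routable request carries a namespace in its path that maps to a Cell, while other endpoints can be served anywhere. The path pattern and the namespace-to-Cell table below are illustrative assumptions, not the actual routing rules.
```python
# Minimal sketch: classify a request path to the Cell that can serve it.
# The namespace-to-Cell table and the "any" fallback are assumptions.
import re

NAMESPACE_TO_CELL = {  # hypothetical lookup kept by the router
    "gitlab-org": "cell-1",
    "gitlab-com": "cell-2",
}

ROUTABLE_PATH = re.compile(r"^/(?P<namespace>[^/]+)/")

def classify(path: str) -> str:
    """Return the Cell that owns the request, or 'any' when the endpoint
    is not tied to a specific namespace (for example /-/health)."""
    match = ROUTABLE_PATH.match(path)
    if match and match.group("namespace") not in ("-", "api"):
        return NAMESPACE_TO_CELL.get(match.group("namespace"), "unknown")
    return "any"

print(classify("/gitlab-org/gitlab/-/issues/1"))  # cell-1
print(classify("/-/health"))                      # any
```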
## 1. Definition
## 2. Data flow
## 3. Proposal
## 4. Evaluation
## 4.1. Pros
## 4.2. Cons
This document was moved to [another location](impacted_features/router-endpoints-classification.md).
---
stage: enablement
group: Tenant Scale
description: 'Cells: Schema changes'
redirect_to: 'impacted_features/schema-changes.md'
remove_date: '2023-11-17'
---
<!-- vale gitlab.FutureTense = NO -->
This document is a work-in-progress and represents a very early state of the
Cells design. Significant aspects are not documented, though we expect to add
them in the future. This is one possible architecture for Cells, and we intend to
contrast this with alternatives before deciding which approach to implement.
This documentation will be kept even if we decide not to implement this so that
we can document the reasons for not choosing this approach.
# Cells: Schema changes
When we introduce multiple Cells that own their own databases, the process of making schema changes to Postgres and Elasticsearch will become more complicated.
Today we already need to be careful to ensure changes comply with our zero-downtime deployments.
For example, [when removing a column we need to make changes over 3 separate deployments](../../../development/database/avoiding_downtime_in_migrations.md#dropping-columns).
We have tooling like `post_migrate` that helps with these kinds of changes to reduce the number of merge requests needed, but these will be complicated when we are dealing with deploying multiple Rails applications that will be at different versions at any one time.
This problem will be particularly tricky to solve for shared databases like our plan to share the `users` related tables among all Cells.
A key benefit of Cells may be that they allow us to run different customers on different versions of GitLab.
We may choose to update our own Cell before all our customers, giving us even more flexibility than our current canary architecture.
But doing this means that schema changes need even more versions of backward compatibility support, which could slow down development as we need extra steps to make schema changes.
## 1. Definition
## 2. Data flow
## 3. Proposal
## 4. Evaluation
## 4.1. Pros
## 4.2. Cons
This document was moved to [another location](impacted_features/schema-changes.md).
---
stage: enablement
group: Tenant Scale
description: 'Cells: Secrets'
redirect_to: 'impacted_features/secrets.md'
remove_date: '2023-11-17'
---
<!-- vale gitlab.FutureTense = NO -->
This document is a work-in-progress and represents a very early state of the
Cells design. Significant aspects are not documented, though we expect to add
them in the future. This is one possible architecture for Cells, and we intend to
contrast this with alternatives before deciding which approach to implement.
This documentation will be kept even if we decide not to implement this so that
we can document the reasons for not choosing this approach.
# Cells: Secrets
Where possible, each Cell should have its own distinct set of secrets.
However, there will be some secrets that will be required to be the same for all Cells in the cluster.
## 1. Definition
GitLab has a lot of [secrets](https://docs.gitlab.com/charts/installation/secrets.html) that need to be configured.
Some secrets are for inter-component communication, for example the `GitLab Shell secret`, and are used only within a Cell.
Some secrets are used for features, for example, `ci_jwt_signing_key`.
## 2. Data flow
## 3. Proposal
1. Secrets used for features will need to be consistent across all Cells, so that the UX is consistent.
1. This is especially true for the `db_key_base` secret, which is used for
   encrypting data at rest in the database, so that Projects that are
   transferred to another Cell continue to work. We do not want to have
   to re-encrypt such rows when we move Projects/Groups between Cells.
   The sketch after this list illustrates why.
1. Secrets which are used for intra-Cell communication only should be uniquely generated
per Cell.
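To illustrate why a shared `db_key_base` matters when Projects move between Cells, here is a simplified analogy using symmetric encryption. It is not GitLab's actual encryption code, only a sketch of the failure mode that a per-Cell key would introduce.
```python
# Simplified analogy, not GitLab's actual encryption code: data encrypted
# under one Cell's key cannot be decrypted by a Cell holding a different key.
from cryptography.fernet import Fernet, InvalidToken

cell_1_key = Fernet.generate_key()
cell_2_key = Fernet.generate_key()

ciphertext = Fernet(cell_1_key).encrypt(b"encrypted column value")

try:
    Fernet(cell_2_key).decrypt(ciphertext)
except InvalidToken:
    print("A Cell with a different db_key_base cannot read the moved row")

# With a shared key, a Project transfer works without re-encrypting rows.
assert Fernet(cell_1_key).decrypt(ciphertext) == b"encrypted column value"
```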
## 4. Evaluation
## 4.1. Pros
## 4.2. Cons
This document was moved to [another location](impacted_features/secrets.md).
---
stage: enablement
group: Tenant Scale
description: 'Cells: Snippets'
redirect_to: 'impacted_features/snippets.md'
remove_date: '2023-11-17'
---
<!-- vale gitlab.FutureTense = NO -->
This document is a work-in-progress and represents a very early state of the
Cells design. Significant aspects are not documented, though we expect to add
them in the future. This is one possible architecture for Cells, and we intend to
contrast this with alternatives before deciding which approach to implement.
This documentation will be kept even if we decide not to implement this so that
we can document the reasons for not choosing this approach.
# Cells: Snippets
Snippets will be scoped to an Organization. Initially it will not be possible to aggregate snippet collections across Organizations. See also [issue #416954](https://gitlab.com/gitlab-org/gitlab/-/issues/416954).
## 1. Definition
Two different types of snippets exist:
- [Project snippets](../../../api/project_snippets.md). These snippets have URLs
like `/<group>/<project>/-/snippets/123`
- [Personal snippets](../../../user/snippets.md). These snippets have URLs like
`/-/snippets/123`
Snippets are backed by a Git repository.
## 2. Data flow
## 3. Proposal
### 3.1. Scoped to an organization
Both project and personal snippets will be scoped to an Organization.
- Project snippets URLs will remain unchanged, as the URLs are routable.
- Personal snippets URLs will need to change to `/-/organizations/<organization>/snippets/123`,
  so that the URL is routable.
Creation of snippets will also be scoped to a User's current Organization. Because of that, we recommend renaming `personal snippets` to `organization snippets` once the Organization is rolled out. A User can create many independent snippet collections across multiple Organizations.
## 4. Evaluation
Snippets are scoped to an Organization because Gitaly is confined to a Cell.
## 4.1. Pros
- No need to have a cluster-wide Gitaly.
## 4.2. Cons
- We will break [snippet discovery](/ee/user/snippets.md#discover-snippets).
- Snippet access may become subordinate to the visibility of the Organization.
This document was moved to [another location](impacted_features/snippets.md).