Verified commit a6a87bf7 authored by gitlab-housekeeper, committed by GitLab

Draft: Add sharding key tracking issues for geo_replication

Add sharding key tracking issues for feature category `geo_replication`.

## Background

These tables could not be classified automatically and require manual input. Eventually all tables need to be
correctly classified, but we understand that this will be complex for some tables and will take time to complete. For
now, our goal for this task is to ensure that all remaining tables are tracked in an issue, and to classify any
straightforward cases that our automation may have missed (options 1 and 2 below).

We have assigned a random backend engineer from ~"group::geo" as the initial DRI for this task, as well as an
engineering manager for visibility. Please note that we are not requesting a large time commitment; creating one
issue and linking it for all tables is perfectly acceptable.

When you are finished, please assign this merge request to the ~database reviewer or maintainer suggested by Danger.

If you have any questions or concerns, reach out to `#g_tenant-scale`.

## Task

For each table, please select one of the following options:

### Option 1: Add a sharding key tracking issue

This option is best suited to tables whose sharding behaviour is unknown, or will require additional work before
a sharding key can be defined.

Replace the `TODO` in the dictionary file with a link to an issue in the gitlab-org/gitlab project.

```diff
- sharding_key_issue_url: TODO
+ sharding_key_issue_url: https://gitlab.com/gitlab-org/gitlab/-/issues/1234
```

You can create a new issue or link an existing one, and multiple entries can refer to the same issue. These issues will
be used to track the work remaining on the [progress dashboard](https://cells-progress-tracker-gitlab-org-tenant-scale-g-f4ad96bf01d25f.gitlab.io/sharding_keys).

If you are creating a new issue, you can copy the following title and description into it:

<details><summary>Click to expand</summary>

  Issue Title: Set sharding keys for tables in 'group::geo'

  Issue Description:

  Sharding keys need to be set for the tables: geo_cache_invalidation_events, geo_event_log, geo_events, geo_node_statuses, geo_nodes, geo_repositories_changed_events, group_wiki_repository_states, merge_request_diff_details, upload_states

  This involves choosing one of the following, based on the intended behaviour of the table:
  - **The table is not cell-local**
    - Set `gitlab_schema` to `gitlab_main_clusterwide`.
  - **The table is cell-local and requires a sharding key**
    - Set `gitlab_schema` to `gitlab_main_cell`.
    - Add a `sharding_key` or `desired_sharding_key` configuration. If the configuration is known but the chosen key
      doesn't yet meet not-null and foreign key requirements, you can add an exception to
      `allowed_to_be_missing_not_null` or `allowed_to_be_missing_foreign_key` to get the pipeline passing. Please
      link to a follow-up issue in a code comment next to the exception.
    - You may also need to set `allow_cross_joins`, `allow_cross_transactions` and `allow_cross_foreign_keys` if changing
      the schema causes pipeline failures. See [`db/docs/epics.yml`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/db/docs/epics.yml?ref_type=heads#L12-17)
      for an example.
  - **The table is cell-local and does not require a sharding key**
    - Set `gitlab_schema` to `gitlab_main_cell`.
    - Set `exempt_from_sharding` to `true`.

  ### Documentation

  - [Choosing either the gitlab_main_cell or gitlab_main_clusterwide schema](https://docs.gitlab.com/ee/development/database/multiple_databases.html#choose-either-the-gitlab_main_cell-or-gitlab_main_clusterwide-schema)
  - [Defining a sharding key for all cell-local tables](https://docs.gitlab.com/ee/development/database/multiple_databases.html#defining-a-sharding-key-for-all-cell-local-tables)
  - [Defining a desired_sharding_key to automatically backfill a sharding_key](https://docs.gitlab.com/ee/development/database/multiple_databases.html#define-a-desired_sharding_key-to-automatically-backfill-a-sharding_key)

</details>

### Option 2: Add sharding key configuration

This option is best suited to tables with an easily identifiable sharding key that will require minimal work to
define.

Remove `sharding_key_issue_url` from the dictionary file and instead complete the classification for the table.
This involves choosing one of the following, based on the intended behaviour of the table:
- **The table is not cell-local**
  - Set `gitlab_schema` to `gitlab_main_clusterwide`.
- **The table is cell-local and requires a sharding key**
  - Set `gitlab_schema` to `gitlab_main_cell`.
  - Add a `sharding_key` or `desired_sharding_key` configuration. If the configuration is known but the chosen key
    doesn't yet meet not-null and foreign key requirements, you can add an exception to
    `allowed_to_be_missing_not_null` or `allowed_to_be_missing_foreign_key` to get the pipeline passing. Please
    link to a follow-up issue in a code comment next to the exception.
  - You may also need to set `allow_cross_joins`, `allow_cross_transactions` and `allow_cross_foreign_keys` if changing
    the schema causes pipeline failures. See [`db/docs/epics.yml`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/db/docs/epics.yml?ref_type=heads#L12-17)
    for an example.
- **The table is cell-local and does not require a sharding key**
  - Set `gitlab_schema` to `gitlab_main_cell`.
  - Set `exempt_from_sharding` to `true`.
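
For illustration only, here is a minimal sketch of what each outcome might look like in a table's dictionary file under `db/docs/`. The table names are hypothetical and only the keys relevant to this task are shown. `project_id: projects` is a placeholder sharding key, so choose the column and referenced table that fit your table. None of the variants keeps `sharding_key_issue_url`.

```yaml
---
# Hypothetical table that is not cell-local
table_name: example_clusterwide_settings
gitlab_schema: gitlab_main_clusterwide
---
# Hypothetical cell-local table sharded by project (placeholder key)
table_name: example_project_records
gitlab_schema: gitlab_main_cell
sharding_key:
  project_id: projects
---
# Hypothetical cell-local table that does not need a sharding key
table_name: example_node_metadata
gitlab_schema: gitlab_main_cell
exempt_from_sharding: true
```

The diff in this change applies the same patterns to real tables. `geo_nodes` and `geo_node_statuses` move to `gitlab_main_cell` with `exempt_from_sharding: true`. `merge_request_diff_details` gets a `desired_sharding_key` that backfills `project_id` from its parent `merge_request_diffs` row.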

### Documentation

- [Choosing either the gitlab_main_cell or gitlab_main_clusterwide schema](https://docs.gitlab.com/ee/development/database/multiple_databases.html#choose-either-the-gitlab_main_cell-or-gitlab_main_clusterwide-schema)
- [Defining a sharding key for all cell-local tables](https://docs.gitlab.com/ee/development/database/multiple_databases.html#defining-a-sharding-key-for-all-cell-local-tables)
- [Defining a desired_sharding_key to automatically backfill a sharding_key](https://docs.gitlab.com/ee/development/database/multiple_databases.html#define-a-desired_sharding_key-to-automatically-backfill-a-sharding_key)

Related to #455137

This change was generated by
[gitlab-housekeeper](https://gitlab.com/gitlab-org/gitlab/-/tree/master/gems/gitlab-housekeeper)
using the Keeps::AddShardingKeyTrackingIssues keep.

To provide feedback on your experience with `gitlab-housekeeper`, please comment in #442003.

Changelog: other
parent be70897b
2 merge requests: !162233 (Draft: Script to update Topology Service Gem), !152829 (Add sharding key tracking issues for geo_replication)
@@ -4,7 +4,9 @@ classes:
 - Geo::CacheInvalidationEvent
 feature_categories:
 - geo_replication
-description: Geo event to process feature flag toggles instantly on a secondary by invalidating the cache, belongs to geo_event_log.
+description: Geo event to process feature flag toggles instantly on a secondary by
+  invalidating the cache, belongs to geo_event_log.
 introduced_by_url: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/7738
 milestone: '11.4'
-gitlab_schema: gitlab_main
+gitlab_schema: gitlab_main_cell
+exempt_from_sharding: true # Not org-specific
@@ -4,7 +4,11 @@ classes:
 - Geo::EventLog
 feature_categories:
 - geo_replication
-description: Log of all events that a Geo secondary can process. Parsed/watched through streaming replication on all secondaries.
+description: Log of all events that a Geo secondary can process. Parsed/watched through
+  streaming replication on all secondaries.
 introduced_by_url: https://gitlab.com/gitlab-org/gitlab/-/commit/cb6c7cbe2a9ee05cea6926e3d8c18f6aa26f4c64
 milestone: '9.3'
-gitlab_schema: gitlab_main
+gitlab_schema: gitlab_main_cell
+# This table is a log of all events that occur on a primary Geo node/site,
+# which a secondary might be interested in.
+exempt_from_sharding: true
@@ -4,7 +4,9 @@ classes:
 - Geo::Event
 feature_categories:
 - geo_replication
-description: Geo events implemented generically, used by the SSF where all object types can generate an event to be processed by the secondary sites.
+description: Geo events implemented generically, used by the SSF where all object
+  types can generate an event to be processed by the secondary sites.
 introduced_by_url: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/23447
 milestone: '12.8'
-gitlab_schema: gitlab_main
+gitlab_schema: gitlab_main_cell
+sharding_key_issue_url: https://gitlab.com/gitlab-org/gitlab/-/issues/464440
@@ -4,7 +4,9 @@ classes:
 - GeoNodeStatus
 feature_categories:
 - geo_replication
-description: Contains sites status and metadata for each Geo site, updated async through a scheduled worker.
+description: Contains sites status and metadata for each Geo site, updated async through
+  a scheduled worker.
 introduced_by_url: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/3230
 milestone: '10.2'
-gitlab_schema: gitlab_main
+gitlab_schema: gitlab_main_cell
+exempt_from_sharding: true # Geo nodes are an infrastructure-level concern, and are not org-specific.
@@ -7,4 +7,5 @@ feature_categories:
 description: Contains Geo sites configuration data and settings.
 introduced_by_url: https://gitlab.com/gitlab-org/gitlab/-/commit/5ab12ad02ed753dd933485094ba45512890f0b50
 milestone: '8.5'
-gitlab_schema: gitlab_main
+gitlab_schema: gitlab_main_cell
+exempt_from_sharding: true # Geo nodes are an infrastructure-level concern, and are not org-specific.
@@ -4,7 +4,9 @@ classes:
 - Geo::RepositoriesChangedEvent
 feature_categories:
 - geo_replication
-description: Geo event for when the repositories for selective sync of a specific Geo secondary change, belongs to geo_event_log.
+description: Geo event for when the repositories for selective sync of a specific
+  Geo secondary change, belongs to geo_event_log.
 introduced_by_url: https://gitlab.com/gitlab-org/gitlab/-/commit/312bc703a4619b87ba2ac4e59623e7747a24502c
 milestone: '9.5'
-gitlab_schema: gitlab_main
+gitlab_schema: gitlab_main_cell
+sharding_key_issue_url: https://gitlab.com/gitlab-org/gitlab/-/issues/464364 # this table will be deleted soon
@@ -7,4 +7,15 @@ feature_categories:
 - geo_replication
 classes:
 - Geo::GroupWikiRepositoryState
-gitlab_schema: gitlab_main
+gitlab_schema: gitlab_main_cell
+sharding_key_issue_url: https://gitlab.com/gitlab-org/gitlab/-/issues/465224
+# desired_sharding_key_spec.rb assumes the parent table's primary key is `id`
+# desired_sharding_key:
+#   group_id:
+#     references: namespaces
+#     backfill_via:
+#       parent:
+#         foreign_key: group_wiki_repository_id
+#         table: group_wiki_repositories
+#         sharding_key: group_id
+#         belongs_to: group_wiki_repository
@@ -7,4 +7,13 @@ feature_categories:
 description: External MR diff replication detail
 introduced_by_url: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/34248
 milestone: '13.4'
-gitlab_schema: gitlab_main
+gitlab_schema: gitlab_main_cell
+desired_sharding_key:
+  project_id:
+    references: projects
+    backfill_via:
+      parent:
+        foreign_key: merge_request_diff_id
+        table: merge_request_diffs
+        sharding_key: project_id
+        belongs_to: merge_request_diff
@@ -7,4 +7,5 @@ feature_categories:
 description: Separate table for uploads containing Geo verification metadata.
 introduced_by_url: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/65921
 milestone: '14.6'
-gitlab_schema: gitlab_main
+gitlab_schema: gitlab_main_cell
+sharding_key_issue_url: https://gitlab.com/gitlab-org/gitlab/-/issues/464440 # Blocked on sharding the uploads table