Enable transactions for Gitaly's RepositoryService
Production Change
Change Summary
To achieve strong consistency in Gitaly, we have introduced transactions via Praefect. All Gitaly nodes taking part in a transaction vote on what they think the result of a given git operation should be. If the vote succeeds, they commit the git operation; otherwise it is rejected.
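The voting step can be sketched roughly as follows. This is an illustration only, not Praefect's implementation: it assumes each node votes with a hash of the reference updates it intends to apply, and that the transaction commits only when enough nodes cast the same vote. The names `vote_for` and `transaction_outcome`, the node names, and the default threshold of "all nodes" are assumptions made for this sketch.

```python
import hashlib
from collections import Counter


def vote_for(ref_updates):
    """Hash the intended reference updates into a vote (hypothetical scheme)."""
    digest = hashlib.sha1()
    for ref, old, new in sorted(ref_updates):
        digest.update(f"{old} {new} {ref}\n".encode())
    return digest.hexdigest()


def transaction_outcome(votes, threshold=None):
    """Return "commit" if at least `threshold` nodes cast the same vote.

    `votes` maps node name -> vote. Defaulting the threshold to all
    nodes is an assumption of this sketch, not a documented rule.
    """
    if not votes:
        return "abort"
    if threshold is None:
        threshold = len(votes)
    _, count = Counter(votes.values()).most_common(1)[0]
    return "commit" if count >= threshold else "abort"


update = [("refs/heads/main", "0" * 40, "a" * 40)]
diverged = [("refs/heads/main", "0" * 40, "b" * 40)]

# Three hypothetical nodes that all want to apply the same update.
agreeing = {node: vote_for(update) for node in ("gitaly-1", "gitaly-2", "gitaly-3")}
# Same nodes, but one of them computed a different result.
split = {**agreeing, "gitaly-3": vote_for(diverged)}

print(transaction_outcome(agreeing))
print(transaction_outcome(split))
```

With identical votes the sketch reports a commit; as soon as one node disagrees under the all-nodes threshold, the operation is rejected.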
Transactions have been tested over the last few months, as a subset of our RPCs already has them unconditionally enabled. The next step is to enable transactions for more of Gitaly's RPCs; this change request focuses on the RepositoryService. Specifically, I want to enable transactions for the following RPCs:
- RepositoryService/CloneFromPool: create a new repository from an object pool
- RepositoryService/CloneFromPoolInternal: create a new repository from an object pool, but use an internal fetch
- RepositoryService/CreateFork: create a new fork
- RepositoryService/CreateRepositoryFromBundle: create a new repository from a git bundle
- RepositoryService/CreateRepositoryFromSnapshot: create a new repository from a snapshot
- RepositoryService/CreateRepositoryFromURL: create a new repository from a URL
- RepositoryService/FetchRemote: fetch into a repository from a remote repository
- RepositoryService/FetchSourceBranch: fetch a branch into a repository
- RepositoryService/ReplicateRepository: replicate a repository to another storage
- RepositoryService/WriteRef: update a git reference
Note that this only has an effect for repositories which are hosted by Praefect. This currently includes the gitlab-org group and a few thousand other repos. The feature flags have all been enabled in staging since December 8th.
Change Details
- Services Impacted - Gitaly
- Change Technician - @pks-t
- Change Criticality - C4
- Change Type - changescheduled
- Change Reviewer - @bjk-gitlab
- Due Date - January 27th, 10:30 UTC
- Time tracking - 15 minutes
- Downtime Component - none
Detailed steps for the change
Change Steps - steps to take to execute the change
Estimated Time to Complete (mins) - 1 minute per RPC
- RepositoryService/CloneFromPool: /chatops run feature set gitaly_tx_clone_from_pool true
- RepositoryService/CloneFromPoolInternal: /chatops run feature set gitaly_tx_clone_from_pool_internal true
- RepositoryService/CreateFork: /chatops run feature set gitaly_tx_create_fork true
- RepositoryService/CreateRepositoryFromBundle: /chatops run feature set gitaly_tx_create_repository_from_bundle true
- RepositoryService/CreateRepositoryFromSnapshot: /chatops run feature set gitaly_tx_create_repository_from_snapshot true
- RepositoryService/CreateRepositoryFromURL: /chatops run feature set gitaly_tx_create_repository_from_u_r_l true
- RepositoryService/FetchRemote: /chatops run feature set gitaly_tx_fetch_remote true
- RepositoryService/FetchSourceBranch: /chatops run feature set gitaly_tx_fetch_source_branch true
- RepositoryService/ReplicateRepository: /chatops run feature set gitaly_tx_replicate_repository true
- RepositoryService/WriteRef: /chatops run feature set gitaly_tx_write_ref true
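The flag names in the commands above appear to follow a mechanical pattern: the RPC name is converted to snake case, with every capital letter starting a new word (which is why `URL` becomes `u_r_l`), and prefixed with `gitaly_tx_`. The sketch below reproduces the command list on that assumption, so the commands can be cross-checked against the RPC list before they are run. The helper names `flag_name` and `chatops_command` are hypothetical, and the naming rule is inferred from this issue rather than taken from Gitaly's source.

```python
import re

# RPCs covered by this change request.
RPCS = [
    "CloneFromPool",
    "CloneFromPoolInternal",
    "CreateFork",
    "CreateRepositoryFromBundle",
    "CreateRepositoryFromSnapshot",
    "CreateRepositoryFromURL",
    "FetchRemote",
    "FetchSourceBranch",
    "ReplicateRepository",
    "WriteRef",
]


def flag_name(rpc):
    """Derive the feature flag name from an RPC name (inferred rule)."""
    # Insert "_" before every capital letter except the first one.
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", rpc).lower()
    return f"gitaly_tx_{snake}"


def chatops_command(rpc, enabled):
    """Build the chatops command that sets the flag for one RPC."""
    state = "true" if enabled else "false"
    return f"/chatops run feature set {flag_name(rpc)} {state}"


for rpc in RPCS:
    print(chatops_command(rpc, enabled=True))
```

Passing `enabled=False` yields the corresponding command for disabling the flag.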
Post-Change Steps - steps to take to verify the change
Estimated Time to Complete (mins) - 5 minutes per RPC
- watch stats of the given RPC via https://dashboards.gitlab.net/d/000000199/gitaly-feature-status?orgId=1&refresh=30s
- watch transaction statistics in https://dashboards.gitlab.net/d/8EAXC-AWz/praefect?orgId=1&refresh=30s
Rollback
Rollback steps - steps to be taken in the event of a need to rollback this change
Estimated Time to Complete (mins) - 1 minute per RPC
- RepositoryService/CloneFromPool: /chatops run feature set gitaly_tx_clone_from_pool false
- RepositoryService/CloneFromPoolInternal: /chatops run feature set gitaly_tx_clone_from_pool_internal false
- RepositoryService/CreateFork: /chatops run feature set gitaly_tx_create_fork false
- RepositoryService/CreateRepositoryFromBundle: /chatops run feature set gitaly_tx_create_repository_from_bundle false
- RepositoryService/CreateRepositoryFromSnapshot: /chatops run feature set gitaly_tx_create_repository_from_snapshot false
- RepositoryService/CreateRepositoryFromURL: /chatops run feature set gitaly_tx_create_repository_from_u_r_l false
- RepositoryService/FetchRemote: /chatops run feature set gitaly_tx_fetch_remote false
- RepositoryService/FetchSourceBranch: /chatops run feature set gitaly_tx_fetch_source_branch false
- RepositoryService/ReplicateRepository: /chatops run feature set gitaly_tx_replicate_repository false
- RepositoryService/WriteRef: /chatops run feature set gitaly_tx_write_ref false
Monitoring
Key metrics to observe
- Metric: per-RPC error rate
- Location: https://dashboards.gitlab.net/d/000000199/gitaly-feature-status
- What changes to this metric should prompt a rollback: a rising error rate that degrades the RPC's SLA
- Metric: transaction statistics
- Location: https://dashboards.gitlab.net/d/8EAXC-AWz/praefect
- What changes to this metric should prompt a rollback: increase in aborted transactions
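The two rollback triggers above can be expressed as a simple comparison between a sample taken before a flag is enabled and one taken afterwards. This is only a sketch of the decision rule: the `Sample` structure, the `should_roll_back` helper, and both tolerance values are hypothetical, and the numbers would have to be read manually from the dashboards listed above.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Counters read from the dashboards for one observation window."""
    requests: int
    errors: int
    aborted_transactions: int

    @property
    def error_rate(self):
        return self.errors / self.requests if self.requests else 0.0


def should_roll_back(baseline, current,
                     max_error_rate_increase=0.01,
                     max_abort_increase=0):
    """Return the reasons for rolling back (an empty list means keep going).

    Both tolerances are placeholder values chosen for this sketch.
    """
    reasons = []
    if current.error_rate - baseline.error_rate > max_error_rate_increase:
        reasons.append("per-RPC error rate increased")
    if current.aborted_transactions - baseline.aborted_transactions > max_abort_increase:
        reasons.append("aborted transactions increased")
    return reasons


# Made-up numbers, purely to exercise the rule.
before = Sample(requests=1000, errors=2, aborted_transactions=0)
after = Sample(requests=1000, errors=30, aborted_transactions=4)

for reason in should_roll_back(before, after):
    print("rollback:", reason)
```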
Changes checklist
- This issue has a criticality label (e.g. C1, C2, C3, C4) and a change-type label (e.g. changeunscheduled, changescheduled) based on the Change Management Criticalities.
- This issue has the change technician as the assignee.
- Pre-Change, Change, Post-Change, and Rollback steps have been filled out and reviewed.
- Necessary approvals have been completed based on the Change Management Workflow.
- Change has been tested in staging and results noted in a comment on this issue.
- A dry-run has been conducted and results noted in a comment on this issue.
- SRE on-call has been informed prior to change being rolled out. (In the #production channel, mention @sre-oncall and this issue, and await their acknowledgement.)
- There are currently no active incidents.