
Enable transactions for last batch of Gitaly RPCs

Production Change

Change Summary

To achieve strong consistency in Gitaly, we have introduced transactions via Praefect. All Gitaly nodes taking part in a transaction vote on what they think the result of a given Git operation should be. If the vote succeeds, the nodes commit the Git operation; otherwise, it is rejected.
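The voting step can be illustrated with a simplified sketch. It assumes that every node hashes the Git reference updates it is about to apply, that all nodes carry equal weight, and that a simple majority is required. Praefect's actual vote encoding, weighting and thresholds may differ. The helpers `vote_hash` and `tally` are hypothetical names used for illustration only.

```python
from __future__ import annotations

import hashlib
from collections import Counter
from typing import Optional


def vote_hash(ref_updates: str) -> str:
    """Hash the reference updates a node wants to apply (hypothetical encoding)."""
    return hashlib.sha1(ref_updates.encode("utf-8")).hexdigest()


def tally(votes: dict[str, str], threshold: int) -> tuple[bool, Optional[str]]:
    """Count votes per hash; succeed only if one hash reaches `threshold` votes.

    Returns (succeeded, winning_hash). Equal voter weights are an assumption.
    """
    if not votes:
        return False, None
    winner, count = Counter(votes.values()).most_common(1)[0]
    if count >= threshold:
        return True, winner
    return False, None


# Usage: two of three nodes agree, so a majority threshold of 2 is reached.
update = "<old-oid> <new-oid> refs/heads/main\n"  # placeholder ref update
votes = {
    "gitaly-1": vote_hash(update),
    "gitaly-2": vote_hash(update),
    "gitaly-3": vote_hash("<diverged update>\n"),
}
succeeded, winner = tally(votes, threshold=len(votes) // 2 + 1)
# In this sketch, only nodes whose vote matches the winning hash commit.
committers = sorted(n for n, h in votes.items() if h == winner) if succeeded else []
print(succeeded, committers)
```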

Transactions have been tested over the last few months, and a subset of our RPCs already has them enabled. This is the fourth and final batch of RPCs:

  • ResolveConflicts: allows a user to provide conflict resolutions for a conflicting MR
  • UserApplyPatch: mostly used to apply patches sent via mail
  • UserCherryPick: cherry-pick commits
  • UserCommitFiles: create commits based on a set of actions, mostly used for the Web IDE
  • UserFFBranch: perform a fast-forward merge, used when merging fast-forward-only MRs
  • UserMergeBranch: merge a branch, used when merging MRs
  • UserMergeToRef: merge into a given ref, used to compute the mergeability of an MR
  • UserRebaseConfirmable: rebase a branch, used e.g. for the /rebase action of an MR
  • UserRevert: revert a given commit, can be triggered via the web UI
  • UserSquash: squash a range of commits into a single one, used for squash-merging MRs
  • UserUpdateSubmodule: used to update a submodule to point to a different version
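The feature flag names in the change steps appear to follow a simple convention: `gitaly_tx_` followed by the RPC name in snake_case. The sketch below illustrates that mapping. The helper `flag_name` is hypothetical and does not reflect how Gitaly itself defines its flags. Note that the acronym in `UserFFBranch` has to stay together as `ff`.

```python
import re

# Insert "_" at a lowercase/digit-to-uppercase boundary ("rB"), or before the last
# capital of an acronym that starts a new word ("FFBr" -> "FF_Br").
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def flag_name(rpc: str) -> str:
    """Map an RPC name such as "UserFFBranch" to "gitaly_tx_user_ff_branch"."""
    return "gitaly_tx_" + _BOUNDARY.sub("_", rpc).lower()


RPCS = [
    "ResolveConflicts", "UserApplyPatch", "UserCherryPick", "UserCommitFiles",
    "UserFFBranch", "UserMergeBranch", "UserMergeToRef", "UserRebaseConfirmable",
    "UserRevert", "UserSquash", "UserUpdateSubmodule",
]

for rpc in RPCS:
    print(f"{rpc:<22} -> {flag_name(rpc)}")
```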

All of the above RPCs have been enabled in staging for months now, and no failures have been observed.

Because all of these RPCs are directly user-facing, the risk is somewhat higher than for the previous batches. This is the main reason I scheduled this batch last: early issues with transactional behaviour could be weeded out before moving on to this higher-risk set of RPCs. I will also spend more time monitoring each of the feature flags so that I can react quickly if anything goes wrong.

Note that this only affects repositories hosted by Praefect. This currently includes the gitlab-org group and a few thousand other repositories. The feature flags have all been enabled in staging since December 8th.

Part of gitlab-org/gitaly#3310 (closed)

Change Details

  1. Services Impacted - Gitaly, Praefect
  2. Change Technician - @pks-t
  3. Change Criticality - C3
  4. Change Type - changeunscheduled
  5. Change Reviewer -
  6. Due Date - February 24th, 7:30 UTC
  7. Time tracking - 132 minutes

Detailed steps for the change

Change Steps - steps to take to execute the change

Estimated Time to Complete (mins) - 1 minute per RPC

  • /chatops run feature set gitaly_tx_resolve_conflicts true
  • /chatops run feature set gitaly_tx_user_apply_patch true
  • /chatops run feature set gitaly_tx_user_cherry_pick true
  • /chatops run feature set gitaly_tx_user_commit_files true
  • /chatops run feature set gitaly_tx_user_ff_branch true
  • /chatops run feature set gitaly_tx_user_merge_branch true
  • /chatops run feature set gitaly_tx_user_merge_to_ref true
  • /chatops run feature set gitaly_tx_user_rebase_confirmable true
  • /chatops run feature set gitaly_tx_user_revert true
  • /chatops run feature set gitaly_tx_user_squash true
  • /chatops run feature set gitaly_tx_user_update_submodule true

Post-Change Steps - steps to take to verify the change

Estimated Time to Complete (mins) - 10 minutes per RPC
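Assume the flags are enabled strictly one after another in the order listed, each taking the estimated 1 minute to set followed by 10 minutes of verification. Under that assumption, the whole rollout takes 11 × (1 + 10) = 121 minutes, so a start at 7:30 UTC would finish around 9:31 UTC. The sketch below prints that schedule. The helper `rollout_plan` is hypothetical, and the date passed in is a placeholder because only the time of day matters.

```python
from __future__ import annotations

from datetime import datetime, timedelta

SET_MINUTES = 1      # estimated time to flip one flag (change steps)
VERIFY_MINUTES = 10  # estimated verification time per RPC (post-change steps)

FLAGS = [
    "gitaly_tx_resolve_conflicts",
    "gitaly_tx_user_apply_patch",
    "gitaly_tx_user_cherry_pick",
    "gitaly_tx_user_commit_files",
    "gitaly_tx_user_ff_branch",
    "gitaly_tx_user_merge_branch",
    "gitaly_tx_user_merge_to_ref",
    "gitaly_tx_user_rebase_confirmable",
    "gitaly_tx_user_revert",
    "gitaly_tx_user_squash",
    "gitaly_tx_user_update_submodule",
]


def rollout_plan(start: datetime, flags: list[str]) -> list[tuple[str, str, str]]:
    """Return (start HH:MM, chatops command, verified-by HH:MM) for each flag, in order."""
    plan = []
    t = start
    for flag in flags:
        done = t + timedelta(minutes=SET_MINUTES + VERIFY_MINUTES)
        command = f"/chatops run feature set {flag} true"
        plan.append((t.strftime("%H:%M"), command, done.strftime("%H:%M")))
        t = done  # the next flag is only enabled once this one has been verified
    return plan


# Placeholder date; times are meant as UTC.
for begin, command, verified_by in rollout_plan(datetime(2000, 1, 1, 7, 30), FLAGS):
    print(begin, command, "-> verified by", verified_by)
```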

Rollback

Rollback steps - steps to be taken in the event of a need to rollback this change

Estimated Time to Complete (mins) - 1 minute per RPC

  • /chatops run feature set gitaly_tx_resolve_conflicts false
  • /chatops run feature set gitaly_tx_user_apply_patch false
  • /chatops run feature set gitaly_tx_user_cherry_pick false
  • /chatops run feature set gitaly_tx_user_commit_files false
  • /chatops run feature set gitaly_tx_user_ff_branch false
  • /chatops run feature set gitaly_tx_user_merge_branch false
  • /chatops run feature set gitaly_tx_user_merge_to_ref false
  • /chatops run feature set gitaly_tx_user_rebase_confirmable false
  • /chatops run feature set gitaly_tx_user_revert false
  • /chatops run feature set gitaly_tx_user_squash false
  • /chatops run feature set gitaly_tx_user_update_submodule false
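If a problem appears partway through the rollout, only the flags that have already been enabled need to be switched back off. The sketch below assumes the technician keeps a list of flags in the order they were enabled. The helper `rollback_commands` is hypothetical. Disabling the most recently enabled flag first is a choice made by this sketch, not a documented requirement, since each RPC has its own flag.

```python
from __future__ import annotations


def rollback_commands(enabled_so_far: list[str]) -> list[str]:
    """Build disable commands for the flags enabled so far, most recent first (assumed order)."""
    return [f"/chatops run feature set {flag} false" for flag in reversed(enabled_so_far)]


# Example: a problem is noticed after the first three flags have been enabled.
enabled = [
    "gitaly_tx_resolve_conflicts",
    "gitaly_tx_user_apply_patch",
    "gitaly_tx_user_cherry_pick",
]
for command in rollback_commands(enabled):
    print(command)
```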

Monitoring

Key metrics to observe

Summary of infrastructure changes

  • Does this change introduce new compute instances?
  • Does this change re-size any existing compute instances?
  • Does this change introduce any additional usage of tooling like Elastic Search, CDNs, Cloudflare, etc?

Changes checklist

  • This issue has a criticality label (e.g. C1, C2, C3, C4) and a change-type label (e.g. changeunscheduled, changescheduled) based on the Change Management Criticalities.
  • This issue has the change technician as the assignee.
  • Pre-Change, Change, Post-Change, and Rollback steps have been filled out and reviewed.
  • Necessary approvals have been completed based on the Change Management Workflow.
  • Change has been tested in staging and results noted in a comment on this issue.
  • A dry-run has been conducted and results noted in a comment on this issue.
  • SRE on-call has been informed prior to change being rolled out. (In #production channel, mention @sre-oncall and this issue and await their acknowledgement.)
  • There are currently no active incidents.
Edited by Patrick Steinhardt