[Feature flag] Enable Praefect generated replica paths

What

gitaly_praefect_generated_replica_paths, introduced via !4101 (merged), changes the path where repositories are stored on disk in Gitaly Cluster. Instead of using the client-provided relative path, Praefect derives a unique path for each repository from the repository ID. This ensures that every repository lands in its own directory on disk, which allows Praefect to perform creations, deletions and renames atomically.

Once the flag is enabled, new repositories will be stored in Praefect controlled paths. They'll remain there even if the flag is turned off. The atomic rename handling is used for repositories which have a Praefect generated path, regardless of the flag value.

Owners

  • Team: Gitaly
  • Most appropriate slack channel to reach out to: #g_create_gitaly
  • Best individual to reach out to: @samihiltunen

Expectations

Newly created repositories will have their replica path set to a Praefect controlled path: @cluster/pools/<xx>/<xx>/<id> for object pools and @cluster/repositories/<xx>/<xx>/<id> for other repositories. This is the only externally visible change from Praefect.
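To make the path layout concrete, here is a minimal sketch of how such a path could be derived from a repository ID. The function name derive_replica_path is hypothetical, and the meaning of the <xx> segments is an assumption: this issue does not define them, so the sketch takes them to be the first two hex character pairs of a SHA-256 digest of the decimal ID. Check the Gitaly source before relying on this.

```python
import hashlib


def derive_replica_path(repository_id: int, object_pool: bool = False) -> str:
    """Hypothetical helper: build a Praefect-style replica path from an ID."""
    prefix = "@cluster/pools" if object_pool else "@cluster/repositories"
    # Assumption: the <xx>/<xx> segments are hex characters 0-1 and 2-3 of
    # the SHA-256 digest of the decimal repository ID. Not stated in the issue.
    digest = hashlib.sha256(str(repository_id).encode("ascii")).hexdigest()
    # The final segment is the repository ID itself, which keeps paths unique.
    return f"{prefix}/{digest[0:2]}/{digest[2:4]}/{repository_id}"


if __name__ == "__main__":
    print(derive_replica_path(42))
    print(derive_replica_path(42, object_pool=True))
```

Because the ID is the last path segment, two repositories can never share a directory, whatever relative path the client sends.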

Renames, creations and deletions should function atomically for these repositories, although one may not really observe this externally.
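A rough illustration of why unique on-disk paths help with atomicity: if the client-visible relative path is only a key in a lookup table, a rename changes the key and never touches the disk. The RoutingTable class below is a hypothetical in-memory stand-in, not Praefect's actual data model, and the xx path segments are left as literal placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingTable:
    """Hypothetical stand-in for the metadata that maps client paths to disk paths."""
    next_id: int = 1
    # (virtual_storage, client relative_path) -> replica path on disk
    replica_paths: dict = field(default_factory=dict)

    def create(self, virtual_storage: str, relative_path: str) -> str:
        key = (virtual_storage, relative_path)
        if key in self.replica_paths:
            raise FileExistsError(relative_path)
        # The ID is never reused, so a recreated repository gets a fresh directory.
        replica_path = f"@cluster/repositories/xx/xx/{self.next_id}"
        self.next_id += 1
        self.replica_paths[key] = replica_path
        return replica_path

    def resolve(self, virtual_storage: str, relative_path: str) -> str:
        # Raises KeyError when the repository is unknown (a NotFound case).
        return self.replica_paths[(virtual_storage, relative_path)]

    def rename(self, virtual_storage: str, old: str, new: str) -> None:
        old_key, new_key = (virtual_storage, old), (virtual_storage, new)
        # Validate first so a failed rename leaves the table unchanged.
        if new_key in self.replica_paths:
            raise FileExistsError(new)
        if old_key not in self.replica_paths:
            raise KeyError(old)
        # Only the lookup key changes; the directory on disk stays where it is.
        self.replica_paths[new_key] = self.replica_paths.pop(old_key)

    def delete(self, virtual_storage: str, relative_path: str) -> None:
        del self.replica_paths[(virtual_storage, relative_path)]
```

In this sketch, renaming group/a.git to group/b.git and then creating a new group/a.git yields two distinct replica paths, so the old and new repositories cannot collide on disk.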

What release does this feature occur in first?

%15.0

What are we expecting to happen?

New repositories end up in the new Praefect controlled paths. Everything should work as usual.

What might happen if this goes wrong?

Mostly NotFound errors of all sorts, if the path handling has not been updated everywhere to take into account that the location of the repository may not be what the client sent.

What can we monitor to detect problems with this?

Given the scope of this change, the best place to monitor this is the apdex of Praefect as it gives an overview:

https://dashboards.gitlab.net/d/praefect-main/praefect-overview?orgId=1

We should also keep an eye on the logs and watch for increases in errors:

https://log.gprd.gitlab.net/app/discover#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-1h,to:now))&_a=(columns:!(json.grpc.code,json.grpc.method,json.error),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:AW98WAQvqthdGjPJ8jTY,key:json.grpc.code,negate:!t,params:(query:OK),type:phrase),query:(match_phrase:(json.grpc.code:OK)))),index:AW98WAQvqthdGjPJ8jTY,interval:auto,query:(language:lucene,query:'NOT%20json.grpc.code:%22OK%22'),sort:!(!(json.time,desc)))
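For anyone checking exported logs rather than the dashboard, the same "not OK" filter can be approximated offline. The sketch below is only an illustration: it assumes one JSON object per line and assumes the raw field names are grpc.code and grpc.method (the Kibana columns above are json.grpc.code and json.grpc.method). The function name count_non_ok is hypothetical.

```python
import json
from collections import Counter


def count_non_ok(lines):
    """Tally log entries whose gRPC code is not OK, keyed by (code, method)."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        # Assumed field names; adjust to match the actual log format.
        code = entry.get("grpc.code")
        if code is None or code == "OK":
            continue
        counts[(code, entry.get("grpc.method", "unknown"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '{"grpc.code": "OK", "grpc.method": "FindCommit"}',
        '{"grpc.code": "NotFound", "grpc.method": "FindCommit"}',
    ]
    for (code, method), n in count_non_ok(sample).most_common():
        print(code, method, n)
```

A jump in NotFound counts after enabling the flag would be the signal described under "What might happen if this goes wrong?".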

Roll Out Steps

  • Enable on staging
  • Enable on production
  • Default-enable the feature flag (optional, only required if backwards-compatibility concerns exist)
    • Wait for release containing default-disabled feature flag.
    • Change the feature flag to default-enabled (howto)
    • Wait for release containing default-enabled feature flag.
  • Remove feature flag
    • Remove the feature flag and the pre-feature-flag code (howto)
    • Remove the feature flag via chatops (howto)
    • Close this issue

Please refer to the documentation of feature flags for further information.

Edited by Sami Hiltunen