FY26-Q4 Tenant Scale Weekly Updates

Latest Update (2025-11-07)

Help Needed / Challenges

  • groupgitaly Rivian continues to push Gitaly beyond its limits, and the Gitaly team is receiving requests for deeper involvement in debugging Rivian's performance issues. Dedicating more engineering resources could surface improvements that benefit more ultra-scale customers, but it would mean adjusting the RAFT timeline because the team is at capacity.

📕 To Be Closed

grouptenant services

🌟 Highlights

grouptenant services

  • Put together a presentation about the team and what we will be doing in Q4.
  • Met with @amyphillips to engage DevEx's help with the Valkey support challenge identified last week.

groupgit

  • We have implemented a working proof of concept for pluggable object databases. It uses MongoDB to store objects, and many of its parts already work as expected: you can use this repository for local workflows today, though some limitations remain. MongoDB was chosen for ease of implementation, not as an endorsement of the technology.

groupgitaly

  • Transaction benchmarking surfaced another key finding: response time shows a clear inflection point at around 40 RPS. Interestingly, transactions account for only 10% of the latency in this scenario, which points to a previously unknown scale bottleneck that we now have an opportunity to fix.
  • RAFT routing work has begun: the first MR, which ensures the routing table is cleaned up after any failure, has been merged.

groupgeo

groupcells infrastructure

Organization Path Claiming - Happy Path Complete 🎉

grouporganizations


Previous Updates

2025-11-07

Help Needed / Challenges

  • FYI: We are still aligning on a proposal for how to proceed after rolling back the feature flag to disable forced deletions. Progress has been slower this week because of the Dublin product offsite.

📕 To Be Closed

groupgit

  • Native tool in Git to gather repository metrics... (gitlab-org&18040 - closed). We have upstreamed the "git repo structure" command into Git, and it will evolve into a native replacement for git-sizer(1). The new tool will feed dashboards in GitLab that surface information about a repository's structure and health. This lets Support and customers debug slow repositories more readily, which should help reduce the support load for Git and Gitaly.

🌟 Highlights

grouporganizations

groupcells infrastructure

groupgeo

  • Work on improving the primary verification experience is progressing well. The last API endpoint has been added, making it possible to use the API to recalculate the primary checksum for failed models only. This widely requested feature will let SREs and customers resolve primary data corruption proactively and quickly, especially before Dedicated migrations.
  • Protocells Org Mover selective sync: one more data type (DependencyProxy::Blob) was merged. 6 of 14 data types are now complete, with 4 more in review or in development.

groupgit

  • We have landed a change in Gitaly that starts using git-last-modified(1). We upstreamed this tool to address an N+1 problem in Gitaly's ListLastCommitsForTree() RPC, which runs whenever someone navigates to a repository's "Files" overview and spawns a separate process for each file. Early benchmarks show a 3x improvement, and we expect even better results in production.
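The N+1 pattern described above can be sketched in Go as follows. This is a hypothetical illustration, not Gitaly's implementation: the helper names are made up, and the code only builds command lines without running git, to show that the per-file approach spawns one process per tree entry while the batched approach spawns one process in total. The "git log -1 --format=%H <rev> -- <path>" form is standard Git; the argument layout shown for git-last-modified(1) is an assumption, so check its man page before relying on it.

```go
package main

import (
	"fmt"
	"strings"
)

// perFileCommands models the N+1 pattern: one `git log` process per tree
// entry, each asking for the last commit that touched that single path.
func perFileCommands(revision string, paths []string) [][]string {
	cmds := make([][]string, 0, len(paths))
	for _, p := range paths {
		cmds = append(cmds, []string{"git", "log", "-1", "--format=%H", revision, "--", p})
	}
	return cmds
}

// batchedCommands models the single-process approach: one invocation that
// reports the last-modifying commit for every requested path.
// ASSUMPTION: this argument layout for `git last-modified` is illustrative.
func batchedCommands(revision string, paths []string) [][]string {
	args := append([]string{"git", "last-modified", revision, "--"}, paths...)
	return [][]string{args}
}

func main() {
	paths := []string{"README.md", "go.mod", "main.go"}

	perFile := perFileCommands("HEAD", paths)
	fmt.Printf("per-file: %d processes\n", len(perFile))
	for _, cmd := range perFile {
		fmt.Println("  " + strings.Join(cmd, " "))
	}

	batched := batchedCommands("HEAD", paths)
	fmt.Printf("batched: %d process\n", len(batched))
	for _, cmd := range batched {
		fmt.Println("  " + strings.Join(cmd, " "))
	}
}
```

For a directory with N entries, the per-file approach pays process startup and history traversal N times, while the batched approach can walk history once for all paths, which is where the reported speedup comes from.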

groupgitaly

  • Benchmarking continues to provide actionable insights. Tests showed that OverlayFS scales much better than the deepclone method for taking repository snapshots (which enable the WAL for RAFT). In the graphic below, deepclone snapshot latency becomes quadratic once repositories reach about 4k files, while OverlayFS shows much flatter latency growth. While more investigation is required, this data gives us confidence when weighing different performance optimization strategies.
    [Image: snapshot latency, deepclone vs. OverlayFS]

grouptenant services

Edited by Nick Nguyen