Update Reference Architecture docs to increase awareness of Large Monorepo recommendations
As we continue to review how the Reference Architectures are performing for customers, one clear pattern is the profound impact that large monorepos can have on performance.
This is a very complicated area. Numerous factors are involved, and they often interact in ways unique to each customer, such as:
- Git repositories have various layers, such as commits, refs, tracked files, large file blobs, trees and more. A customer's large monorepo is typically unique in how it combines those layers, having grown organically over time and often through misuse of Git (large binary files, using Git as a database, and so on).
- The usage shape against that monorepo is also a significant factor. If the usage concentrates on known weaknesses of Git, such as causing Gitaly to pack references, constant concurrent clones, or constant writes that trigger Cluster replication, then Gitaly can struggle significantly to keep up.
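As a rough illustration of the layers listed above, the following Python sketch collects a few of them for a local clone using standard Git commands (`git count-objects -v`, `git rev-list --all --count`, `git for-each-ref` and `git ls-tree -r`). The function names are hypothetical, and reading the 20 GB figure mentioned later in this issue as 20 GiB of on-disk object storage is an assumption made for illustration only. It is not the analysis tool the Gitaly team is exploring, and it says nothing about usage shape.

```python
import subprocess

# Assumption: the "20 GB or more" figure is read as 20 GiB of on-disk objects.
# `git count-objects -v` reports sizes in KiB, so the threshold is kept in KiB.
EXTREMELY_LARGE_KIB = 20 * 1024 * 1024


def parse_count_objects(output: str) -> dict[str, int]:
    """Parse the "key: value" lines printed by `git count-objects -v`."""
    stats: dict[str, int] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value.strip())
    return stats


def on_disk_kib(stats: dict[str, int]) -> int:
    """Loose objects ("size") plus packed objects ("size-pack"), in KiB."""
    return stats.get("size", 0) + stats.get("size-pack", 0)


def is_extremely_large(size_kib: int) -> bool:
    """True when the size meets or exceeds the assumed 20 GiB threshold."""
    return size_kib >= EXTREMELY_LARGE_KIB


def run_git(repo: str, *args: str) -> str:
    """Run a Git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def measure(repo: str) -> dict[str, int]:
    """Collect a few size dimensions of a repository with at least one commit."""
    stats = parse_count_objects(run_git(repo, "count-objects", "-v"))
    return {
        "commits": int(run_git(repo, "rev-list", "--all", "--count")),
        "refs": len(run_git(repo, "for-each-ref", "--format=%(refname)").splitlines()),
        "tracked_files": len(
            run_git(repo, "ls-tree", "-r", "--name-only", "HEAD").splitlines()
        ),
        "on_disk_kib": on_disk_kib(stats),
    }
```

Calling `measure("/path/to/clone")` returns a dictionary of counts, and `is_extremely_large(result["on_disk_kib"])` flags repositories at or above the assumed threshold. These numbers are only a starting point for a conversation with Support, since two repositories of the same size can behave very differently.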
All of the above makes this a notoriously hard area to predict. It's compounded by the fact that repositories like this are private and not available for us to test with, and we're not able to replicate these organic repositories synthetically in the "lab".
This is also not solvable by hardware alone. The impact can be profound: we've seen customers deploy the largest available node types and still see issues. The cause is inherent software limitations, and because the behaviour is typically unique to each customer's repository and usage, giving general guidance in this area is very difficult.
Because of this, the Reference Architecture documentation has so far taken a preventative strategy (reduce repository sizes, and so on). On review, it's clear that this isn't having the intended impact, and customers are still deploying with their large monorepos.
To increase awareness further, we'll expand the Reference Architecture documentation with more detailed information, as follows:
- Gitaly specs will now be given with strong guidance that the recommendations are based on normal-sized repositories, and that large monorepos will likely need higher specs depending on various factors. The overarching message will be for those customers to reach out to their representative or Support for specific guidance (for example, there are various CI approaches that can reduce the load without needing to adjust specs).
- General guidance will be given for the following:
  - A reaffirmation of the benefits of reduction as a preventative strategy.
  - Increased environment specs will be required if no reduction can take place.
  - For extremely large monorepos (20 GB or more), a separate Gitaly backend may be required.
  - Network bandwidth will also need to be considered, and specific environment adjustments may be required to compensate.
It's also worth calling out additional steps being taken in this area across the company. Notably, the Gitaly team are exploring a more in-depth guide, along with a potential tool that can analyse large monorepos and give rough requirements. If these come to fruition, they will also be called out in full in the RA docs.