More Automated Testing of GitLab Enumerations in the UI at Scale, Before Release, to Flush Out Performance Bugs
Problem to solve
GitLab releases have frequently shipped with severely degraded performance for enumeration operations such as listing commits or branches, sometimes to the point of generating timeouts.
I feel such significant degradations should be considered bugs and be flushed out before release.
Further details
We don't adopt the very latest version, so to some degree we can look ahead in the release notes to spot this kind of issue. However, I feel GitLab should consider it a release bug if an operation becomes much less performant than it previously was on the same hardware.
This is a very visible failure to our customers.
In general, a performance degradation of over 20% in a single release (on the same scaled configuration) should be considered a bug.
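To make the 20% rule concrete, a minimal check might look like the sketch below. The function name, the millisecond units, and the default threshold are illustrative choices, not anything GitLab ships:

```python
def is_regression(baseline_ms: float, current_ms: float, threshold: float = 0.20) -> bool:
    """Flag a timing as a regression if it exceeds the baseline by more
    than `threshold` (20% by default, per the rule proposed above)."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    return (current_ms - baseline_ms) / baseline_ms > threshold

# 600 ms against a 480 ms baseline is a 25% slowdown, so it is flagged;
# 500 ms is only ~4% slower, so it is not.
print(is_regression(480.0, 600.0))  # True
print(is_regression(480.0, 500.0))  # False
```

The point is that the check is trivial once baselines exist; the hard part, addressed in the proposal below, is producing stable baselines at all.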
Proposal
- Construct test repositories that contain so many of a given item that they can serve as a performance baseline for listing those items in the GUI. This would be a set of repos with a large number of commits, branches, merge history entries, file history entries, etc. Implementation thought: it might be best to have a dedicated repository per measure so that the measures are as independent as possible, keeping the number of plausible cause permutations low.
- Run tests on a standardized-size farm to measure differentials in enumeration performance between the last performance-stable release and the current one.
- Also monitor logs during these tests for timeouts.
- If any frequently used enumeration exceeds the performance-delta threshold, it should be considered an outright bug and fixed before release.
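Seeding the baseline repositories from the first step is mostly mechanical. The sketch below creates a throwaway local repo with many branches so that branch enumeration has something to measure against; the repo name, branch prefix, and count of 500 are arbitrary placeholders (a real baseline repo would want far more):

```python
import subprocess, tempfile, os

def run(args, cwd):
    # Thin wrapper so every git failure surfaces immediately.
    subprocess.run(args, cwd=cwd, check=True, capture_output=True)

def seed_branch_repo(n_branches: int = 500) -> str:
    """Create a local repo with `n_branches` branches and return its path."""
    repo = tempfile.mkdtemp(prefix="perf-branches-")
    run(["git", "init", "-q"], cwd=repo)
    # Inline identity config so the sketch works without global git config.
    run(["git", "-c", "user.email=perf@example.com", "-c", "user.name=perf",
         "commit", "-q", "--allow-empty", "-m", "seed"], cwd=repo)
    for i in range(n_branches):
        run(["git", "branch", f"perf/branch-{i}"], cwd=repo)
    return repo

def count_perf_branches(repo: str) -> int:
    out = subprocess.run(["git", "branch", "--list", "perf/*"],
                         cwd=repo, check=True, capture_output=True, text=True)
    return len(out.stdout.splitlines())

repo = seed_branch_repo()
print(count_perf_branches(repo))  # 500
```

Equivalent generators would be needed for commit count, merge history, and file history, ideally one repo per measure as proposed above.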
What does success look like, and how can we measure that?
Success would mean that, in general, taking a new GitLab version does not require me to scale up my implementation to make up for new performance bugs. It could ALSO mean that GitLab issues a "you will need to scale" warning when significant performance changes are known and considered unavoidable, but I would certainly hope that would be very rare, and that in general a performance degradation of over 20% in a single release would be considered a bug.
We may have to do this ourselves if GitLab does not, but it sure seems like it would be much better to have the product prequalified in these ways before it goes out the door.
I understand that specific customer deployments will have a variety of performance characteristics and that you may not be able to test them all. Perhaps you could also request metrics feeds from customers who are willing to be early adopters?
/cc @aolson @jkrooswyk
/label ~rofni