Proposal: Explore options to shift server performance tests left

A proposal I've been thinking about for a little while now is to promote component performance testing higher up the test chain (or lower in the test pyramid). This would be more “unit” or “smoke” performance testing (both Browser and Server) that complements the “big bang” integration performance testing we currently do at the end of the chain against the full Reference Architectures.

Update 2025-02-10

This issue was raised a long time ago, and we've learned a lot from our ongoing efforts with Server Performance testing since then.

Shifting Server Performance Testing left is complicated and challenging. By its nature it's suited to being done towards the end of the test chain, where components can be stood up together in an environment as close to a real one as possible, hitting the same endpoints customers use.

In that vein, we have a very comprehensive server performance test approach in place at the end of the test chain. This pipeline platform is designed to be as efficient and streamlined as possible, and combines GitLab performance, HA, Reference Architecture, GET and GPT testing all in one.

It runs daily against the nightly Omnibus packages and offers decent performance test coverage as a result. Through this we can detect performance slips as they occur and manually find (or ask the relevant dev team to find) the recent MR that caused the slip. We've found dozens of performance issues over the years as a result.
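To illustrate the kind of check such a daily pipeline performs, here is a minimal, hypothetical sketch of detecting a performance slip by comparing a run's p90 latencies against a stored baseline with a tolerance margin. The endpoint names, numbers, and 15% tolerance are illustrative assumptions, not real GPT output or thresholds.

```python
# Hypothetical sketch: flag endpoint latency regressions by comparing one
# run's p90 results against a stored baseline. All figures are made up.

TOLERANCE = 0.15  # assumed margin: flag anything more than 15% slower


def find_regressions(baseline_p90_ms, current_p90_ms, tolerance=TOLERANCE):
    """Return endpoints whose current p90 latency exceeds baseline by > tolerance."""
    regressions = {}
    for endpoint, baseline in baseline_p90_ms.items():
        current = current_p90_ms.get(endpoint)
        if current is not None and current > baseline * (1 + tolerance):
            regressions[endpoint] = (baseline, current)
    return regressions


baseline = {"GET /api/v4/projects": 120.0, "GET /api/v4/groups": 90.0}
current = {"GET /api/v4/projects": 150.0, "GET /api/v4/groups": 92.0}
print(find_regressions(baseline, current))
# → {'GET /api/v4/projects': (120.0, 150.0)}  (~25% slower, flagged;
#   the groups endpoint is within tolerance and passes)
```

A check like this only tells you *that* something slipped; finding the MR that caused it is still a manual step, as described above.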

Is it possible to do server performance testing earlier in the chain? Yes, but the difficulty rises sharply the further left you go. Server performance testing is naturally suited to the end of the chain, with its requirements for "real" environments, large test data, monitoring, and so on.

To shift server performance testing left for more focused testing would require something like a test harness that the relevant dev team would need to own, as they have the knowledge of how to set up their component in a standalone way. For example, Gitaly could be deployed from its test code alone with mock test data and its internal gRPC endpoints tested directly (note that today we test external endpoints as a customer would). But this would need to be owned and maintained by the Gitaly team and would take quite a bit of investment to accomplish. Personally, I'd say it's more reachable for our engineers to check locally on the GDK whether performance has changed with their code changes, though it depends on the context of course.
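As a rough idea of what such a local, shifted-left check could look like, here is a hypothetical "performance smoke test" sketch: it times a code path repeatedly and fails if the median exceeds a budget. The workload, iteration count, and budget are all illustrative assumptions; in practice the stand-in operation would be replaced by a real component call (e.g. a request against a locally stood-up service with mock data).

```python
# Hypothetical sketch of a shift-left performance smoke test a component
# team could run locally or in CI. Everything here is illustrative.

import statistics
import time


def measure_median_ms(operation, iterations=50):
    """Run `operation` repeatedly and return the median wall-clock time in ms."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


def example_operation():
    # Stand-in workload; replace with the real component call under test.
    sum(i * i for i in range(10_000))


BUDGET_MS = 50.0  # assumed per-call budget for this made-up workload
median = measure_median_ms(example_operation)
assert median < BUDGET_MS, f"performance smoke test failed: {median:.2f}ms"
```

The trade-off is the one described above: a check like this is cheap and fast, but because it runs against a synthetic workload on developer hardware rather than a real environment, it can only catch gross regressions, not the subtler slips the end-of-chain testing finds.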

Ultimately, the goal here is to ensure accurate testing is performed. That requires strict, repeatable conditions where every factor that could interfere with the test results is removed.

Edited by Grant Young