Define the process for performance test comparison between GitLab versions
We need to define a test process for comparing performance between two sequential GitLab releases, e.g. 12.0 -> 12.1.
Defined test process
Given that:
- We want to rerun the tests for all desired comparison versions after each release to pick up any improvements to the performance test suite itself.
- We want to automate as much of this as possible.
- For example: we don't want to have to upgrade an environment ourselves, kick off the tests, upgrade again, kick them off again, and so on. We also do not want to manually calculate the percentage differences, as we do now (fine for a v1, but not for a v2).
- We want the results to be transparent: available to the wider community and easy to find.
Our test process for comparing performance between two sequential GitLab releases (e.g. 12.0 -> 12.1) will be:
- Every month after a new release, we will edit a file within the performance project that contains all the image versions we will want to compare.
- We will then kick off a pipeline which will run the first specified version in a Docker container, test it, and log the results to CSV.
- This will happen for every version we originally listed.
  - Open question: can this be done in parallel jobs, rather than in the same Docker container upgraded between each run?
- In the end, we will have a directory with a handful of results CSVs.
- The pipeline will then run a job that executes a small script we will write to compare sequential versions. It will step through all the desired comparisons, reading the two appropriate CSV files and outputting CSV files with the percentage differences in them (a sketch of this script follows this list).
- After that, we will have another directory with a handful of comparison results CSVs.
- To share the results with the public, we could publish a monthly blog post that goes live within a couple of days of a release. It would be a manual process but wonderful visibility for our efforts, and we could include discussions of what we improved in the performance suite over the past month, what investigations the results have launched (slow endpoints, etc.), and what the next steps are for the coming month.
  - We might also consider updating a spot in the docs every month with this information.
- If we notice a performance degradation, we will:
- Notify relevant stakeholders for GitLab.com and self-managed instances.
  - Infrastructure should be aware so they can prepare mitigation on GitLab.com.
  - TAMs or Support should be aware so that self-managed customers can hold off on upgrading.
- Define mitigation steps: which team should fix the degradation, and in which patch release the fix will ship.
- When the fix has been made, close the loop with stakeholders so that the upgrade path is clear and they know when the degradation on GitLab.com will be resolved.
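A minimal sketch of what that comparison script could look like, assuming each version's results are written to a CSV named after the version (e.g. `results/12.0.csv`) with an `endpoint` column and numeric metric columns. The directory layout, column names, and metric name here are hypothetical illustrations, not the actual output of the performance test suite:

```python
#!/usr/bin/env python3
"""Compare result CSVs for sequential GitLab versions.

Assumes one CSV per version (e.g. results/12.0.csv) with an 'endpoint'
column and numeric metric columns -- the layout and names are
illustrative, not the real test suite output.
"""
import csv
import sys
from pathlib import Path

RESULTS_DIR = Path("results")          # hypothetical: where per-version CSVs live
COMPARISONS_DIR = Path("comparisons")  # hypothetical: where diff CSVs are written


def load_results(version):
    """Read one version's results, keyed by endpoint."""
    with open(RESULTS_DIR / f"{version}.csv", newline="") as f:
        return {row["endpoint"]: row for row in csv.DictReader(f)}


def percent_change(old, new):
    """Percentage change from old to new (positive = the metric grew)."""
    old, new = float(old), float(new)
    return (new - old) / old * 100 if old else 0.0


def compare(old_version, new_version, metric="response_time_ms"):
    """Write a CSV of percentage differences for one sequential pair."""
    old_rows, new_rows = load_results(old_version), load_results(new_version)
    COMPARISONS_DIR.mkdir(exist_ok=True)
    out_path = COMPARISONS_DIR / f"{old_version}_vs_{new_version}.csv"
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["endpoint", old_version, new_version, "percent_change"])
        # Only compare endpoints present in both runs.
        for endpoint in sorted(old_rows.keys() & new_rows.keys()):
            old_val = old_rows[endpoint][metric]
            new_val = new_rows[endpoint][metric]
            writer.writerow([endpoint, old_val, new_val,
                             round(percent_change(old_val, new_val), 2)])
    return out_path


if __name__ == "__main__":
    # Versions are passed oldest-to-newest, e.g. compare.py 12.0 12.1 12.2;
    # each sequential pair gets its own comparison CSV.
    versions = sys.argv[1:]
    for old, new in zip(versions, versions[1:]):
        print(compare(old, new))
```

The idea would be to run this as the final pipeline job, once all the per-version result CSVs have been collected as artifacts, so that one comparison CSV is produced per sequential pair.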
High-level questions
- How automated is appropriately automated?
- How do we aggregate the results per version, not just internally but in a customer-facing place?
- How do we display the comparison, not just internally but in a customer-facing place?
Brainstorming
- the tests can be rerun monthly with whatever our latest performance framework iteration is, which will presumably have better coverage and scenarios
- a CI job that runs one version in a Docker container, tests it, runs the next version and then tests that, before going on to creating a % result table (see the sketch after this list)
- working under the assumption that upgrading/downgrading our static environments won't be easy or automatable
- with this setup, absolute performance doesn't actually matter… all we care about is the % difference between the two versions' containers running on the same runner
- run these once and then build something that compares automatically from the result files, vs. running and comparing on the fly
- there will be scripting work to extract the values from the two runs, but both will be on the filesystem so it'll be doable
- once the numbers are extracted there are various command-line graphing tools out there
- we could write a very small script that reads two CSV files and outputs another with the percentages in it
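As a rough illustration of the per-version job from the brainstorm above, here is a sketch of a script that starts one version in a Docker container, waits for it to come up, runs the tests, and tears the container down. The image tag format, readiness URL, and test entry point (`run-performance-tests.sh`) are assumptions for illustration only, not the real setup:

```python
#!/usr/bin/env python3
"""Run one GitLab version in a Docker container and test it.

The image tag, readiness check, and test command below are assumptions
for illustration only.
"""
import subprocess
import sys
import time
import urllib.request


def run_version(version):
    """Start the container for one version and return its name."""
    name = f"gitlab-perf-{version.replace('.', '-')}"
    subprocess.run(
        ["docker", "run", "-d", "--name", name, "-p", "8080:80",
         f"gitlab/gitlab-ee:{version}"],   # assumed tag; real tags look more like 12.1.0-ee.0
        check=True,
    )
    return name


def wait_until_ready(url="http://localhost:8080/-/readiness", timeout=1800):
    """Poll the instance until it responds, or give up after `timeout` seconds."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if urllib.request.urlopen(url, timeout=5).status == 200:
                return
        except OSError:
            pass  # not up yet; keep polling
        time.sleep(15)
    raise TimeoutError(f"GitLab did not become ready within {timeout}s")


def test_and_teardown(version, name):
    """Run the (hypothetical) performance tests, write results/<version>.csv, remove the container."""
    try:
        subprocess.run(
            ["./run-performance-tests.sh",          # hypothetical test entry point
             "--target", "http://localhost:8080",
             "--output", f"results/{version}.csv"],
            check=True,
        )
    finally:
        subprocess.run(["docker", "rm", "-f", name], check=False)


if __name__ == "__main__":
    version = sys.argv[1]  # e.g. "12.1"
    container = run_version(version)
    wait_until_ready()
    test_and_teardown(version, container)
```

Each pipeline job could invoke something like this with a single version (e.g. `python run_version.py 12.1`), which would let the versions be tested in parallel jobs rather than sequentially in one upgraded container.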