Pre-compile test data sets and update our guides
Currently, we document a semi-manual, semi-automatic method for generating test data: https://gitlab.com/gitlab-org/quality/performance/#2-data-for-load-tests.
We should instead update that guide to:
- Use the import/export feature, with export files that are pre-generated and distributed as part of this repository,
- Ask users to import the test data into a GitLab instance matching the given GitLab version, as the import format is GitLab-version-specific (a minimal API sketch follows this list),
- Version the export files as features are added, keeping different import versions,
- Align all our environments on a specific version of the imported data; ideally, the test suite would import the data before starting to ensure that everything behaves exactly the same.
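As a rough illustration, here is a minimal sketch of how an environment (or the test suite itself) could import a pre-generated export through the project import API. The file name, target project path, and token handling are assumptions for the example, not part of the current tooling:

```python
import os
import requests

# Assumed values; in practice these would come from environment/config.
GITLAB_URL = os.environ.get("GITLAB_URL", "https://gitlab.example.com")
TOKEN = os.environ["GITLAB_TOKEN"]
EXPORT_FILE = "data/small-project-12.5.0.tar.gz"  # hypothetical versioned export file

# POST /projects/import accepts the export archive and a target project path.
with open(EXPORT_FILE, "rb") as export:
    response = requests.post(
        f"{GITLAB_URL}/api/v4/projects/import",
        headers={"PRIVATE-TOKEN": TOKEN},
        data={"path": "performance-test-small"},
        files={"file": export},
    )
response.raise_for_status()
project = response.json()
print(project["id"], project["import_status"])
```

Since the import runs asynchronously, the test suite would then poll `GET /projects/:id/import` until `import_status` is `finished` before starting any measurements.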
Import/export does not allow us to model everything:
- We cannot model multiple groups/projects, but maybe we can import the same export multiple times to reach the expected number and size of groups and projects (see the sketch after this list),
- We could also measure the performance of the import itself as part of our testing suite,
- It does not include job logs, etc., but it does include pipelines and jobs themselves, so we should be able to validate most CI features,
- It makes it very easy to import the data set into anyone's GitLab instance without complex schemas and commands to execute, so everyone can start performance testing quickly.
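A sketch of the "import the same export multiple times" idea, timing each import request along the way so the import itself becomes part of the measurement. The project count, naming scheme, and export file are assumptions; note the timing below covers only the upload request, not the asynchronous import that follows:

```python
import os
import time
import requests

GITLAB_URL = os.environ.get("GITLAB_URL", "https://gitlab.example.com")
TOKEN = os.environ["GITLAB_TOKEN"]
EXPORT_FILE = "data/small-project-12.5.0.tar.gz"  # hypothetical versioned export file
TARGET_PROJECT_COUNT = 10                          # assumed size of the modelled instance

timings = []
for n in range(TARGET_PROJECT_COUNT):
    started = time.monotonic()
    with open(EXPORT_FILE, "rb") as export:
        response = requests.post(
            f"{GITLAB_URL}/api/v4/projects/import",
            headers={"PRIVATE-TOKEN": TOKEN},
            data={"path": f"performance-test-small-{n}"},  # unique path per copy
            files={"file": export},
        )
    response.raise_for_status()
    timings.append(time.monotonic() - started)

print(f"queued {len(timings)} imports, "
      f"mean request time: {sum(timings) / len(timings):.2f}s")
```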
Ideally, we could think about different data sets that would be imported/exported:
- Small: for testing small-scale projects,
- Large: the GitLab CE repository? for testing complex projects with many issues.
Validating different sizes against a single environment would allow us to answer how GitLab's features scale with the size of a project. We should expect performance to degrade, but this degradation should be fractional rather than 10-20-fold.
Having multiple data sets allows us to better model the different sizes of projects on a GitLab instance.
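As a rough sketch of how such a comparison could look (the project paths and the chosen endpoint are purely illustrative assumptions), the same request could be timed against each imported data set and the ratio checked against a tolerated degradation factor:

```python
import os
import time
import requests

GITLAB_URL = os.environ.get("GITLAB_URL", "https://gitlab.example.com")
TOKEN = os.environ["GITLAB_TOKEN"]

# Hypothetical URL-encoded paths of the two imported projects.
DATA_SETS = {
    "small": "root%2Fperformance-test-small",
    "large": "root%2Fperformance-test-large",
}

def time_issue_list(project_path: str) -> float:
    """Time a single issues-list request for the given project."""
    started = time.monotonic()
    response = requests.get(
        f"{GITLAB_URL}/api/v4/projects/{project_path}/issues",
        headers={"PRIVATE-TOKEN": TOKEN},
    )
    response.raise_for_status()
    return time.monotonic() - started

results = {name: time_issue_list(path) for name, path in DATA_SETS.items()}
ratio = results["large"] / results["small"]
print(results, f"degradation factor: {ratio:.1f}x")
```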