Design of initial data for performance testbed and test data created from test runs

Overview

For our on-prem performance testbed, we need a way to separate static data from the incremental data generated by test runs.

Static data should be the baseline. This ensures that, as time progresses, we test against a consistent database shape and that our data is not continuously growing. If the database keeps getting bigger, performance today and performance 6-12 months from now may differ for that reason alone, so improvements or degradations could not be compared accurately with prior results.

Static data: This is the initial data set up in the database. It should be static and large enough to satisfy the SQL data shapes that we need.
- Will be set up in a seed state and will remain static
- Will be used in a read-only fashion. Tests will not create more data in these projects and groups.
Incremental data: This is data generated dynamically by our test runs. It is mainly used for traffic generation and functional usage load.
- New projects, groups, issues, and merge requests that get created or set up as part of an automated test
- This data should be in a separate project / group from the static data above
- Allow easy delete/drop to maintain the data size baseline.
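
One way to enforce the read-only rule for static data is a small guard in the test harness that rejects any write outside the sandbox. The following is a minimal sketch, not an existing helper: the group paths `perf-static` and `perf-sandbox`, the `WriteRequest` type, and `check_write` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical namespace roots; the real group paths are not defined in this issue.
STATIC_ROOT = "perf-static"
SANDBOX_ROOT = "perf-sandbox"


@dataclass(frozen=True)
class WriteRequest:
    """A write a test intends to perform (create a project, issue, MR, ...)."""
    kind: str
    namespace_path: str


def is_under(path: str, root: str) -> bool:
    """True if path is the root namespace itself or nested below it."""
    return path == root or path.startswith(root + "/")


def check_write(req: WriteRequest) -> None:
    """Raise PermissionError if a test tries to write outside the sandbox."""
    if not is_under(req.namespace_path, SANDBOX_ROOT):
        raise PermissionError(
            f"{req.kind} write to '{req.namespace_path}' rejected: "
            f"tests may only write under '{SANDBOX_ROOT}'"
        )


# Allowed: the target lives under the sandbox root.
check_write(WriteRequest("issue", "perf-sandbox/run-1/project-a"))
```

Matching on `root + "/"` rather than a bare prefix keeps a sibling group with a similar name (for example `perf-sandbox-old`) from being treated as part of the sandbox.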

A simple outline is shown below.

Task

Identify the static data needed. This can be set up via project import. This data is meant to be static, so cleanup is not a priority.
Import the data into the environment
Set up a sandbox area for incremental data generated from tests
Ensure that tests are outputting data/artifacts into this sandbox area (group/project)
Delete/drop data mechanism. We might want to consider having the sandbox data on a separate DB shard so we can just drop it. Deleting will still keep the data in the database, hence the database shape will still be bigger than the baseline.
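
Because deleting may not bring the database back to its seeded size, it could help to check after each cleanup whether the data has drifted from the baseline. The sketch below assumes per-table row counts have already been collected by some other means. The function name `find_drift`, the table names, and the 5% tolerance are illustrative assumptions, not part of any existing tooling.

```python
def find_drift(
    baseline: dict[str, int],
    current: dict[str, int],
    tolerance: float = 0.05,  # assumed threshold: 5% growth over the seed state
) -> dict[str, float]:
    """Return tables whose row count grew beyond `tolerance` relative to baseline.

    The result maps table name to fractional growth (0.10 means 10% larger).
    """
    drifted: dict[str, float] = {}
    for table, base_rows in baseline.items():
        now = current.get(table, 0)
        if base_rows == 0:
            # Any rows in a table that was empty at seed time count as drift.
            growth = float("inf") if now > 0 else 0.0
        else:
            growth = (now - base_rows) / base_rows
        if growth > tolerance:
            drifted[table] = growth
    return drifted


# Hypothetical counts: seed state versus state after a test run and cleanup.
seed = {"issues": 1000, "projects": 100}
after_cleanup = {"issues": 1100, "projects": 102}
print(find_drift(seed, after_cleanup))
```

In this example only `issues` is reported, since `projects` grew by 2%, which is under the assumed tolerance. Row counts are only a proxy: they would not show on-disk growth left behind by deleted rows, which is the concern that motivates dropping a separate shard instead.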

@at.ramya @sliaquat @stanhu @ayufan this is the issue for the design of the test data. Let's use this as the starting point.

Edited Mar 21, 2019 by Eric Brinkman