Design of initial data for the performance testbed and of test data created by test runs

Overview

For our on-prem performance testbed, we need a way to separate static data from the incremental data generated by tests.

Static data should be the baseline. This ensures that, as time progresses, we are testing against a consistent database shape and that our data is not continuously growing. If the database keeps getting bigger, performance measured today may differ from performance measured 6-12 months from now for that reason alone, so any improvement or degradation would not be an accurate comparison with earlier results.

  • Static data: the initial data set up in the database. It should be static and large enough to cover the SQL data shapes that we need.
    • Will be set up in a seed state and kept static
    • Will be used read-only. Tests will not create more data in these projects and groups.
  • Incremental data: data generated dynamically by our test runs. It is mainly used for traffic generation and functional usage load.
    • New projects, groups, issues, and merge requests that get created or set up as part of an automated test
    • This data should live in a separate project/group from the static data
    • Allow easy deletion/dropping to maintain the data size baseline.
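One way to keep the two kinds of data apart is to give each its own top-level group and have tests refuse to write anywhere else. The sketch below is a minimal illustration, assuming hypothetical top-level group paths `perf-static` and `perf-sandbox` (the issue does not name them) and GitLab-style full paths such as `perf-sandbox/run-42/project`.

```python
from typing import Final

# Hypothetical top-level groups; the real names are not decided in this issue.
STATIC_ROOT: Final = "perf-static"    # seeded, read-only baseline data
SANDBOX_ROOT: Final = "perf-sandbox"  # incremental data created by test runs


def in_namespace(full_path: str, root: str) -> bool:
    """True if full_path is the root group itself or nested anywhere under it."""
    return full_path == root or full_path.startswith(root + "/")


def check_write_target(full_path: str) -> None:
    """Raise PermissionError if a test tries to create data outside the sandbox."""
    if in_namespace(full_path, STATIC_ROOT):
        raise PermissionError(f"{full_path} is static baseline data; tests must not write here")
    if not in_namespace(full_path, SANDBOX_ROOT):
        raise PermissionError(f"{full_path} is outside the sandbox group {SANDBOX_ROOT}")
```

A test helper would call `check_write_target` before creating a project, group, issue, or merge request. Matching on `root + "/"` rather than a bare prefix keeps a path like `perf-sandbox-old/x` from being treated as sandbox data.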

A simple outline is shown below.

[Figure: Testdata_design]

Tasks

  • Identify the static data needed; this can be set up via project import. Because this data is meant to be static, cleanup is not a priority.
  • Import the data into the environment
  • Set up a sandbox area for incremental data generated by tests
  • Ensure that tests output data/artifacts into this sandbox area (group/project)
  • Create a mechanism to delete/drop data. We might want to consider putting the sandbox data on a separate DB shard so we can simply drop it. Deleting alone may not be enough: the deleted data can still take up space in the database, so the database shape would still be bigger.
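Until a separate shard exists, the delete step could be a script that removes sandbox groups through the GitLab REST API endpoint `DELETE /api/v4/groups/:id`. The sketch below only plans the requests and does not send them. It assumes hypothetical `Group` records (for example, built from a `GET /api/v4/groups` response), a hypothetical sandbox root `perf-sandbox`, and a placeholder base URL. It returns one URL per direct child of the sandbox root, because deleting a group in GitLab also deletes its subgroups and projects; the root group itself and anything outside it are never selected.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Group:
    """Minimal, hypothetical view of a GitLab group record."""
    id: int
    full_path: str


def plan_sandbox_cleanup(groups: Iterable[Group], sandbox_root: str, base_url: str) -> List[str]:
    """Return DELETE URLs for the direct child groups of sandbox_root, in input order."""
    urls = []
    for group in groups:
        # rpartition splits "perf-sandbox/run-1" into parent "perf-sandbox" and name "run-1".
        parent, _, _ = group.full_path.rpartition("/")
        if parent == sandbox_root:
            urls.append(f"{base_url.rstrip('/')}/api/v4/groups/{group.id}")
    return urls
```

Each planned URL would then be sent as a DELETE request with an access token (for example, in the `PRIVATE-TOKEN` header). Per the task above, this keeps the data count under control but does not by itself restore the original database size.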

@at.ramya @sliaquat @stanhu @ayufan this is the issue for the design of the test data. Let's use this as the starting point.

Edited by Eric Brinkman