Benchmarking: Establish a consistent baseline state and workload
Context
As part of the benchmarking effort, we've been running roughly two types of experiments:
- Microbenchmarks, which exercise very specific operations to uncover a previously observed or suspected bottleneck in the system. For example, calling the WriteRef and DeleteRefs RPCs to trigger the creation of exclusive snapshots and measure contention in the filesystem (see the sketch after this list).
- Simulations, which generate a workload resembling production traffic, so we can measure a baseline and test the viability of alternatives to the "deepclone" snapshotting strategy (a workload skeleton appears under the proposal below).
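As a rough illustration of the microbenchmark style, a minimal k6 script could drive the two RPCs over gRPC in a tight loop. This is a sketch, not our actual script: the Gitaly address, storage name, relative path, and proto layout are assumptions, and bytes-typed proto fields (`ref`, `revision`, `refs`) are base64-encoded per k6's JSON request mapping.

```javascript
import grpc from 'k6/net/grpc';
import encoding from 'k6/encoding';
import { check } from 'k6';

const client = new grpc.Client();
// Assumed layout: a local checkout of Gitaly's proto files under ./proto.
client.load(['proto'], 'repository.proto', 'ref.proto');

// Hypothetical target repository; adjust for the benchmark environment.
const repo = { storage_name: 'default', relative_path: 'bench.git' };

export default () => {
  client.connect(__ENV.GITALY_ADDR || 'gitaly.internal:8075', { plaintext: true });

  const ref = encoding.b64encode(`refs/heads/bench-${__VU}-${__ITER}`);

  // WriteRef creates/updates a ref, triggering an exclusive snapshot.
  const write = client.invoke('gitaly.RepositoryService/WriteRef', {
    repository: repo,
    ref: ref,
    revision: encoding.b64encode('refs/heads/main'),
  });
  check(write, { 'WriteRef OK': (r) => r && r.status === grpc.StatusOK });

  // DeleteRefs removes the ref again, forcing another exclusive snapshot.
  const del = client.invoke('gitaly.RefService/DeleteRefs', {
    repository: repo,
    refs: [ref],
  });
  check(del, { 'DeleteRefs OK': (r) => r && r.status === grpc.StatusOK });

  client.close();
};
```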
The results of the microbenchmarks are typically standalone, but the results of the simulation experiments for deepclone, OverlayFS, btrfs, and XFS should be directly comparable. As part of #6972, we need these results to quantitatively guide the approach we take next for transactions.
One concern is that we're starting experiments against repositories in different states and with different workload parameters. There have also been a few suggestions on how the workload should be adjusted:
- #6962 points out that housekeeping is not enabled by default because it's behind a feature flag. To emulate production, we should enable housekeeping (sketched after this list).
- #6961 mentions that a flat load profile would be more appropriate.
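Gitaly feature flags can be toggled per RPC via gRPC metadata headers of the form `gitaly-feature-<flag-name>`, so the benchmark script can opt in to housekeeping without changing server configuration. A minimal sketch, assuming the same proto layout and addresses as above; the flag name is a placeholder, and the actual housekeeping flag should be taken from #6962:

```javascript
import grpc from 'k6/net/grpc';
import encoding from 'k6/encoding';

const client = new grpc.Client();
client.load(['proto'], 'repository.proto');

// PLACEHOLDER flag name; substitute the real housekeeping flag from #6962.
const withHousekeeping = {
  metadata: { 'gitaly-feature-some-housekeeping-flag': 'true' },
};

export default () => {
  client.connect('gitaly.internal:8075', { plaintext: true }); // assumed address
  client.invoke(
    'gitaly.RepositoryService/WriteRef',
    {
      repository: { storage_name: 'default', relative_path: 'bench.git' },
      ref: encoding.b64encode('refs/heads/bench'),
      revision: encoding.b64encode('refs/heads/main'),
    },
    withHousekeeping, // the metadata must accompany every call it should affect
  );
  client.close();
};
```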
Proposal
We should decide on:
- What repositories should be tested, and how they should be sourced. Right now we're cloning public repositories, which don't include keep-around refs.
- What the workload definition is, i.e. the contents of k6-benchmark.js (see the skeleton after this list).
- The configuration of the client and Gitaly machines, i.e. the contents of config.yml
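To make the discussion concrete, here is a hypothetical skeleton for k6-benchmark.js, using k6's constant-arrival-rate executor to produce the flat load profile suggested in #6961. The rate, duration, RPC mix, and repository are placeholders to be agreed on, not proposals:

```javascript
import grpc from 'k6/net/grpc';
import { check } from 'k6';

// Flat load profile (#6961): a fixed arrival rate instead of ramping VUs.
// Every number below is a placeholder pending agreement.
export const options = {
  scenarios: {
    flat: {
      executor: 'constant-arrival-rate',
      rate: 100,           // iterations started per timeUnit
      timeUnit: '1s',
      duration: '30m',
      preAllocatedVUs: 50,
      maxVUs: 200,
    },
  },
};

const client = new grpc.Client();
client.load(['proto'], 'repository.proto');

// Connect once per VU rather than once per iteration.
let connected = false;

export default () => {
  if (!connected) {
    client.connect(__ENV.GITALY_ADDR || 'gitaly.internal:8075', { plaintext: true });
    connected = true;
  }

  // The agreed production-like RPC mix goes here; RepositorySize is just a
  // stand-in read-only call.
  const res = client.invoke('gitaly.RepositoryService/RepositorySize', {
    repository: { storage_name: 'default', relative_path: 'bench.git' },
  });
  check(res, { 'RPC OK': (r) => r && r.status === grpc.StatusOK });
};
```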
Before we populate the table in #6972, we need to ensure that experiments are executed with the same parameters.