Proposal: Microbenchmarks (sysbench, fio, etc.) to use when provisioning a new GCE instance

The Postgres benchmarks described in https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/13029, comparing Postgres 11 / Ubuntu 16.04 / n1 (Intel) with Postgres 12 / Ubuntu 18.04 / n2d (AMD), showed a significant difference in performance. The difference was eventually analyzed without Postgres, because it turned out to be related to the difference in GCE SSD PD performance between n1 and n2d.

I propose:

  • developing a suite of microbenchmarks (involving, for example, sysbench, fio) that will be executed each time an instance is provisioned, checking CPU, RAM, disk IO, network;
  • building a list of expected results for the various instance types, to use as a reference. If a new instance does not behave as expected, we could either discard it or analyze it further manually.
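
As a rough illustration of the second point, the reference check could look something like the sketch below. Everything in it is an assumption made for illustration: the metric names, the `Expected` structure, the `check_instance` helper, and above all the numbers, which are placeholders and not measurements from any real instance. It assumes the benchmark tools have already been run and their results reduced to one number per metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expected:
    value: float                    # reference value for the metric
    tolerance: float                # allowed relative deviation, e.g. 0.15 = 15%
    higher_is_better: bool = True   # False for latency-style metrics


# Hypothetical reference table. All numbers are placeholders,
# NOT real measurements for this machine type.
REFERENCE = {
    "n2d-standard-8": {
        "cpu_events_per_sec": Expected(1000.0, 0.10),
        "disk_randread_iops": Expected(15000.0, 0.15),
        "disk_read_latency_ms": Expected(1.0, 0.20, higher_is_better=False),
    },
}


def check_instance(machine_type, measured, reference=REFERENCE):
    """Return a list of deviations; an empty list means the instance looks OK."""
    problems = []
    for metric, exp in reference[machine_type].items():
        if metric not in measured:
            problems.append(f"{metric}: no measurement")
            continue
        got = measured[metric]
        if exp.higher_is_better:
            limit = exp.value * (1 - exp.tolerance)
            ok = got >= limit
        else:
            limit = exp.value * (1 + exp.tolerance)
            ok = got <= limit
        if not ok:
            problems.append(f"{metric}: measured {got}, limit {limit}")
    return problems
```

A provisioning step could call `check_instance("n2d-standard-8", results)` and either discard the instance or flag it for manual analysis whenever the returned list is not empty. A per-metric tolerance is used here because disk and network results tend to be noisier than CPU results; the actual thresholds would have to come from real baseline runs.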

This would allow us:

  • to catch problems earlier during changes such as upgrades (including switching to new instances),
  • to quickly detect issues with a particular instance, right after it's provisioned and before we start using it. It's not uncommon for two instances of the same type to differ significantly in performance.

If such tooling already exists, I think it's worth bringing it to the database part of the infrastructure.