Draft: CI: Marge-bot on steroids (MBOSS)

Arvid Jakobsson requested to merge arvid@reserved-to-margebot into master

Context

The goal of this MR is to test the impact of upgrading the specs of the executors that run our CI jobs. In particular, we would like to speed up marge-bot's pipelines in order to improve MR throughput. As a quick experiment, we ran some pipelines on CI executors with double the specs of our normal executors.

We went from the machine

C5 High-CPU Extra Large 	c5d.xlarge 	8.0 GiB 	4 vCPUs 	100 GB NVMe SSD 	Up to 10 Gigabit 	$0.1920 hourly

(this is what is currently used in the tezos/tezos CI)

to:

C5 High-CPU Double Extra Large 	c5d.2xlarge 	16.0 GiB 	8 vCPUs 	200 GB NVMe SSD 	Up to 10 Gigabit 	$0.3840 hourly

So we doubled the number of cores and the RAM. The local NVMe storage and the hourly price doubled as well.
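As a quick sanity check on the two instance rows above, the ratios can be computed directly. This is a minimal sketch: the `InstanceSpec` type and the `spec_ratios` helper are hypothetical names introduced here for illustration, and the numbers are copied from the two rows quoted in this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceSpec:
    """One row of the instance comparison (hypothetical helper type)."""
    name: str
    ram_gib: float
    vcpus: int
    nvme_gb: int
    usd_per_hour: float


# Values copied from the two rows quoted in this description.
BASELINE = InstanceSpec("c5d.xlarge", 8.0, 4, 100, 0.1920)
MBOSS = InstanceSpec("c5d.2xlarge", 16.0, 8, 200, 0.3840)


def spec_ratios(old: InstanceSpec, new: InstanceSpec) -> dict:
    """Return new/old ratios for each numeric column."""
    return {
        "ram": new.ram_gib / old.ram_gib,
        "vcpus": new.vcpus / old.vcpus,
        "nvme": new.nvme_gb / old.nvme_gb,
        "price": new.usd_per_hour / old.usd_per_hour,
    }


if __name__ == "__main__":
    for column, ratio in spec_ratios(BASELINE, MBOSS).items():
        print(f"{column}: x{ratio:.2f}")
```

Every column comes out at a factor of two, which is where the "2x the cost" assumption used further down comes from.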

Dune should automatically pick up the new CPUs and use them to parallelize builds. In addition, we experimented with the -j argument to tezt to see how more parallelization affected the runtime of the tezt jobs.

Performance impact

For an analysis of the performance impact, see https://hackmd.io/v13k2k_1RfqU6dZercyWyw?view.

In short:

  • The baseline MR has a wall-time of approximately 30 minutes. With MBOSS, we get pipelines of 24 minutes on average, ranging from 20 to 28 minutes depending on the value of -j passed to Tezt. The base value seems to be -j 6, with which we got wall-times of 20 and 26 minutes.
  • If we look at the build_x86_64 job in particular, its duration goes from 11 minutes to 8, 7, and 5 minutes. However, the job logs show that with MBOSS the actual build (the step_script section) always takes around 5 minutes, compared to 9-10 minutes in the baseline pipelines.
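The figures above can be turned into relative reductions with a few lines of arithmetic. This is only a sketch: `reduction` is a hypothetical helper, and the inputs are the rounded minute values quoted in the bullets, not raw measurements.

```python
def reduction(baseline_min: float, new_min: float) -> float:
    """Fraction of the baseline duration that was saved."""
    if baseline_min <= 0:
        raise ValueError("baseline duration must be positive")
    return (baseline_min - new_min) / baseline_min


if __name__ == "__main__":
    # Whole pipeline: ~30 min baseline vs. 24 min average (20 to 28 min range).
    for label, minutes in [("average", 24), ("best", 20), ("worst", 28)]:
        print(f"pipeline {label}: {reduction(30, minutes):.1%} shorter")

    # Build step only (step_script): 9-10 min baseline vs. ~5 min.
    for baseline in (9, 10):
        print(f"build from {baseline} min: {reduction(baseline, 5):.1%} shorter")
```

With these inputs the average pipeline is 20% shorter, while the build step itself shrinks by roughly 44% to 50%.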

Estimated impact on costs

  • In February, there were 3807 pipelines in total in tezos/tezos. Only 603 of these pipelines were actually triggered. Out of these, 161 were triggered by marge-bot. If we assume that a "marge-bot on steroids" pipeline costs 2x the cost of a normal pipeline, this will increase the cost by 161/603 ≈ 27%.
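The estimate above can be written out as a small calculation. This sketch assumes that all triggered pipelines cost the same on average, and that only marge-bot pipelines move to the larger executors. `relative_cost_increase` and its `cost_multiplier` parameter are hypothetical names used for illustration.

```python
def relative_cost_increase(
    triggered: int, marge_bot: int, cost_multiplier: float = 2.0
) -> float:
    """Relative increase in total CI cost.

    Assumes every triggered pipeline costs one unit, and that each
    marge-bot pipeline costs `cost_multiplier` units after the change.
    """
    if triggered <= 0:
        raise ValueError("triggered must be positive")
    if not 0 <= marge_bot <= triggered:
        raise ValueError("marge_bot must be between 0 and triggered")
    extra_units = marge_bot * (cost_multiplier - 1.0)
    return extra_units / triggered


if __name__ == "__main__":
    # February figures from this description: 603 triggered, 161 by marge-bot.
    increase = relative_cost_increase(triggered=603, marge_bot=161)
    print(f"estimated cost increase: {increase:.0%}")
```

Keeping the multiplier as a parameter makes it easy to rerun the estimate if the real per-pipeline cost turns out to be different from 2x, for example because the pipelines finish sooner on the larger machines.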

Manually testing the MR

Checklist

  • Document the interface of any function added or modified (see the coding guidelines)
  • Document any change to the user interface, including configuration parameters (see node configuration)
  • Provide automatic testing (see the testing guide).
  • For new features and bug fixes, add an item in the appropriate changelog (docs/protocols/alpha.rst for the protocol and the environment, CHANGES.rst at the root of the repository for everything else).
  • Select suitable reviewers using the Reviewers field below.
  • Select as Assignee the next person who should take action on that MR.
Edited by Arvid Jakobsson