Draft: CI: Marge-bot on steroids (MBOSS)
Context
The goal of this MR was to test the impact of increasing the specs of the executors we use to run CI jobs. In particular, we would like to speed up the marge-bot pipelines in order to increase MR throughput. As a quick experiment, we ran some pipelines on CI executors with double the specs of our normal executors.
We went from the following machine (currently used in the tezos/tezos CI):
c5d.xlarge (C5 High-CPU Extra Large): 4 vCPUs, 8.0 GiB RAM, 100 GB NVMe SSD, up to 10 Gigabit network, $0.1920 hourly
to:
c5d.2xlarge (C5 High-CPU Double Extra Large): 8 vCPUs, 16.0 GiB RAM, 200 GB NVMe SSD, up to 10 Gigabit network, $0.3840 hourly
So we doubled the number of cores and the RAM, at twice the hourly price.
Dune should automatically pick up the new CPUs and use them to parallelize builds. In addition, we experimented with the -j argument to tezt to see how more parallelization affected the runtime of the tezt jobs.
Performance impact
For an analysis of the performance impact, see https://hackmd.io/v13k2k_1RfqU6dZercyWyw?view.
In short:
- On the baseline, pipelines have a wall-time of approximately 30 minutes. With MBOSS, we get pipelines of 24 minutes on average, ranging from 20 to 28 minutes depending on the value of -j passed to Tezt. The base value seems to be -j 6, with which we got wall-times of 20 and 26 minutes.
- If we look at the build_x86_64 job in particular, its duration goes from 11 minutes to 8, 7, and 5 minutes. However, the job logs show that with MBOSS the actual build (the step_script section) always takes around 5 minutes. In the baseline pipelines, it takes around 9-10 minutes.
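To put the wall-times above in relative terms, here is a minimal sketch that computes the fractional reduction against the 30-minute baseline. The function name `reduction` and the labels are illustrative only; the figures are the ones quoted above, not new measurements.

```python
def reduction(baseline_min: float, new_min: float) -> float:
    """Fractional reduction in wall-time relative to the baseline."""
    return (baseline_min - new_min) / baseline_min


# Approximate baseline pipeline wall-time quoted above, in minutes.
BASELINE_PIPELINE_MIN = 30

# MBOSS wall-times quoted above: fastest, average, and slowest pipeline.
mboss_wall_times = [("fastest", 20), ("average", 24), ("slowest", 28)]

for label, minutes in mboss_wall_times:
    saved = reduction(BASELINE_PIPELINE_MIN, minutes)
    print(f"{label}: {minutes} min, {saved:.0%} shorter than baseline")
```

With these numbers, the average MBOSS pipeline is about 20% shorter than the baseline, and the spread between the fastest and slowest runs is large compared to that gain.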
Estimated impact on costs
- In February, there were 3807 pipelines in total in tezos/tezos. Only 603 of these pipelines were actually triggered. Out of these, 161 were triggered by marge-bot. If we assume that a "marge-bot on steroids" pipeline costs 2x the cost of a normal pipeline, this will increase the cost by 161/603 = 27%.
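The estimate above can be reproduced with a short sketch. It assumes, as the estimate implicitly does, that every triggered pipeline costs the same on average; the function and parameter names are hypothetical, and the 2x multiplier is the assumption stated above rather than a measured value.

```python
def relative_cost_increase(
    marge_pipelines: int,
    triggered_pipelines: int,
    cost_multiplier: float,
) -> float:
    """Relative increase in total CI cost when marge-bot pipelines become
    `cost_multiplier` times as expensive and all other pipelines are unchanged.

    Assumes every triggered pipeline has the same average cost.
    """
    marge_share = marge_pipelines / triggered_pipelines
    return marge_share * (cost_multiplier - 1)


# February figures quoted above: 161 marge-bot pipelines out of 603 triggered.
increase = relative_cost_increase(161, 603, cost_multiplier=2.0)
print(f"Estimated cost increase: {increase:.0%}")
```

Keeping the multiplier as a parameter makes it easy to redo the estimate if MBOSS pipelines turn out to cost less than 2x, for example because they finish sooner.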
Manually testing the MR
Checklist
- Document the interface of any function added or modified (see the coding guidelines)
- Document any change to the user interface, including configuration parameters (see node configuration)
- Provide automatic testing (see the testing guide).
- For new features and bug fixes, add an item in the appropriate changelog (docs/protocols/alpha.rst for the protocol and the environment, CHANGES.rst at the root of the repository for everything else).
- Select suitable reviewers using the Reviewers field below.
- Select as Assignee the next person who should take action on that MR