Py2ml: translate `test_tenderbake_long_dynamic_bake.py` to Tezt
Context
Part of #3628 (closed).
Depends on !5245 (merged).
Notes on git history: I first add a small function to Tezt, then refactor the original test of this file (added in !5245 (merged)) in 3 commits. Then I add the new test `Long_dynamic_bake`. Finally, the history contains a last commit that inlines the code of the original test to ease review.
Description of original test
Run a network of 5 nodes and 3 bakers. Every 2 seconds, a random operation is generated. Every 20 seconds, a client verifies that the chain progresses (using polling with a timeout). It checks that the network has reached consensus on each level in at most 3 rounds. A final check verifies that the five nodes have reached a certain level (based on the duration of the test and the minimal block delay) and that, at their greatest common level, they all have the same head.
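For reference, the final "same head at the greatest common level" check can be modelled as below. This is only a sketch under a simplifying assumption: each node's chain is represented as an array of block hashes indexed by level. The type and function names are hypothetical and come neither from the original Python test nor from Tezt.

```ocaml
(* Hypothetical model: each node's chain is an array of block hashes,
   indexed by level (index 0 is genesis). These names are illustrative
   and are not part of the original test or of Tezt. *)
type chain = string array

(* Highest level that every node has reached, or [None] if there is no
   node or some node has no block at all. *)
let greatest_common_level (chains : chain list) : int option =
  match chains with
  | [] -> None
  | _ ->
      let min_len =
        List.fold_left (fun acc c -> min acc (Array.length c)) max_int chains
      in
      if min_len = 0 then None else Some (min_len - 1)

(* [true] iff all nodes have the same block at their greatest common
   level, i.e. no fork is visible at that level. *)
let same_head_at_common_level (chains : chain list) : bool =
  match (greatest_common_level chains, chains) with
  | None, _ | _, [] -> false
  | Some level, first :: rest ->
      List.for_all (fun c -> String.equal c.(level) first.(level)) rest
```

A node that lags behind lowers the common level but does not count as a fork; only diverging hashes at that level do.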
Description of translation
In the translated version, I work with levels instead of timeouts:
(* This test runs [num_nodes] nodes for [test_levels]. All but one
node has a baker attached. Every [kill_baker_period], a baker is
added to the node lacking one and one of the bakers is killed.
In addition, every [add_operation_period] level, a random
transaction is injected.
We measure and check for regressions in the time it takes the
cluster to reach the final level [max_level]. *)
I did not implement the final fork check, as discussed below.
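To illustrate what "levels instead of timeouts" means, here is a minimal sketch of the level-driven schedule described in the comment above. It assumes that both periods are counted in levels. The constant values and the `action` type are illustrative assumptions, not the ones used in the actual test.

```ocaml
(* Illustrative values only; the real test defines its own constants. *)
let test_levels = 12
let kill_baker_period = 5
let add_operation_period = 2

(* Hypothetical type naming the two periodic events of the test. *)
type action = Swap_baker | Inject_transaction

(* Actions to perform when the chain reaches [level]. *)
let actions_at level =
  let swap = if level mod kill_baker_period = 0 then [Swap_baker] else [] in
  let inject =
    if level mod add_operation_period = 0 then [Inject_transaction] else []
  in
  swap @ inject

(* Schedule for levels 1 to [test_levels], keeping only the levels at
   which something happens. *)
let schedule =
  List.init test_levels (fun i -> i + 1)
  |> List.filter_map (fun level ->
         match actions_at level with
         | [] -> None
         | actions -> Some (level, actions))
```

Because every event is keyed on a level rather than on wall-clock time, a slow machine delays the whole schedule instead of changing which events fall on which block.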
Protocol particularities
None.
Similarities with other tests
Quite similar to `test_tenderbake.py`.
Flakiness
Examples:
- https://gitlab.com/tezos/tezos/-/jobs/2528440487
- https://gitlab.com/tezos/tezos/-/jobs/2537087752
- https://gitlab.com/tezos/tezos/-/jobs/2556011954
- https://gitlab.com/tezos/tezos/-/jobs/2594907353
- https://gitlab.com/tezos/tezos/-/jobs/2928362854
The issue seems to appear in the ring topology, which is a bit odd (why not in the other topology?).
The issues seem quite diverse:
- sometimes too many operations are injected at the same level, hitting the limit of one manager operation per manager per block: `Only one manager operation per manager per block allowed` (https://gitlab.com/tezos/tezos/-/jobs/2594907353)
- sometimes the nodes' ring topology is not established (https://gitlab.com/tezos/tezos/-/jobs/2537087752)
- sometimes the network does not progress: https://gitlab.com/tezos/tezos/-/jobs/2556011954#L57
- sometimes the network ends up in a fork? https://gitlab.com/tezos/tezos/-/jobs/2528440487#L73 and https://gitlab.com/tezos/tezos/-/jobs/2928362854#L117
I think the first issue can be resolved by more careful programming. The second issue will be resolved by using Tezt's cluster module, which I hope is more stable. The third issue will be solved by a move to long tests.
For issue 4, I simply did not implement the fork check. If a fork happens, then the network will progress more slowly, so it is captured as a regression in the time taken to reach `test_levels`. It seems like forks did happen non-deterministically in the original test; I am not sure what can be done about that.
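As an example of the "more careful programming" mentioned for the first issue, one possible guard is to record, per source account, the last level at which it injected a manager operation, and to refuse a second injection at the same level. This is a hypothetical helper, not a Tezt function, and it is only a conservative approximation: it assumes an injected operation is included in the next block.

```ocaml
(* Hypothetical helper, not part of Tezt: remembers the last level at
   which each source account injected a manager operation. *)
let last_injection : (string, int) Hashtbl.t = Hashtbl.create 8

(* Returns [true] and records the injection if [source] has not yet
   injected at [level] or later. Returns [false] otherwise, in which
   case the caller should pick another source or wait for the next
   level. *)
let try_reserve ~source ~level =
  match Hashtbl.find_opt last_injection source with
  | Some l when l >= level -> false
  | _ ->
      Hashtbl.replace last_injection source level;
      true
```

Calling `try_reserve` before each random transaction keeps at most one pending manager operation per source and per level.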
Manually testing the MR
Without data visualization:
alias long_tezt='dune exec tezt/long_tests/main.exe --'
long_tezt -f tenderbake.ml -m dynamic -i
With data visualization (requires `docker-compose`):
docker-compose -f tezt/lib_performance_regression/local-sandbox/docker-compose.yml up -d
alias long_tezt='TEZT_CONFIG=tezt/lib_performance_regression/local-sandbox/tezt_config.json dune exec tezt/long_tests/main.exe --'
long_tezt -f tenderbake.ml -m dynamic -i
Then visit http://localhost:3000.
Checklist
- Document the interface of any function added or modified (see the coding guidelines)
- Document any change to the user interface, including configuration parameters (see node configuration)
- Provide automatic testing (see the testing guide).
- For new features and bug fixes, add an item in the appropriate changelog (`docs/protocols/alpha.rst` for the protocol and the environment, `CHANGES.rst` at the root of the repository for everything else).
- Select suitable reviewers using the `Reviewers` field below.
- Select as `Assignee` the next person who should take action on that MR