Properly use the ddb fetch mechanism and fix data-race in the mempool
Context
This MR fixes and optimizes the mempool's DDB requests. In particular, a data race made the mempool create more than one `on_arrived` worker request per resource, which resulted in useless computation to determine whether the operation had already been handled. With this MR, we now create only one `on_arrived` request per resource which, incidentally, makes the `already_handled` check useless in the `on_arrived` branch, as it was already done (at least once) when the `on_notify` request was treated.
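The single-request-per-resource guarantee can be illustrated with a minimal sketch (in Python for brevity; this is not the actual Octez OCaml code, and the names `PendingRequests` and `first_request` are illustrative):

```python
import threading

class PendingRequests:
    """Tracks resources for which an on_arrived-style request already exists."""

    def __init__(self):
        self._lock = threading.Lock()  # guards the set against the data race
        self._pending = set()

    def first_request(self, resource):
        """Return True only for the first caller per resource, so the
        worker request is created exactly once."""
        with self._lock:
            if resource in self._pending:
                return False
            self._pending.add(resource)
            return True

queue = PendingRequests()
print(queue.first_request("op_hash_1"))  # True: create the worker request
print(queue.first_request("op_hash_1"))  # False: duplicate, nothing created
```

Because the guard is checked before a request is created, any later `already_handled` test in the arrival branch becomes redundant.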
This MR also ensures that every peer that advertised a resource is added to the set of "requestable" peers, instead of considering only one peer, so that the resource can still be fetched if the first peer fails to provide it. This closes a potential attack vector: the first advertising peer could decide to never answer, effectively censoring the resource. In practice, however, because of the previous data race, while the `on_arrived` request was waiting to be processed, `on_notify` would still call `fetch_operation` for all new advertising peers, but only when the resource had already been received.
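The fallback behavior can be sketched as follows (again illustrative Python, not the actual code; `fetch_from` stands in for the real `fetch_operation` call):

```python
def fetch_with_fallback(peers, fetch_from):
    """Try each peer that advertised the resource until one answers.

    A single unresponsive peer can no longer censor the resource,
    because every advertising peer is kept as a candidate source.
    """
    for peer in peers:
        result = fetch_from(peer)
        if result is not None:
            return result
    return None  # no peer delivered the resource

# peer_a never answers; peer_b eventually provides the operation.
responses = {"peer_a": None, "peer_b": "operation_bytes"}
print(fetch_with_fallback(["peer_a", "peer_b"], responses.get))  # operation_bytes
```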
Here are two profiling reports of the mempool activity during a block lifetime (on my laptop). These results were used to measure the improvements and to check that the semantics remain sound:
Before patch

```
head:BLJ1TQdcqF8m .............................................................. 1 13971.915ms 15% +2m43s496.972ms
handle_unprocessed ........................................................... 6477 227.648ms 100%
  operation classified ....................................................... 853
on_advertise ................................................................. 52 5.064ms 101%
on_arrived ................................................................... 4117 22.446ms 101%
  is already handled ......................................................... 4117 9.255ms 99%
    already handled .......................................................... 3689
    not already handled ...................................................... 428
on_flush ..................................................................... 1 6.466ms 100%
on_notify .................................................................... 2299 692.823ms 100%
  may_fetch_operation ........................................................ 29723 652.991ms 100%
    already handled .......................................................... 25605
    fetching thread .......................................................... 4118 460.851ms 100%
    is already handled ....................................................... 29723 133.019ms 101%
      already handled ........................................................ 25605
      not already handled .................................................... 4118
```
After patch

```
head:BLavSxsSKWfi .............................................................. 1 15504.976ms 15% +1m19s421.360ms
handle_unprocessed ........................................................... 2863 251.043ms 97%
  operation classified ....................................................... 858
on_advertise ................................................................. 80 8.067ms 101%
on_arrived ................................................................... 420 7.981ms 101%
on_flush ..................................................................... 1 4.291ms 100%
on_notify .................................................................... 2354 638.081ms 101%
  may_fetch_operation ........................................................ 25072 593.663ms 101%
    already handled .......................................................... 21342
    fetching thread .......................................................... 3317 362.849ms 100%
    is already handled ....................................................... 21755 110.184ms 101%
      already handled ........................................................ 21342
      not already handled .................................................... 413
    not being fetched - not already handled .................................. 413 53.596ms 101%
      fetching thread ........................................................ 413 52.858ms 100%
```
We notice that performance is slightly better after the patch, but the real gain is the significantly lower number of worker requests (divided by 3), which could otherwise make a less powerful machine struggle (the worker loops are not profiled yet, so the potential gain is not visible here). We can also see that the number of `on_arrived` requests matches exactly the number of expected operations during a block lifetime.
Manually testing the MR
CI
Checklist
- [ ] Document the interface of any function added or modified (see the coding guidelines)
- [ ] Document any change to the user interface, including configuration parameters (see node configuration)
- [ ] Provide automatic testing (see the testing guide).
- [ ] For new features and bug fixes, add an item in the appropriate changelog (`docs/protocols/alpha.rst` for the protocol and the environment, `CHANGES.rst` at the root of the repository for everything else).
- [ ] Select suitable reviewers using the `Reviewers` field below.
- [ ] Select as `Assignee` the next person who should take action on that MR.