What

We measure two things: the time between the moment a simulation request arrives and the moment its execution starts, and the size of the waiting queue.

Why

We want a more direct way to assess if the node is struggling to process the incoming requests. Right now the only measure we have is the average RPC response time, which is an indirect measure at best.

How

There is an Lwt_pool in the evm_context that limits the number of threads available for simulation. We measure the time between the moment a request enters the pool's waiting queue and the moment it actually starts executing. We record this in a Prometheus counter, which lets us track both the total time spent waiting and its rate of increase.

We use Ptime to measure the time spent in picoseconds.
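
As an illustration of how the rate can be read off a cumulative counter, here is a minimal sketch that takes two samples of the counter and the interval between them. It assumes the exported value is expressed in picoseconds, as the Ptime measurement suggests; the function name wait_rate is hypothetical and not part of the MR.

```shell
#!/usr/bin/env bash
# Average waiting time accumulated per second of wall-clock time, computed
# from two samples of a cumulative counter.
# Assumption: the counter is expressed in picoseconds (1 s = 10^12 ps).
# Usage: wait_rate FIRST_SAMPLE_PS SECOND_SAMPLE_PS INTERVAL_SECONDS
wait_rate() {
    awk -v a="$1" -v b="$2" -v dt="$3" \
        'BEGIN { printf "%.6f\n", (b - a) / 1e12 / dt }'
}

# Example: the counter grew by 3 * 10^12 ps over a 3 s window, so on average
# one second of waiting was accumulated per second.
wait_rate 2000000000000 5000000000000 3
```

Because the counter sums the waiting time of all requests, a result above 1 would mean that, on average, more than one request was waiting at any given moment.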

Manually testing the MR

Set up a sandbox, then blast it with simulations (here a random eth_estimateGas request taken straight from the spec). To stress the node, it is enough to send a large number of requests, even simple ones, so let's put the following in a bash script and use parallel to use as many cores as possible.

cat << 'EOF' > tmp
#!/usr/bin/env bash
while true; do
    curl http://localhost:8545/ \
        -X POST \
        -H "Content-Type: application/json" \
        --data '{"method":"eth_estimateGas","params":[{"from":"0x8D97689C9818892B700e27F316cc3E41e17fBeb9","to":"0xd3CdA913deB6f67967B99D67aCDFa1712C293601","value":"0x186a0"}],"id":1,"jsonrpc":"2.0"}'
done
EOF

Then make the script executable and start a few instances in parallel:

chmod a+x tmp
seq 1 32 | parallel --progress ./tmp

and watch the metrics increase (run each loop in its own terminal, since neither one terminates):

while true; do curl http://localhost:8545/metrics -s | grep time_waiting; done
while true; do curl http://localhost:8545/metrics -s | grep queue_size; done
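
To cut down on noise from the HELP and TYPE comment lines, the metrics output can be filtered a bit more carefully. The sketch below is only a suggestion: the helper name metric_samples and the demo metric names are hypothetical, and it assumes samples are exported without a trailing timestamp.

```shell
#!/usr/bin/env bash
# Print "name value" for every sample whose name contains PATTERN, reading
# Prometheus text exposition format on stdin. Comment lines (# HELP, # TYPE)
# are skipped. Assumes there is no trailing timestamp on sample lines.
metric_samples() {
    awk -v pat="$1" '!/^#/ && index($1, pat) { print $1, $NF }'
}

# Demo on a hypothetical payload (metric names invented for illustration).
sample='# HELP demo_time_waiting Hypothetical help text
# TYPE demo_time_waiting counter
demo_time_waiting 123
demo_queue_size 4'
printf '%s\n' "$sample" | metric_samples time_waiting

# Against the sandbox, polling once per second:
#   while true; do
#       curl -s http://localhost:8545/metrics | metric_samples time_waiting
#       sleep 1
#   done
```

Polling once per second instead of in a tight loop also avoids adding extra load to the node being measured.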

Checklist

Document the interface of any function added or modified (see the coding guidelines).
Document any change to the user interface, including configuration parameters (see node configuration).
Provide automatic testing (see the testing guide).
For new features and bug fixes, add an item in the appropriate changelog (docs/protocols/alpha.rst for the protocol and the environment, CHANGES.rst at the root of the repository for everything else).
Select suitable reviewers using the Reviewers field below.
Select as Assignee the next person who should take action on that MR.

Edited Oct 08, 2024 by Pierre-Emmanuel CORNILLEAU

EVM/Node: add metrics for time spent waiting for a thread
