EVM/Node: add metrics for time spent waiting for a thread
What
We add two metrics: the time between the moment a simulation request arrives and the moment its execution starts, and the size of the waiting queue.
Why
We want a more direct way to assess if the node is struggling to process the incoming requests. Right now the only measure we have is the average RPC response time, which is an indirect measure at best.
How
The evm_context uses an Lwt_pool to limit the number of threads available for simulation. We measure the time between the moment a request is added to the pool's queue and the moment it actually starts executing. We record it in a Prometheus counter, which gives us both the total time spent waiting and, by comparing successive values, its rate.
We use Ptime to measure the time spent in picoseconds.
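To illustrate why a counter gives us the rate as well as the total: the rate is the difference between two scraped values divided by the time between the scrapes. The sketch below is only an illustration. The helper name counter_rate is hypothetical, and the result is expressed in whatever unit the counter uses, per second.

```shell
# Hypothetical helper (not part of the MR): average increase per second of a
# Prometheus counter between two scrapes.
# Arguments: previous value, current value, seconds between the two scrapes.
counter_rate() {
  awk -v prev="$1" -v cur="$2" -v dt="$3" 'BEGIN {
    if (dt <= 0) { print "interval must be positive" > "/dev/stderr"; exit 1 }
    # A counter only goes up. A lower value means the node restarted and the
    # counter started again from zero, so the current value is the increase.
    delta = (cur >= prev) ? cur - prev : cur
    printf "%.3f\n", delta / dt
  }'
}

# Made-up sample values: the counter went from 1500 to 4500 in 10 seconds.
counter_rate 1500 4500 10   # prints 300.000
```

In practice a Prometheus server would compute this with its own rate function; the helper only makes the arithmetic explicit.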
Manually testing the MR
Set up a sandbox, then blast it with simulations (here a random eth_estimateGas request taken straight from the spec). For a stress test it is enough to send many requests, even simple ones, so let's put the following in a bash script and use parallel to use as many cores as possible.
cat << 'EOF' > tmp
#!/usr/bin/env bash
while true; do
curl http://localhost:8545/ \
-X POST \
-H "Content-Type: application/json" \
--data '{"method":"eth_estimateGas","params":[{"from":"0x8D97689C9818892B700e27F316cc3E41e17fBeb9","to":"0xd3CdA913deB6f67967B99D67aCDFa1712C293601","value":"0x186a0"}],"id":1,"jsonrpc":"2.0"}'
done
EOF
Make the script executable and start a few instances in parallel:
chmod a+x tmp
seq 1 32 | parallel --progress ./tmp
Then watch the metrics increase:
while true; do curl http://localhost:8545/metrics -s | grep time_waiting; done
while true; do curl http://localhost:8545/metrics -s | grep queue_size; done
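The loops above poll as fast as possible and also print the HELP and TYPE comment lines that match. A gentler variant could filter the samples and poll once per second. This is a minimal sketch: the helper name metric_values and the metric names in the sample input are made up for the demonstration, and it assumes label values contain no spaces.

```shell
# Hypothetical helper: read Prometheus text-format metrics on stdin and print
# "name value" for every sample whose name contains the given substring.
metric_values() {
  awk -v pat="$1" '
    /^#/ { next }                    # skip "# HELP" and "# TYPE" lines
    index($1, pat) { print $1, $2 }  # $1 is the name (with labels), $2 the value
  '
}

# Demonstration on made-up input; the real metric names may differ.
metric_values queue_size << 'SAMPLE'
# HELP demo_queue_size Hypothetical help text
# TYPE demo_queue_size gauge
demo_queue_size 3
demo_time_waiting 42
SAMPLE
```

Against the sandbox, the output of curl http://localhost:8545/metrics -s can be piped into metric_values time_waiting (or queue_size) inside a while sleep 1 loop.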
Checklist
- Document the interface of any function added or modified (see the coding guidelines)
- Document any change to the user interface, including configuration parameters (see node configuration)
- Provide automatic testing (see the testing guide).
- For new features and bug fixes, add an item in the appropriate changelog (docs/protocols/alpha.rst for the protocol and the environment, CHANGES.rst at the root of the repository for everything else).
- Select suitable reviewers using the Reviewers field below.
- Select as Assignee the next person who should take action on that MR