[Performance] g_mempool.removeforblock(...) in block validation post-process stage is slow
Validating a 32 MB, 100k-tx block currently takes about 3.5-4.5 seconds on a fast desktop CPU. That's pretty slow, and it makes further scaling very difficult -- if we have a 20 second budget for propagation to stay below a 3.3% orphan rate, then with 128 MB blocks, then if validation precedes forwarding even for a single propagation hop, that would be enough to use up the majority of our budget. In order to stay within our time budget, we would need to either have near-100% certainty that validation would always follow propagation, or we need to make block validation much faster. The current code is not able to reliably forward blocks before validation, since Compact Blocks' getblocktxn
commands require LOCK(cs_main);
, and that is held by the block validation thread for several seconds; in order to get acceptable block propagation performance, we will need to either (a) make our locking system not suck, or (b) make block validation faster. Both of those tasks sound worthwhile, but (b) is definitely easier, so I suggest we start there.
I've tracked the performance bottleneck down to CTxMemPool::removeForBlock(...)
at txmempool.cpp:627. The purpose of this code is to go through each transaction in the block, and remove that transaction from mempool (if it exists), to remove any conflicting transactions from mempool, and update the minimum fee estimator. I haven't benchmarked this yet, but I suspect that most of the time is taken up by the call to CTxMemPool::removeConflicts(...)
at txmempool.cpp:653. That method iterates through every input/spend in a transaction, and removes any other transactions from mempool that depend on the same input. Seems sketchy, and I don't know why we would want to check for conflicts with transactions that were both in mempool and in the block.
The following is the debug.log
output for generatign a 26.8 MB, 86,178-tx block using 100% of the mempool, with some extra debug/benchmarking code of mine added (not committed in my repo).
2020-09-07T00:31:49.046831Z ThreadRPCServer method=generate
2020-09-07T00:31:52.040713Z CreateNewBlock(): total size: 26837944 txs: 86178 fees: 30731760 sigops 143620
2020-09-07T00:31:52.040745Z CheckBlockHeader took 0 usec
2020-09-07T00:31:52.040751Z BlockMerkleRoot took 6 usec
2020-09-07T00:31:52.052879Z Blocksize check took 12123 usec
2020-09-07T00:31:52.193369Z Transaction check took 140485 usec
2020-09-07T00:31:52.203501Z CheckBlockHeader took 1 usec
2020-09-07T00:31:52.203519Z BlockMerkleRoot took 26 usec
2020-09-07T00:31:52.216013Z Blocksize check took 12485 usec
2020-09-07T00:31:52.354209Z Transaction check took 138197 usec
2020-09-07T00:31:52.354237Z - Sanity checks: 150.74ms [0.16s (0.22ms/blk)]
2020-09-07T00:31:52.423227Z - Fork checks: 68.98ms [0.13s (0.18ms/blk)]
2020-09-07T00:31:52.760716Z - Connect 86179 transactions: 337.46ms (0.004ms/tx, 0.002ms/txin) [0.41s (0.57ms/blk)]
2020-09-07T00:31:52.762642Z - Verify 143520 txins: 339.42ms (0.002ms/txin) [0.42s (0.58ms/blk)]
2020-09-07T00:31:52.788621Z CreateNewBlock() packages: 2982.41ms (45503 packages, 901306 updated descendants), validity: 758.79ms (total 3741.20ms)
2020-09-07T00:31:52.810486Z CheckBlockHeader took 4 usec
2020-09-07T00:31:52.822786Z BlockMerkleRoot took 12302 usec
2020-09-07T00:31:52.835442Z Blocksize check took 12653 usec
2020-09-07T00:31:52.983053Z Transaction check took 147611 usec
2020-09-07T00:31:52.983227Z AcceptBlockHeader took 155 usec
2020-09-07T00:31:52.993581Z ContextualCheckBlock took 10337 usec
2020-09-07T00:31:52.993603Z Parking check took 34 usec
2020-09-07T00:31:53.004005Z NewPoWValidBlock took 10393 usec
2020-09-07T00:31:53.103644Z SaveBlockToDisk+etc took 99638 usec
2020-09-07T00:31:53.103813Z Flush+Check took 178 usec
2020-09-07T00:31:53.103819Z AcceptBlock total: 120741 usec
2020-09-07T00:31:53.103845Z - Load block from disk: 0.00ms [0.00s]
2020-09-07T00:31:53.103854Z - Sanity checks: 0.00ms [0.16s (0.22ms/blk)]
2020-09-07T00:31:53.183042Z - Fork checks: 79.17ms [0.21s (0.29ms/blk)]
2020-09-07T00:31:53.525213Z - Connect 86179 transactions: 342.17ms (0.004ms/tx, 0.002ms/txin) [0.75s (1.03ms/blk)]
2020-09-07T00:31:53.527966Z - Verify 143520 txins: 344.93ms (0.002ms/txin) [0.77s (1.05ms/blk)]
2020-09-07T00:31:53.610989Z - Index writing: 83.01ms [0.09s (0.12ms/blk)]
2020-09-07T00:31:53.611000Z - Callbacks: 0.03ms [0.00s (0.00ms/blk)]
2020-09-07T00:31:53.620842Z - Connect total: 516.99ms [0.61s (0.83ms/blk)]
2020-09-07T00:31:53.648238Z - Flush: 27.39ms [0.06s (0.08ms/blk)]
2020-09-07T00:31:53.648263Z - Writing chainstate: 0.04ms [0.00s (0.00ms/blk)]
2020-09-07T00:31:56.220684Z UpdateTip: new best=2ae8562124923b55f1edd1a1d1be270d3046fc2ac861f19f9720cda9b85d80cb height=365 version=0x20000000 log2_work=9.5156998 tx=86748 date='2020-09-07T00:31:52Z' progress=1.000000 cache=13.4MiB(102569txo)
2020-09-07T00:31:56.220705Z - Connect postprocess: 2572.44ms [2.58s (3.54ms/blk)]
2020-09-07T00:31:56.220718Z - removeForBlock: 2572.392000ms
2020-09-07T00:31:56.220722Z - disconnectpool: 0.002000ms
2020-09-07T00:31:56.220725Z - disconnectpool2: 0.001000ms
2020-09-07T00:31:56.220728Z - SetTip: 0.001000ms
2020-09-07T00:31:56.220732Z - UpdateTip: 0.045000ms
2020-09-07T00:31:56.220736Z - Connect block: 3116.87ms [3.25s (4.45ms/blk)]
Of note: it took 3116 ms total for ConnectBlock in this instance, of which 2572 ms (82.5%) was taken up by removeForBlock
. This is noteworthy in part because in this particular (common) case, these 2572 ms could have been completely avoided by simply doing g_mempool.clear(), since every tx in mempool was also included in the block.
This seems like it would be worth optimizing this a bit.
If anyone else wishes to work on this, I suggest using the p2p_stresstest.py
benchmark within my fast-x-bench
branch:
git remote add jtoomim git@gitlab.com:jtoomim/bitcoin-cash-node.git
git fetch jtoomim
git checkout fast-x-bench
cd build
ninja
cd ../test/
ln -s ../build/test/config.ini .
cd benchmark
sudo mkdir /tmpfs # optional, but it allows --walletdir to work with 3x faster spam generation
sudo mount -t tmpfs -o size=2G /tmpfs # optional
# delete the --walletdir=/tmpfs/test0 part if you aren't using the tmpfs
python3 p2p_stresstest.py --walletdir=/tmpfs/test0
You may want to interrupt the test after the propagation of the first block with a ctrl-c so that you can view the debug.log files in e.g. /tmp/test[something]/node0/regtest/debug.log
.