`MEMPOOL` p2p message is fundamentally limited to returning results up to about ~112 txns.
This is the loop where the MEMPOOl
p2p message results are prepared for a peer: https://gitlab.com/bitcoin-cash-node/bitcoin-cash-node/-/blob/master/src/net_processing.cpp#L4805
This loop will stop iterating after about ~112 txns from MEMPOOL
on a default BCHN node.
I consider this a bug. What is the point of limiting MEMPOOL
p2p message results? The results returned are just a bunch of INVs. Each INV
is 32 bytes. They get batched together in chunks of 2000 INV
s per message (+29 bytes of msg overhead per batch).
So.. A "full" mempool of ~100k txns being given a MEMPOOL
message to respond to.. will only yield ~3.2MB of payload + overhead sent to the peer requesting the mempool.
What is the point of limiting this? It's trivially easy to get a node to send you 3.2 MB of data otherwise (just ask it for blocks!). So it cannot be a bandwidth conserving measure.
Thoughts:
The only reason I can see for limiting this is perhaps to save on CPU burn. How so?
The way the algorithm that returns the results is written: It implicitly builds a heap to return txns in topological (natural) order which is O(log N)
complexity. It then iterates over the heap N
times, incurring O(log N)
cost each time it iterates to pop the topmost heap item and return it (after which it must rebalance the heap). This is effectively a topological txn sort that costs O(N log N)
.
So -- for a 100k txn mempool a MEMPOOL
message requires on the order of 1.7M operations. That's.. peanuts? Even if it isn't we already maintain a topological index in CTxMempool!
So we can just rewrite this algorithm to burn no extra CPU than it needs to, leveraging the entry_id
index of the mempool, with O(N)
complexity to prepare the results.
Thus.. I posit -- shouldn't the MEMPOOL
message simply return all txn INVs available, always?
More thoughts:
Why hasn't this been a problem up until now? Simple: nodes don't use the MEMPOOL
message and only SPV wallets do, and they do so with a bloom filter to only match txns of interest, so we haven't seen many SPV wallets with >112 txns in mempool. And even they would only have a problem when first coming online and not if they have been around for a while.
Proposal:
- Remove this limitation
- It arises from applying
nMaxBroadcats
to the preparation ofINV
s in response to theMEMPOOL
msg here: https://gitlab.com/bitcoin-cash-node/bitcoin-cash-node/-/blob/master/src/net_processing.cpp#L4805 - Just don't cost
MEMPOOL
reply tonMaxBroadcasts
- It arises from applying
- Fix the loop in preparing the reply to not use a heap, but just use the
entry_id
index, and only takeO(N)
to prepare results, notO(N log N)