`MEMPOOL` p2p message is fundamentally limited to returning results up to about ~112 txns.

This is the loop where the MEMPOOl p2p message results are prepared for a peer: https://gitlab.com/bitcoin-cash-node/bitcoin-cash-node/-/blob/master/src/net_processing.cpp#L4805

This loop will stop iterating after about ~112 txns from MEMPOOL on a default BCHN node.

I consider this a bug. What is the point of limiting MEMPOOL p2p message results? The results returned are just a bunch of INVs. Each INV is 32 bytes. They get batched together in chunks of 2000 INVs per message (+29 bytes of msg overhead per batch).

So.. A "full" mempool of ~100k txns being given a MEMPOOL message to respond to.. will only yield ~3.2MB of payload + overhead sent to the peer requesting the mempool.

What is the point of limiting this? It's trivially easy to get a node to send you 3.2 MB of data otherwise (just ask it for blocks!). So it cannot be a bandwidth conserving measure.

Thoughts:

The only reason I can see for limiting this is perhaps to save on CPU burn. How so?

The way the algorithm that returns the results is written: It implicitly builds a heap to return txns in topological (natural) order which is O(log N) complexity. It then iterates over the heap N times, incurring O(log N) cost each time it iterates to pop the topmost heap item and return it (after which it must rebalance the heap). This is effectively a topological txn sort that costs O(N log N).

So -- for a 100k txn mempool a MEMPOOL message requires on the order of 1.7M operations. That's.. peanuts? Even if it isn't we already maintain a topological index in CTxMempool!

So we can just rewrite this algorithm to burn no extra CPU than it needs to, leveraging the entry_id index of the mempool, with O(N) complexity to prepare the results.

Thus.. I posit -- shouldn't the MEMPOOL message simply return all txn INVs available, always?

More thoughts:

Why hasn't this been a problem up until now? Simple: nodes don't use the MEMPOOL message and only SPV wallets do, and they do so with a bloom filter to only match txns of interest, so we haven't seen many SPV wallets with >112 txns in mempool. And even they would only have a problem when first coming online and not if they have been around for a while.

Proposal:

Remove this limitation
- It arises from applying nMaxBroadcats to the preparation of INVs in response to the MEMPOOL msg here: https://gitlab.com/bitcoin-cash-node/bitcoin-cash-node/-/blob/master/src/net_processing.cpp#L4805
- Just don't cost MEMPOOL reply to nMaxBroadcasts
Fix the loop in preparing the reply to not use a heap, but just use the entry_id index, and only take O(N) to prepare results, not O(N log N)

Edited Apr 21, 2024 by Calin Culianu