NTCP: multiple core issues
Created by: anonimal
By submitting this issue, I confirm the following:
- I have read and understood the contributor guide.
- I have checked that the issue I am reporting can be replicated or that the feature I am suggesting is not present.
- I have checked opened or recently closed pull requests for existing solutions/implementations to my issue/suggestion.
Place an X inside the bracket to confirm
-
I confirm.
Up until very recently, with full knowledge of the quality of this code-base and the many security issues/issues at-large (many of which have been resolved), I had never ran Kovri over clearnet - not once - since the first Kovri commit. That's over 6+ months of non-clearnet development! Instead, I had been using a Tor transproxy to do the myriad of other other critical/major/misc development that was needed - but did not require clearnet access.
Now that I am regularly developing Kovri over clearnet, core issues have been exposed of which I would have caught earlier had I been able to safely develop over clearnet (I had assumed some of these issues were transproxy-related and I also did not have proper debugging output until fe9654af). One of these core issues is an apparently very broken NTCP implementation.
An example, as seen in every NTCP session (Edit: i.e., every session that successfully connects and completes the 4-phase message sequence):
2016:05:29|06:44:40.300303 default:default NFO NTCPServer: connected to 184.58.98.218:24234
2016:05:29|06:44:40.300510 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 *** Phase1, preparing
2016:05:29|06:44:40.300686 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 <-- Phase1, sending
2016:05:29|06:44:40.300866 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 *** scheduling termination
2016:05:29|06:44:40.301042 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 --> Phase1 sent, receiving
2016:05:29|06:44:40.554746 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 *** Phase2 received, processing
2016:05:29|06:44:40.554968 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 *** creating shared key
2016:05:29|06:44:40.556551 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 *** Phase2 successful, proceeding to Phase3
2016:05:29|06:44:40.556774 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 *** Phase3, preparing
2016:05:29|06:44:40.557208 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 <-- Phase3, sending
2016:05:29|06:44:40.557440 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 --> Phase3 sent, receiving Phase4
2016:05:29|06:44:40.741753 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 *** Phase4 received, processing 64 bytes
2016:05:29|06:44:40.742591 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 *** Phase4, session connected
2016:05:29|06:44:40.742823 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 *** processing connected session
2016:05:29|06:44:40.743005 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 <-- sending TimeSyncMessage
2016:05:29|06:44:40.743187 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 <-- sending I2NP message
2016:05:29|06:44:40.743658 default:default DBG Transports: [oT4e] connecting
2016:05:29|06:44:40.743860 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 --> receiving payload
2016:05:29|06:44:40.744006 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 <-- 16 bytes transferred, 16 total bytes sent
2016:05:29|06:44:40.744208 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 <-- sending I2NP messages
2016:05:29|06:44:40.744437 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 <-- 704 bytes transferred, 720 total bytes sent
2016:05:29|06:44:40.744616 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 <-- sending I2NP messages
2016:05:29|06:44:40.744876 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 <-- 2672 bytes transferred, 3392 total bytes sent
2016:05:29|06:44:40.745062 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 *** scheduling termination
2016:05:29|06:44:40.745229 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 *** handling termination timer
2016:05:29|06:44:40.877248 default:default ERR NTCPSession: [oT4e] 184.58.98.218:24234 !!! HandleReceived(): 'End of file'
2016:05:29|06:44:40.877482 default:default NFO NTCPSession: [oT4e] 184.58.98.218:24234 !!! banning
2016:05:29|06:44:40.877663 default:default NFO NTCPServer: [oT4e] 184.58.98.218:24234 has been banned for 70 seconds
2016:05:29|06:44:40.877807 default:default DBG NTCPSession: [oT4e] 184.58.98.218:24234 *** terminating session
2016:05:29|06:44:40.878077 default:default DBG Transports: disconnecting peer
2016:05:29|06:44:40.878301 default:default ERR NTCPServer: ioservice error: 'remote_endpoint: Bad file descriptor'
2016:05:29|06:44:40.878404 default:default DBG NTCPServer: running ioservice
So, at first glance, the issues being:
1. Peer closes connection for currently unknown reasons Edit: fixed
2. We ban every NTCP peer because of said reason (or simply because they closed the connection). Why do they close the connection and why exactly are we banning? Once we find the reason for 1, we should review why we ban. Edit: fixed
Also,
3. When stopping the router:
Edit: fixed2016:05:29|07:08:59.634290 default:default DBG NTCPSession: [] 0.0.0.0:0 *** destroying session
At times this can appear a multiple of times, I've seen upwards of 20 times. Why are sessions being established to 0.0.0.0:0
and why so many times?
And lastly,
4. NTCPServer
appears to be incomplete, and also does not process any incoming TCP connections. This is a big. The whole point of P2P is to accept and relay traffic. Kovri is literally running on one leg by only using SSU. Edit: outbound NTCP is now functional. Also, our improper NTCP handling may watermark our router to observant peers or ISPs.
I need more time to review. Fixes may be simple and/or they may come along with a completed NTCP implementation. I'm labeling this issue as critical because of the potential security implications. Otherwise, it's simply a major bug.