Deadlock with AsyncContext
Hello,
I encountered a deadlock when using AsyncContext. The following minimal example reproduces the issue after a number of iterations. The code opens a Rep0 and a Req0 socket and ping-pongs messages between them.
I used the latest HEAD of async-nng, which is 0.2.0.
#[test]
fn context() {
    const NUM_REQUESTS: usize = 1000000;
    const ADDR: &str = "tcp://127.0.0.1:4000";
    const MSG: &str = "hello";

    std::thread::spawn(move || {
        tokio::runtime::Builder::new_current_thread()
            .enable_all()
            .build()
            .unwrap()
            .block_on(async {
                let socket = nng::Socket::new(nng::Protocol::Rep0).unwrap();
                socket.listen(ADDR).unwrap();

                loop {
                    let mut context = async_nng::AsyncContext::try_from(&socket).unwrap();
                    println!("S: receiving");
                    let msg = context.receive(None).await.unwrap();
                    println!("S: sending");
                    context.send(msg, None).await.unwrap();
                }
            })
    });

    let socket = nng::Socket::new(nng::Protocol::Req0).unwrap();
    socket.dial_async(ADDR).unwrap();

    for n in 0..NUM_REQUESTS {
        println!("C: sending {n}");
        let mut message = nng::Message::with_capacity(MSG.len());
        message.extend(MSG.as_bytes());
        socket.send(message).unwrap();
        let resp = socket.recv().unwrap();
        println!("C: received {n}: {resp:?}");
        assert_eq!(resp.as_slice(), MSG.as_bytes());
        println!();
    }
}
Output:
...
C: sending 11247
S: sending
S: receiving
C: received 11247: Message { msgp: 0x147205570, header: Header { msgp: 0x147205570 } }
C: sending 11248
S: sending
S: receiving
I traced down the problem to:
- The state of the used Aio is cleared here. This happens before the callback is called and thus allows calls to send_ctx or recv_ctx.
- If the callback doesn't return (and probably some cleanup in nng happens afterwards) before the threads are switched, the aio result is pulled from the channel (that is sent here) like this, and send_ctx or recv_ctx is called, the code deadlocks.
- I didn't check the nng internals, but it seems that send/recv operations are not allowed before the aio callback has completed. This may be a problem in nng itself, because the user cannot know when it is OK to make the next call after the callback finishes, but I'm unsure.
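To make the suspected ordering concrete, here is a simplified model that uses only the standard library. It is not the actual async-nng or nng code: ModelAio, busy, callback_done and observe_window are hypothetical names, and the extra resume channel exists only to force the interleaving that happens by chance in the real program. It shows that a consumer can receive the result and see the aio marked idle while the callback has not yet returned. It does not reproduce the deadlock itself.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the aio bookkeeping; not the real async-nng type.
struct ModelAio {
    /// True while an operation is considered in flight (the "state").
    busy: AtomicBool,
    /// True once the completion callback has fully returned.
    callback_done: AtomicBool,
}

/// Returns true if the consumer saw the aio marked idle while the
/// callback had not yet returned, i.e. the suspected unsafe window.
fn observe_window() -> bool {
    let aio = Arc::new(ModelAio {
        busy: AtomicBool::new(true),
        callback_done: AtomicBool::new(false),
    });
    let (result_tx, result_rx) = mpsc::channel::<&'static str>();
    // Used only to make the interleaving deterministic in this model.
    let (resume_tx, resume_rx) = mpsc::channel::<()>();

    let cb_aio = Arc::clone(&aio);
    let callback = thread::spawn(move || {
        // 1. The state is cleared before the callback has finished.
        cb_aio.busy.store(false, Ordering::SeqCst);
        // 2. The result is handed to the waiting task.
        result_tx.send("done").unwrap();
        // 3. Simulated thread switch: the callback pauses here until
        //    the consumer has run.
        resume_rx.recv().unwrap();
        // 4. Only now does the callback return.
        cb_aio.callback_done.store(true, Ordering::SeqCst);
    });

    // Consumer side: the result arrives and the next operation would start.
    let _result = result_rx.recv().unwrap();
    let idle = !aio.busy.load(Ordering::SeqCst);
    let callback_still_running = !aio.callback_done.load(Ordering::SeqCst);

    // Let the callback finish and clean up the thread.
    resume_tx.send(()).unwrap();
    callback.join().unwrap();

    idle && callback_still_running
}

#[test]
fn window_is_observable() {
    assert!(observe_window());
}
```

In this model, the point where idle and callback_still_running are both true is where the next send_ctx or recv_ctx call would be issued. If nng really requires the callback to have returned first, that is the window in which the hang would occur.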
I updated nng-sys to use nng v1.11 but this didn't resolve the issue.