Deadlock with AsyncContext

Hello,

I encountered a deadlock when using AsyncContext. The following minimal example reproduces the issue after a number of iterations. The code opens a Rep0 and a Req0 socket and ping-pongs messages between them. I used the latest HEAD of async-nng, which is version 0.2.0.

#[test]
fn context() {
    const NUM_REQUESTS: usize = 1000000;
    const ADDR: &str = "tcp://127.0.0.1:4000";
    const MSG: &str = "hello";

    std::thread::spawn(move || {
        tokio::runtime::Builder::new_current_thread()
            .enable_all()
            .build()
            .unwrap()
            .block_on(async {
                let socket = nng::Socket::new(nng::Protocol::Rep0).unwrap();
                socket.listen(ADDR).unwrap();
                loop {
                    let mut context = async_nng::AsyncContext::try_from(&socket).unwrap();
                    println!("S: receiving");
                    let msg = context.receive(None).await.unwrap();
                    println!("S: sending");
                    context.send(msg, None).await.unwrap();
                }
            })
    });

    let socket = nng::Socket::new(nng::Protocol::Req0).unwrap();
    socket.dial_async(ADDR).unwrap();

    for n in 0..NUM_REQUESTS {
        println!("C: sending {n}");
        let mut message = nng::Message::with_capacity(MSG.len());
        message.extend(MSG.as_bytes());
        socket.send(message).unwrap();
        let resp = socket.recv().unwrap();
        println!("C: received {n}: {resp:?}");
        assert_eq!(resp.as_slice(), MSG.as_bytes());
        println!();
    }
}

Output:

...
C: sending 11247
S: sending
S: receiving
C: received 11247: Message { msgp: 0x147205570, header: Header { msgp: 0x147205570 } }

C: sending 11248
S: sending
S: receiving

I traced the problem down to the following:

  • The state of the Aio in use is cleared here. This happens before the callback is called, so calls to send_ctx or recv_ctx are already allowed at that point.
  • If the threads are switched before the callback returns (and nng probably does some cleanup after it returns), the aio result that is sent here is pulled from the channel like this, and a call to send_ctx or recv_ctx follows, then the code deadlocks.
  • I didn't check the nng internals, but it seems that send/recv operations are not allowed before the aio callback has completed. This may be a problem in nng itself, because the user cannot know when it is OK to make the next call after the callback finishes, but I'm unsure.
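
The suspected interleaving can be sketched with a small std-only model. This is not nng or async-nng code: the function name, the flag, and both channels are hypothetical stand-ins, and the sketch only demonstrates the ordering window described above (the result becomes visible to the consumer while the callback has not yet returned). It does not reproduce the deadlock itself.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;

/// Hypothetical model of the ordering described above.
/// Returns true if the consumer was able to start its next operation
/// while the simulated aio callback was still running.
fn next_op_overlaps_callback() -> bool {
    // Set while the simulated callback is executing.
    let callback_running = Arc::new(AtomicBool::new(false));
    // Stand-in for the channel that carries the aio result to the future.
    let (result_tx, result_rx) = mpsc::channel::<&'static str>();
    // Used only to force the unlucky scheduling deterministically.
    let (resume_tx, resume_rx) = mpsc::channel::<()>();

    let flag = Arc::clone(&callback_running);
    let callback_thread = thread::spawn(move || {
        // The callback has been entered.
        flag.store(true, Ordering::SeqCst);
        // The result is published before the callback returns
        // (assumption: this mirrors "state cleared, result sent").
        result_tx.send("recv done").unwrap();
        // Simulate a thread switch: the callback is paused here until
        // the consumer has already reacted to the result.
        resume_rx.recv().unwrap();
        // Only now does the callback return.
        flag.store(false, Ordering::SeqCst);
    });

    // Consumer side: pull the result from the channel.
    let _result = result_rx.recv().unwrap();
    // This is the point where the next send_ctx or recv_ctx would be
    // issued. Check whether the callback is still in flight.
    let overlapped = callback_running.load(Ordering::SeqCst);

    // Let the simulated callback finish and clean up.
    resume_tx.send(()).unwrap();
    callback_thread.join().unwrap();
    overlapped
}
```

Calling next_op_overlaps_callback() returns true in this model, because the result is sent before the callback returns and the callback is held until the consumer has observed it. Whether nng actually forbids a new operation inside that window is the open question from the last bullet.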

I updated nng-sys to use nng v1.11, but this didn't resolve the issue.