socket chardev loses data when remote end closes the connection
QEMU's socket chardev can lose data when the remote end closes the connection, if the guest UART or whatever is using the chardev has not yet read all the data from the remote before the chardev processes the HUP notification.
The specific case where I see this happening is in the tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_raspi2_initrd test, which writes "halt\r" to the guest's serial console, and then immediately closes its end of the connection. The test hangs intermittently because the guest doesn't always see the whole string, but sometimes only 'h' and no further data.
This is what seems to me to be going on:
- Avocado writes the data ('halt\r') and closes the socket pretty much immediately afterwards
- At the glib layer, the socket is polled, and it gets G_IO_IN and G_IO_HUP, indicating "readable, and also closed"
- glib's source dispatch mechanism first calls tcp_chr_read() to handle the G_IO_IN part
- tcp_chr_read() reads a single byte (the 'h'), because SocketChardev::max_size is 1 (which in turn is because the device model's can_write function returned 1 to say that's all it can accept for now). So there's still data to be read in future
- glib now calls tcp_chr_hup() because of the G_IO_HUP (as part of the same handle-all-the-sources loop)
- tcp_chr_hup() calls tcp_chr_disconnect(), which basically frees everything, tells the chardev backend that the connection just closed, etc
- the data remaining in the socket to be read after the 'h' is never read
Dan's view was:
Right, this is basically broken by (lack of) design right now.
The main problem here is that we're watching the socket twice. One set of callbacks added with io_add_watch_poll, and then a second callback added with qio_chanel_create_watch just for G_IO_HUP.
We need there to be only 1 callback, and when that callback gets G_IO_IN, it should ignore G_IO_HUP until tcp_chr_recv returns 0 to indicate EOF. This would cause tcp_chr_read to be invoked repeatedly with G_IO_IN | G_IO_HUP, as we read "halt\r" one byte at a time.