Skip to content

pibridge-serdev: Fix premature timeout while waiting for event

Created by: linosanfilippo-kunbus

Using a MIO module with a connect produces lots of error messages like the following in the log file.

piControl: recv len from pibridge err(got:0, exp:20) piControl: talk with mio for aio data error(addr:31, ret:-70) piControl: crc for dio data err(got:0, expect:103) piControl: recv len from pibridge err(got:0, exp:20) piControl: talk with mio for aio data error(addr:31, ret:-70) piControl: recv len from pibridge err(got:0, exp:20) piControl: talk with mio for dio data error(addr:31, ret:-70)

The reason for these messages are repeated incomplete datagram receptions for sent data requests. Turning the tracing on in the pibridge module showed that the receiving process does not wait long enough to receive all data:

piControl I/O-1579 [003] ....... 81000.509144: pibridge_send_begin: datalen=12 data:{0x1f,0x9,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x16} piControl I/O-1579 [003] ....... 81000.510308: pibridge_send_end: datalen=12 piControl I/O-1579 [003] ....... 81000.510309: pibridge_receive_begin: datalen=20 piControl I/O-1579 [003] ....... 81000.512038: pibridge_receive_timeout: len=16 timeout=10 piControl I/O-1579 [003] ....... 81000.512038: pibridge_receive_end: datalen=16 data:{0x9f,0x11,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0} kworker/u8:0-507 [003] ....... 81000.512059: pibridge_wakeup_receive_buffer: Wakeup 0

In the case shown above the process starts waiting for the response packet at 81000.510309 but at 81000.512038 (i.e. only 1,73 ms later) the wait already ends with a timeout. This is odd since the implementation uses wait_event_timeout() with a 10 ms timeout.

However the timeout is expected as jiffies by this function and for a kernel that uses 100 jiffies per HZ 10 ms are converted into only 1 jiffy.

So the timeout happens after the next timer tick, which may be significantly shorter than the full jiffy period (10 ms) in case that the waiting starts in the middle of a jiffy period.

Fix this issue by waiting at least one additional jiffy to make sure that the timeout does not happen before at least one full jiffy period of 10 ms.

(See also this thread in which a similar issue for the use of wait_event_timeout() with a 1 jiffy timeout has been reported: https://lore.kernel.org/lkml/529CA42A.3040504@freebox.fr/)

Signed-off-by: Lino Sanfilippo l.sanfilippo@kunbus.com

Merge request reports