Handling of Multiple Units in nmea2000
There are situations where gpsmon, or any other client, does not show GPS data. I have attached two CAN log files to help reproduce and explain the issue. I have tried to find the cause and want to suggest a patch which, in my opinion, solves or at least mitigates the bug.
Example
- connect two devices to a CAN bus and ensure SocketCAN is working on both devices; the interface name has to be "can0"
a) working case
- on one device
  kill all running instances of gpsd
  start
    gpsd nmea2000://can0    # goes to background
    gpsmon -n localhost:2947:nmea2000://can0
- on the other device start
  canplayer -I nmea_good.log
- gpsmon shows the GPS data as expected, no failure
b) failure case
- on one device
  kill all running instances of gpsd
  start
    gpsd nmea2000://can0    # goes to background
    gpsmon -n localhost:2947:nmea2000://can0
- on the other device start
  canplayer -I nmea_bad.log
- gpsmon shows nothing, which is wrong; it should display the same information as in case a)
The two files nmea_good.log and nmea_bad.log differ in only one message, the first. This message comes from another device (SA=0xF0) and occupies the default CAN device. The next message (SA=0xDC) creates a new device/unit. gpsmon has to query SA 0xDC explicitly to get the wanted GPS information.
c) workaround
- on one device
  kill all running instances of gpsd
  start
    gpsd nmea2000://can0    # goes to background
    gpsmon -n localhost:2947:nmea2000://can0
    gpsmon -n localhost:2947:nmea2000://can0:220    # 0xDC = 220 decimal
- on the other device start
  canplayer -I nmea_bad.log
- the second gpsmon instance shows the GPS data as expected (no failure), but the first instance displays nothing.
Explanation
NMEA2000 uses the concept of a source address (SA) as defined in SAE J1939. This allows transmission over non-NMEA2000 buses, ISOBUS in our case.
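To make the role of the SA concrete, here is a minimal sketch of how the SA and the parameter group number (PGN) can be pulled out of a 29-bit J1939 identifier. It is not code from gpsd; the struct and the function name j1939_decode are made up for illustration.

```c
#include <stdint.h>

/* Fields of a 29-bit extended CAN identifier as laid out by SAE J1939.
 * Hypothetical type, for illustration only. */
struct j1939_id {
    unsigned int priority;  /* bits 26..28 */
    unsigned int pgn;       /* parameter group number */
    unsigned int sa;        /* source address, bits 0..7 */
};

/* Hypothetical helper: split a raw identifier into its fields. */
static struct j1939_id j1939_decode(uint32_t can_id)
{
    struct j1939_id id;

    id.priority = (can_id >> 26) & 0x7u;
    id.pgn = (can_id >> 8) & 0x1ffffu;
    id.sa = can_id & 0xffu;

    /* PDU1 format (PDU format byte < 240): the low byte of the PGN
     * field carries a destination address, so it is masked out. */
    if (((id.pgn >> 8) & 0xffu) < 240u)
        id.pgn &= 0x1ff00u;

    return id;
}
```

For example, the identifier 0x0DF805DC decodes to PGN 129029 (GNSS Position Data) sent from SA 0xDC.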
Each sender on such a CAN bus has its own unique SA. The gpsd driver for
NMEA2000 creates a new device (gpsd_add_device) for each SA
found on the bus. The name of such a device is, for example,
"nmea2000://can0:27" (can0: first CAN bus, 27: source address),
and a user can refer to that device like this:
"gpsmon -n localhost:2947:nmea2000://can0:27".
"nmea2000://can0" without an SA is the default device.
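The naming scheme can be sketched with a small helper that builds the device name from an interface name and an SA. This is only an illustration of the format described above; format_n2k_device is a hypothetical name and not a gpsd function.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not part of gpsd): build the device name a
 * client would use for a given interface and source address.
 * The SA is written in decimal, as in "nmea2000://can0:27".
 * Returns the number of characters written, or -1 if buf is too small. */
static int format_n2k_device(char *buf, size_t len,
                             const char *ifname, unsigned int sa)
{
    int n = snprintf(buf, len, "nmea2000://%s:%u", ifname, sa & 0xffu);

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Called with "can0" and 0xDC, it produces "nmea2000://can0:220", the name used in the workaround in case c).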
The problem is that the SA is volatile. It can change even at runtime, so a specific device can have different addresses at different times.
That means a gpsd client cannot know which SA it should listen to. A client would have to catch and interpret all the address claim messages on the bus to work that out for itself, which is quite complicated.
Even worse, a client program can't rely on the default device "nmea2000://can0". This is because the default device is associated with the first message read from the bus after gpsd starts. If there is more than one device on the bus, it is not predictable which SA becomes the default device.
And that's the problem. driver_nmea2000.c creates a new device for each SA it sees, regardless of whether the message has anything to do with GPS/AIS. So the default device is always associated with the first message received from SocketCAN.
A practical solution would be to simply ignore non-GPS/AIS messages. A device sending unknown messages has no meaning for gpsd and its clients. This solution is not complete. If more than one device on a bus sends GPS/AIS messages, the situation is the same as it is now with an unpatched gpsd: the client cannot predict which device it gets its information from. With the patch, it is at least ensured that it gets GPS messages at all.
I have rearranged find_pgn() so that the part which parses the incoming message (line 1611... in gpsd-3.22) is executed before any other processing. That part returns from find_pgn() if the current message is not a valid GPS/AIS message (or an address claim; address claims come from all devices).
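The effect of filtering first can be shown with a small model. This is a sketch under assumptions, not the patch itself: the PGN list is a short illustrative subset, the names nav_pgns, pgn_is_nav and default_device_sa are made up, and address claims are left out of the model.

```c
#include <stdbool.h>
#include <stddef.h>

/* One received message, reduced to the two fields that matter here. */
struct msg {
    unsigned int pgn;
    unsigned int sa;
};

/* Illustrative subset of GPS/AIS PGNs; the driver's own table is
 * longer. Address claims (PGN 60928) are not modelled. */
static const unsigned int nav_pgns[] = {
    129025u,  /* Position, Rapid Update */
    129026u,  /* COG & SOG, Rapid Update */
    129029u,  /* GNSS Position Data */
    129038u,  /* AIS Class A Position Report */
};

static bool pgn_is_nav(unsigned int pgn)
{
    size_t i;

    for (i = 0; i < sizeof nav_pgns / sizeof nav_pgns[0]; i++)
        if (nav_pgns[i] == pgn)
            return true;
    return false;
}

/* Return the SA that ends up bound to the default device, or -1.
 * filter == false models the current behaviour (first message wins),
 * filter == true models ignoring non-GPS/AIS messages. */
static int default_device_sa(const struct msg *msgs, size_t n, bool filter)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (!filter || pgn_is_nav(msgs[i].pgn))
            return (int)msgs[i].sa;
    return -1;
}
```

With a sequence shaped like nmea_bad.log, i.e. an unrelated message from SA 0xF0 followed by a GNSS message from SA 0xDC, the unfiltered model binds the default device to 0xF0 and the filtered one binds it to 0xDC. The PGN used for the unrelated message in such a test is an arbitrary choice, since the log contents are not reproduced here.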
I have made a second modification.
If I interpret the code correctly, lines 1605 - 1607 should be executed only for the default device. When driver_nmea2000 creates a new device by itself, it makes the correct assignment "nmea2000_units[can_net][source_unit] = session;" during the gpsd_add_device() call in line 1903. The driver assumes that the next call to find_pgn() with ...unit_valid == false gets the same message, in particular one with the same SA. If that is the case, lines 1605 - 1607 and lines 1597 - 1599 are somewhat redundant but introduce no inconsistencies. But in fact, there is no guarantee that the next call to find_pgn() is for the same SA. It can also carry a message with a different, new SA. Now, because of line 1607, two nmea2000_units[can_net][source_unit] entries can point to the same session.
I have not yet been able to provoke serious misbehaviour, but in my opinion the code would be more robust if lines 1605 - 1607 were only executed when lines 1597 - 1599 have not been executed right before. To get this behaviour, I introduced the variable "found".
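The idea behind "found" can be modelled in isolation. The following is a simplified sketch, not the actual patch: the session struct is reduced to three fields, the table sizes NETS and UNITS are assumed, and bind_unit and count_entries are hypothetical names. Only the variable name "found" comes from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define NETS  4    /* assumed table sizes, for illustration only */
#define UNITS 256

/* Reduced stand-in for the per-device session. */
struct session {
    unsigned int can_net;
    unsigned int unit;
    bool unit_valid;
};

/* Stand-in for nmea2000_units[][]. */
static struct session *units[NETS][UNITS];

/* Model of the binding step for a session whose unit is not yet valid.
 * guard == false: always bind to the SA of the current message.
 * guard == true:  bind only if the session was not already registered. */
static void bind_unit(struct session *s, unsigned int can_net,
                      unsigned int source_unit, bool guard)
{
    bool found = false;
    unsigned int l1, l2;

    if (s->unit_valid)
        return;

    /* Was the session registered when the driver added the device? */
    for (l1 = 0; l1 < NETS; l1++) {
        for (l2 = 0; l2 < UNITS; l2++) {
            if (units[l1][l2] == s) {
                s->unit = l2;
                s->unit_valid = true;
                s->can_net = l1;
                can_net = l1;
                found = true;
            }
        }
    }

    if (!guard || !found) {
        s->unit = source_unit;
        s->unit_valid = true;
        units[can_net][source_unit] = s;
    }
}

/* Count how many table entries point to the given session. */
static unsigned int count_entries(const struct session *s)
{
    unsigned int l1, l2, n = 0;

    for (l1 = 0; l1 < NETS; l1++)
        for (l2 = 0; l2 < UNITS; l2++)
            if (units[l1][l2] == s)
                n++;
    return n;
}
```

If a session is pre-registered for SA 0xDC and the next message arrives from SA 0xF0, the unguarded variant leaves two table entries pointing to that session, while the guarded variant keeps exactly one and the session stays bound to 0xDC. A session that is not in the table at all, as for the default device, is bound to the SA of the current message in both variants.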