
Draft: Fix EoE in devel branch

Bjarne von Horn requested to merge igh-vh/ethercat:fix_master_eoe into devel

Calling the send callback from the EoE thread corrupted the master's datagram send list.

The Debian kernel is built with CONFIG_DEBUG_LIST, so it complained:

[  253.917596] list_del corruption. prev->next should be ffff800008903dc8, but was ffff800008903e20. (prev=ffff800008903dc8)
[  253.917645] ------------[ cut here ]------------
[  253.917648] kernel BUG at lib/list_debug.c:59!
[  253.917654] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT_RT SMP
[  253.917662] Modules linked in: ec_generic(OE) ec_master(OE) hci_uart btqca btrtl btbcm btintel bluetooth jitterentropy_rng binfmt_misc sha512_generic sha512_arm64 evdev vc4 snd_soc_hdmi_codec bcm2835_v4l2(C) bcm2835_mmal_vchiq(C) snd_soc_core videobuf2_vmalloc videobuf2_memops nls_ascii videobuf2_v4l2 snd_bcm2835(C) videobuf2_common snd_pcm_dmaengine brcmfmac snd_pcm nls_cp437 videodev aes_neon_bs snd_timer cec brcmutil snd rc_core vfat aes_neon_blk fat cpufreq_dt mc soundcore drm_display_helper drbg cfg80211 drm_dma_helper drm_kms_helper ansi_cprng raspberrypi_cpufreq ecdh_generic rfkill ecc bcm2835_thermal pwm_bcm2835 bcm2835_rng vchiq(C) bcm2835_wdt rng_core leds_gpio drm loop fuse dm_mod efi_pstore dax configfs ip_tables x_tables autofs4 ext4 mbcache jbd2 crc32c_generic smsc smsc95xx usbnet selftests mii libphy crc16 dwc2 udc_core roles usbcore sdhci_iproc crct10dif_ce crct10dif_common usb_common sdhci_pltfm sdhci i2c_bcm2835 bcm2835 phy_generic
Configuring PDOs...
Activating master...
[  253.917878] CPU: 0 PID: 664 Comm: EtherCAT-EoE Tainted: G         C OE      6.1.0-11-rt-arm64 #1  Debian 6.1.38-4
[  253.917887] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT)
[  253.917891] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  253.917898] pc : __list_del_entry_valid+0x9c/0xec
[  253.917915] lr : __list_del_entry_valid+0x9c/0xec
[  253.917923] sp : ffff80000893bce0
[  253.917925] x29: ffff80000893bce0 x28: 00000000ffffd2fe x27: ffff7959c42a8b08
[  253.917935] x26: ffff7959c42a8000 x25: ffff7959c5e1047c x24: ffff800008903dc8
[  253.917945] x23: 0000000000000000 x22: ffff800008903db8 x21: ffff800008903db8
[  253.917954] x20: 0000000000000000 x19: ffff80000893bd78 x18: 0000000000000010
[  253.917963] x17: 20747562202c3863 x16: 6433303938303030 x15: 3038666666662065
[  253.917973] x14: 6220646c756f6873 x13: 2938636433303938 x12: 3030303038666666
[  253.917982] x11: 663d766572702820 x10: 2e30326533303938 x9 : ffffaaf709cd20e8
[  253.917991] x8 : ffff80000893ba18 x7 : 0000000000000000 x6 : 000000000000000c
[  253.918000] x5 : ffff7959f716cc90 x4 : 0000000000000002 x3 : 0000000000000000
[  253.918009] x2 : 0000000000000000 x1 : ffff7959c6c8a100 x0 : 000000000000006d
[  253.918018] Call trace:
[  253.918021]  __list_del_entry_valid+0x9c/0xec
[  253.918030]  ec_master_send_datagrams+0x1f0/0x3dc [ec_master]
[  253.918083]  ecrt_master_send+0x58/0x120 [ec_master]
[  253.918121]  ecrt_master_send_ext+0x9c/0xb0 [ec_master]
[  253.918158]  ec_master_internal_send_cb+0x18/0x24 [ec_master]
[  253.918195]  ec_master_eoe_thread+0x190/0x1cc [ec_master]
[  253.918233]  kthread+0x120/0x124
[  253.918241]  ret_from_fork+0x10/0x20
[  253.918254] Code: aa0003e1 f0004f60 91238000 941ab606 (d4210000) 
[  253.918259] ---[ end trace 0000000000000000 ]---

Modifying linked lists in the kernel is neither atomic nor thread-safe, so concurrent modifications have to be guarded by a lock.

Automatically switching the EoE bridges to OP was disabled in be5dfb72, so this now has to be done manually, either with the command-line tool or by the application.

By the way, is the warning "No sending response for eoe0s4 after 100 tries." normal when testing throughput with iperf?

Also fixes #17.
