Long response times to Time Sync requests can cause node time to oscillate wildly
In a simple 2 node mesh the adopter's time can be observed to vary widely. Consider the following log output from the adopter
--> Start: New Connection, nodeId = 1439108065
--> Start: New Connection, {
"nodeId": 366351453, // We are ESP32-WROOM dev kit
"subs": [
{
"nodeId": 1439108065 // Remote is a TTGO with 1.7" TFT display
}
]
}
Changed connections
clockOffset(): Total request processing time (t3-t0) 10259 // 10 millisecods or less is a reasonable expectation of response time
nodeTimeAdjustedCallback(): At 17.155631 seconds, Adjusted mesh time to 33.478649 (Offset = 16323015) // First correction is always going to be large
clockOffset(): Total request processing time (t3-t0) 102365 // <--- Sometimes the response can take ten times as long despite remote being largely idle
nodeTimeAdjustedCallback(): At 17.469031 seconds, Adjusted mesh time to 33.839521 (Offset = 47473) // When that happens the calculated offset seems to be excesive.
clockOffset(): Total request processing time (t3-t0) 8223 // ...and the next 'normal' sync puts it back
nodeTimeAdjustedCallback(): At 17.687865 seconds, Adjusted mesh time to 34.10820 (Offset = -47534) // suspiciously by almost the same amount
clockOffset(): Total request processing time (t3-t0) 8166 // a further 'normal' sync
nodeTimeAdjustedCallback(): At 17.896803 seconds, Adjusted mesh time to 34.219742 (Offset = -16) // actually syncs the clock
clockOffset(): Total request processing time (t3-t0) 10138 // 60 seconds later a 'normal' sync
nodeTimeAdjustedCallback(): At 77.906858 seconds, Adjusted mesh time to 94.228809 (Offset = -988) // results in an acceptable offset but 200 milliseconds later it tries again anyway??
clockOffset(): Total request processing time (t3-t0) 390404 // This time with an exceptionally long time delay
nodeTimeAdjustedCallback(): At 78.291530 seconds, Adjusted mesh time to 94.423240 (Offset = -190241) // which results in an overly large correction
clockOffset(): Total request processing time (t3-t0) 7683 // which the next 'normal' sync puts back
nodeTimeAdjustedCallback(): At 78.509309 seconds, Adjusted mesh time to 94.831314 (Offset = 190294)
clockOffset(): Total request processing time (t3-t0) 8213
nodeTimeAdjustedCallback(): At 78.717866 seconds, Adjusted mesh time to 95.39764 (Offset = -106)
clockOffset(): Total request processing time (t3-t0) 8109
nodeTimeAdjustedCallback(): At 138.728054 seconds, Adjusted mesh time to 155.49405 (Offset = -547) // and so it goes on.
clockOffset(): Total request processing time (t3-t0) 193161
nodeTimeAdjustedCallback(): At 138.913185 seconds, Adjusted mesh time to 155.145189 (Offset = -89347)
clockOffset(): Total request processing time (t3-t0) 200752
nodeTimeAdjustedCallback(): At 139.324395 seconds, Adjusted mesh time to 155.741572 (Offset = 185173)
clockOffset(): Total request processing time (t3-t0) 199104
nodeTimeAdjustedCallback(): At 139.733775 seconds, Adjusted mesh time to 156.150045 (Offset = -907)
clockOffset(): Total request processing time (t3-t0) 8662
nodeTimeAdjustedCallback(): At 199.745313 seconds, Adjusted mesh time to 216.64998 (Offset = -96585)
clockOffset(): Total request processing time (t3-t0) 7423
nodeTimeAdjustedCallback(): At 199.963112 seconds, Adjusted mesh time to 216.282688 (Offset = -109)
clockOffset(): Total request processing time (t3-t0) 10613
nodeTimeAdjustedCallback(): At 259.974281 seconds, Adjusted mesh time to 276.291836 (Offset = -2021)
clockOffset(): Total request processing time (t3-t0) 187237
nodeTimeAdjustedCallback(): At 260.156390 seconds, Adjusted mesh time to 276.384315 (Offset = -89630)
clockOffset(): Total request processing time (t3-t0) 200489
nodeTimeAdjustedCallback(): At 260.567126 seconds, Adjusted mesh time to 276.981192 (Offset = 186141)
clockOffset(): Total request processing time (t3-t0) 12978
nodeTimeAdjustedCallback(): At 260.790626 seconds, Adjusted mesh time to 277.108955 (Offset = -95737)
clockOffset(): Total request processing time (t3-t0) 178643
nodeTimeAdjustedCallback(): At 261.180333 seconds, Adjusted mesh time to 277.582942 (Offset = 84279)
clockOffset(): Total request processing time (t3-t0) 7629
nodeTimeAdjustedCallback(): At 261.398555 seconds, Adjusted mesh time to 277.716166 (Offset = -84997)
clockOffset(): Total request processing time (t3-t0) 186698
nodeTimeAdjustedCallback(): At 261.795361 seconds, Adjusted mesh time to 278.201896 (Offset = 88924)
clockOffset(): Total request processing time (t3-t0) 7643
nodeTimeAdjustedCallback(): At 262.13258 seconds, Adjusted mesh time to 278.330929 (Offset = -88864)
clockOffset(): Total request processing time (t3-t0) 7664
nodeTimeAdjustedCallback(): At 262.231336 seconds, Adjusted mesh time to 278.549000 (Offset = -7)
clockOffset(): Total request processing time (t3-t0) 12840
nodeTimeAdjustedCallback(): At 322.244525 seconds, Adjusted mesh time to 338.562293 (Offset = 104)
clockOffset(): Total request processing time (t3-t0) 175726
nodeTimeAdjustedCallback(): At 322.413922 seconds, Adjusted mesh time to 338.648339 (Offset = -83352)
clockOffset(): Total request processing time (t3-t0) 200817
nodeTimeAdjustedCallback(): At 322.825502 seconds, Adjusted mesh time to 339.238158 (Offset = 178239)
clockOffset(): Total request processing time (t3-t0) 199147
nodeTimeAdjustedCallback(): At 323.234898 seconds, Adjusted mesh time to 339.646547 (Offset = -1006)
clockOffset(): Total request processing time (t3-t0) 8136
nodeTimeAdjustedCallback(): At 383.245492 seconds, Adjusted mesh time to 399.561068 (Offset = -96073)
clockOffset(): Total request processing time (t3-t0) 8530
nodeTimeAdjustedCallback(): At 383.464207 seconds, Adjusted mesh time to 399.778479 (Offset = -1304)
The local node time is oscillating +/- 100ms for no valid reason and is not meeting the expectation of +/- 5ms. Ignoring for the moment the question of why so many requests take so long, it is clearly possible that delays to the response can occur in live systems. When that happens the ratio between remote processing time (t2-t1) and trip time (t3-t0)-(t2-t1) has a detrimental effect on the calculation of clock offset (See ntp algorithm theory) This is particularly true when t0 to t1 and t2 to t3 are not symmetric.
A possible solution to this problem would be to increase the discipline logic with the clockOffset() function and discard outliers with a higher than acceptable cpu/trip ratio. Unfortunately clockOffset() doesn't have a neat way of implementing a discard decision. One way might be to force the offset to zero and request a re-sync.
Possible change
inline int32_t clockOffset(uint32_t time0, uint32_t time1, uint32_t time2,
uint32_t time3) {
uint32_t offset =
((int32_t)(time1 - time0) / 2) + ((int32_t)(time2 - time3) / 2);
// Take small steps to avoid over correction
if (offset < 0.5 * TASK_SECOND && offset > 4) offset = offset / 4;
// Discard outliers if total request time is 100 times more than the remote processing time
if ((time3-time0) > (time2-time1)*100) offset = 0;
return offset;
}
In order to request a new attempt handleTimeSync() also needs to check for zero
if ((offset < TIME_SYNC_ACCURACY) && (offset > -TIME_SYNC_ACCURACY) && (offset !=0)) {
// mark complete only if offset was less than 10 ms
conn->timeSyncTask.delay(TIME_SYNC_INTERVAL);
It is not an elegant solution and better solutions probably exist. The best that can perhaps be said for it is that it does no harm as a offset calculation of zero is always a candidate for review.
It is however quite effective in making timeSync() more robust;
setLogLevel: ERROR |
--> Start: New Connection, nodeId = 3006701716
--> Start: New Connection, {
"nodeId": 366351453,
"subs": [
{
"nodeId": 3006701716
}
]
}
At 28.869156 seconds, Adjusted mesh time to 4283.529939 (Offset = -40306517)
At 29.182061 seconds, Adjusted mesh time to 4283.842841 (Offset = 0)
At 89.149542 seconds, Adjusted mesh time to 48.841491 (Offset = -1535)
At 149.157130 seconds, Adjusted mesh time to 108.847695 (Offset = -1386)
At 149.397686 seconds, Adjusted mesh time to 109.88249 (Offset = 0)
At 209.170983 seconds, Adjusted mesh time to 168.860102 (Offset = -1445)
At 269.178982 seconds, Adjusted mesh time to 228.866793 (Offset = -1309)
At 269.409921 seconds, Adjusted mesh time to 229.97731 (Offset = 0)
At 329.193346 seconds, Adjusted mesh time to 288.879509 (Offset = -1646)
At 389.201034 seconds, Adjusted mesh time to 348.885925 (Offset = -1275)
At 389.422462 seconds, Adjusted mesh time to 349.107350 (Offset = 0)
Local node time is generally kept within 2ms of remote