Skip to content

Update trace force send interval to be dynamically adjusted based on update interval

Arran Walker requested to merge 325970-increased-default-force-update into main

What does this MR do?

Updates the trace force send interval to be dynamically set: update interval * 4, with a min/max bounding.

Why was this MR needed?

The regular update interval is controlled by GitLab, depending on whether anybody is watching a live trace or not. However, even when this value is increased, Runner will still perform a forced update every 30s.

This updates the force send value to be calculated dynamically based on the update interval. With the minimum value being 5 minutes, and maximum being 30 minutes.

This should result in fewer requests and also appears to fix a subtle bug, where behaviour of uploading log traces is strange if the force update interval is lower than the regular update interval (which it is most of the time).

What's the best way to test this MR?

There's not much to really test, but we can observe the new behaviour (note: this was originally done when the force send update was a fixed 5 minutes, but a 60s update interval will now be 4 minutes).

Using a job that prints date every 10 minutes:

sleep:
  script:
    - while true; do echo $(date); sleep 600; done
  • Live watching the job trace:

    The regular update interval is 3 seconds (as I'm watching the log), but because no data is being added to the trace and the job isn't changing status the forced send is performed every 5 minutes.

    {"code":202,"job":1458765282,"job-log":"0-1792","job-status":"running","level":"debug","msg":"Appending trace to coordinator... ok","runner":"tx7Rk1qD","sent-log":"1106-1791","status":"202 Accepted","time":"2021-07-28T13:00:00+01:00","update-interval":3000000000}
    {"code":200,"job":1458765282,"job-status":"","level":"debug","msg":"Submitting job to coordinator... ok","runner":"tx7Rk1qD","time":"2021-07-28T13:05:01+01:00","update-interval":0}
    {"code":202,"job":1458765282,"job-log":"0-1821","job-status":"running","level":"debug","msg":"Appending trace to coordinator... ok","runner":"tx7Rk1qD","sent-log":"1792-1820","status":"202 Accepted","time":"2021-07-28T13:09:58+01:00","update-interval":3000000000}
    {"code":200,"job":1458765282,"job-status":"","level":"debug","msg":"Submitting job to coordinator... ok","runner":"tx7Rk1qD","time":"2021-07-28T13:14:59+01:00","update-interval":0}
    {"code":202,"job":1458765282,"job-log":"0-1850","job-status":"running","level":"debug","msg":"Appending trace to coordinator... ok","runner":"tx7Rk1qD","sent-log":"1821-1849","status":"202 Accepted","time":"2021-07-28T13:19:59+01:00","update-interval":3000000000}
  • Not live watching job trace:

    When not watching the live job trace an update interval of 60s is used. Again, the regular update occurs when data is available (date printed every 10 minutes), but otherwise the force send update is used.

    {"code":202,"job":1458847791,"job-log":"0-1792","job-status":"running","level":"debug","msg":"Appending trace to coordinator... ok","runner":"tx7Rk1qD","sent-log":"967-1791","status":"202 Accepted","time":"2021-07-28T13:24:37+01:00","update-interval":60000000000}
    {"code":200,"job":1458847791,"job-status":"","level":"debug","msg":"Submitting job to coordinator... ok","runner":"tx7Rk1qD","time":"2021-07-28T13:29:37+01:00","update-interval":0}
    {"code":202,"job":1458847791,"job-log":"0-1821","job-status":"running","level":"debug","msg":"Appending trace to coordinator... ok","runner":"tx7Rk1qD","sent-log":"1792-1820","status":"202 Accepted","time":"2021-07-28T13:33:38+01:00","update-interval":60000000000}
    {"code":200,"job":1458847791,"job-status":"","level":"debug","msg":"Submitting job to coordinator... ok","runner":"tx7Rk1qD","time":"2021-07-28T13:38:38+01:00","update-interval":0}
    {"code":202,"job":1458847791,"job-log":"0-1850","job-status":"running","level":"debug","msg":"Appending trace to coordinator... ok","runner":"tx7Rk1qD","sent-log":"1821-1849","status":"202 Accepted","time":"2021-07-28T13:43:38+01:00","update-interval":60000000000}
  • Not live watching job trace, older implementation:

    Without this MRs change, force updates occur more often because they're every 30s, whereas the regular update interval is set to 60s.

    This behaviour might also be revealing a subtle bug. Here we see that force updates are occuring every 60s, even though they're supposed to occur every 30s. That's because the force send check occurs at the regular update interval. Due to this check, it looks like sending an actual trace log might be done to complete luck. Here it's sent after 9 minutes, when it was supposed to only be delayed by 30s.

    The behaviour will work as expected as long as we ensure that the force update interval is longer than the regular update interval.

    {"code":202,"job":1458937436,"job-log":"0-1792","job-status":"running","level":"debug","msg":"Appending trace to coordinator... ok","runner":"tx7Rk1qD","sent-log":"1106-1791","status":"202 Accepted","time":"2021-07-28T13:52:09+01:00","update-interval":60000000000}
    {"code":200,"job":1458937436,"job-status":"","level":"debug","msg":"Submitting job to coordinator... ok","runner":"tx7Rk1qD","time":"2021-07-28T13:53:10+01:00","update-interval":0}
    {"code":200,"job":1458937436,"job-status":"","level":"debug","msg":"Submitting job to coordinator... ok","runner":"tx7Rk1qD","time":"2021-07-28T13:54:10+01:00","update-interval":0}
    {"code":200,"job":1458937436,"job-status":"","level":"debug","msg":"Submitting job to coordinator... ok","runner":"tx7Rk1qD","time":"2021-07-28T13:55:10+01:00","update-interval":0}
    {"code":200,"job":1458937436,"job-status":"","level":"debug","msg":"Submitting job to coordinator... ok","runner":"tx7Rk1qD","time":"2021-07-28T13:56:10+01:00","update-interval":0}
    {"code":200,"job":1458937436,"job-status":"","level":"debug","msg":"Submitting job to coordinator... ok","runner":"tx7Rk1qD","time":"2021-07-28T13:57:11+01:00","update-interval":0}
    {"code":200,"job":1458937436,"job-status":"","level":"debug","msg":"Submitting job to coordinator... ok","runner":"tx7Rk1qD","time":"2021-07-28T13:58:11+01:00","update-interval":0}
    {"code":200,"job":1458937436,"job-status":"","level":"debug","msg":"Submitting job to coordinator... ok","runner":"tx7Rk1qD","time":"2021-07-28T13:59:11+01:00","update-interval":0}
    {"code":200,"job":1458937436,"job-status":"","level":"debug","msg":"Submitting job to coordinator... ok","runner":"tx7Rk1qD","time":"2021-07-28T14:00:12+01:00","update-interval":0}
    {"code":202,"job":1458937436,"job-log":"0-1821","job-status":"running","level":"debug","msg":"Appending trace to coordinator... ok","runner":"tx7Rk1qD","sent-log":"1792-1820","status":"202 Accepted","time":"2021-07-28T14:01:12+01:00","update-interval":60000000000}

What are the relevant issue numbers?

#27722 (closed)

Edited by Arran Walker

Merge request reports