
Fix Runner heartbeat that can result in a Runner being considered offline

Kamil Trzciński requested to merge fix-runner-hearbeat into master

What does this MR do?

This introduces two changes:

Update ONLINE_CONTACT_TIMEOUT of Runner

This increases the timeout within which a Runner is considered online. Two factors affect how often we update the DB entry:

  • Runner requests being terminated by Workhorse, and thus waiting on a queue notification

  • The rate at which the DB column is updated

The timeout used to consider a Runner online (according to the DB) needs to be aligned with these two timeouts; otherwise a Runner can be wrongly assumed to be offline.
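
The alignment requirement can be illustrated with a small sketch. The function names and numeric values below are hypothetical, chosen only for illustration, and are not GitLab's actual settings. The assumption is that a request can be held for up to `long_poll_seconds` and that the contact timestamp is persisted at most once every `update_interval_seconds`, so the stored timestamp of a healthy Runner can lag by roughly their sum.

```python
from datetime import datetime, timedelta


def max_contact_staleness(long_poll_seconds: int, update_interval_seconds: int) -> int:
    """Approximate worst-case age of the stored contact time for a healthy Runner."""
    return long_poll_seconds + update_interval_seconds


def is_online(contacted_at: datetime, now: datetime, online_timeout_seconds: int) -> bool:
    """A Runner counts as online if its last stored contact is recent enough."""
    return now - contacted_at < timedelta(seconds=online_timeout_seconds)


# Hypothetical values, not taken from GitLab's configuration.
LONG_POLL = 50
UPDATE_INTERVAL = 60
staleness = max_contact_staleness(LONG_POLL, UPDATE_INTERVAL)

now = datetime(2020, 1, 1, 12, 0, 0)
# A healthy Runner whose stored timestamp is almost maximally stale.
contacted_at = now - timedelta(seconds=staleness - 1)

# A timeout shorter than the worst-case staleness misreports the Runner.
too_short = is_online(contacted_at, now, online_timeout_seconds=60)
# A timeout above the worst-case staleness reports it correctly.
aligned = is_online(contacted_at, now, online_timeout_seconds=staleness + 10)
```

With these made-up numbers, the short timeout marks a working Runner as offline, while the aligned one does not.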

Any Runner-originated request heartbeats the Runner

Until now, a Runner was heartbeated only when it called jobs/request. However, during a long-running job the Runner might be considered offline when it is in fact processing data.

We should heartbeat the Runner on every communication:

  • requesting jobs
  • updating status / trace / artifacts: this is being introduced here
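
A minimal sketch of this idea follows, assuming a single entry point that every Runner request passes through. The `Runner` class, `handle_request`, the `UPDATE_INTERVAL` value, and the `"trace"` endpoint label are hypothetical and do not mirror GitLab's actual code. The heartbeat is rate-limited so that frequent trace updates do not each cause a DB write.

```python
from dataclasses import dataclass
from typing import Optional

UPDATE_INTERVAL = 60  # hypothetical minimum number of seconds between DB writes


@dataclass
class Runner:
    contacted_at: Optional[float] = None  # last persisted contact time
    db_writes: int = 0                    # counts simulated DB updates

    def heartbeat(self, now: float) -> None:
        """Persist the contact time, but at most once per UPDATE_INTERVAL."""
        if self.contacted_at is None or now - self.contacted_at >= UPDATE_INTERVAL:
            self.contacted_at = now
            self.db_writes += 1


def handle_request(runner: Runner, endpoint: str, now: float) -> str:
    """Every Runner-originated request heartbeats the Runner first."""
    runner.heartbeat(now)
    return f"handled {endpoint}"


runner = Runner()
handle_request(runner, "jobs/request", now=0)

# A long-running job: the Runner sends only trace updates for ten minutes.
for t in range(30, 601, 30):
    handle_request(runner, "trace", now=t)
```

In this sketch the stored contact time keeps advancing during the job even though jobs/request is never called again, and the number of writes stays bounded by the update interval.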

Does this MR meet the acceptance criteria?

Conformity

Links

