Possible Scheduling Race Condition
Created by: darksidelemm
Refer to log file here: https://slexy.org/view/s2ichAFQle
Problem:
- Scheduled observations are sometimes missed and never run.
Sequence of Events:
- Task is scheduled via get_jobs.
- get_jobs runs again.
- Due to amazing Australian internet, it takes a good 30 seconds to get the jobs.
- By a feat of amazing timing and terrible luck, the observation which is about to run gets deleted during this refresh (see: https://github.com/satnogs/satnogs-client/blob/master/satnogsclient/scheduler/tasks.py#L143)
- In the time taken to process the new jobs, the grace time for the observation which was about to run has passed, and it doesn't get run (and a 'missed job' error is shown in the log).
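The timing above can be sketched roughly as follows. This is only an illustration of the suspected race, not code from satnogs-client: the helper is_missed, the 5 second grace time, the 10 second lead and the date are all made-up assumptions. Only the 30 second fetch time comes from the observation above.

```python
from datetime import datetime, timedelta


def is_missed(due, readded_at, grace):
    """Hypothetical helper: True if the job came back after its grace time ended."""
    return readded_at > due + grace


# All values below are illustrative assumptions, except the ~30 s fetch time.
due = datetime(2017, 1, 1, 12, 0, 0)        # arbitrary observation start time
grace = timedelta(seconds=5)                # assumed grace time
removed_at = due - timedelta(seconds=10)    # job deleted shortly before it is due
refresh_duration = timedelta(seconds=30)    # slow get_jobs refresh
readded_at = removed_at + refresh_duration  # job only exists again at this point

print(is_missed(due, readded_at, grace))
```

With these numbers the job is re-added 20 seconds after it was due, well past the assumed grace time, so it is reported as missed.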
Proposed Solution:
- Don't delete spawn_observer jobs that are due to run within the next X minutes (X = 1?). Job deletions would then only apply to jobs more than <get_jobs rate> + X minutes in the future.
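A minimal sketch of that filter, assuming each scheduled job exposes a name and a next run time. The Job class, the job_id field, jobs_safe_to_delete and protect_window are hypothetical names for illustration and are not taken from satnogs-client; the real scheduler objects may look different.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Job:
    """Hypothetical stand-in for a scheduled job."""
    job_id: str
    name: str
    next_run_time: datetime


def jobs_safe_to_delete(jobs, now, protect_window=timedelta(minutes=1)):
    """Return spawn_observer jobs that start later than now + protect_window.

    Jobs due inside the window are kept so a slow refresh cannot remove an
    observation that is about to start. Jobs with other names are not touched.
    """
    cutoff = now + protect_window
    return [
        job for job in jobs
        if job.name == "spawn_observer" and job.next_run_time > cutoff
    ]


now = datetime(2017, 1, 1, 12, 0, 0)  # arbitrary example time
jobs = [
    Job("soon", "spawn_observer", now + timedelta(seconds=30)),
    Job("later", "spawn_observer", now + timedelta(minutes=10)),
    Job("refresh", "get_jobs", now + timedelta(minutes=5)),
]
deletable = [job.job_id for job in jobs_safe_to_delete(jobs, now)]
print(deletable)
```

Here only the observation 10 minutes out would be deleted and re-created; the one starting in 30 seconds is left alone. Whether X = 1 minute is enough depends on how long a refresh can take, so the window may need to be larger on slow links.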