Discovery - Resilient and crash-proof operation

Description

As per #358 (closed)

Recovery and resumption of flight recording and processing (video included, but applying to everything), whether or not the flight, Whitebox, and/or the computer are stopped cleanly.

A connection to an external device may be interrupted before its files are downloaded, or the Whitebox itself may shut down before it finishes post-processing those files.

In the case of Insta360, these operations are currently done through the post_flight_operations task, which runs only once and is neither failure-resistant nor failure-aware.


We need to ensure that Whitebox can always recover and resume operations after a crash:

  • data integrity verification
  • crash-awareness (e.g. keep track of when the system crashed - update a file with the current date every 50 ms?)
  • notifying plugins about the crash so they can recover/resume with their own mechanisms if and where needed
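
The crash-awareness idea above could be prototyped as a heartbeat file. This is only a sketch: the function names and the file location are hypothetical, not existing Whitebox code, and the 50 ms interval is taken from the suggestion in the list. The heartbeat is written atomically so a crash mid-write cannot leave a half-written timestamp, and a clean shutdown removes the file so that its presence at startup indicates a crash.

```python
import os
from pathlib import Path
from typing import Optional

HEARTBEAT_INTERVAL_S = 0.05  # assumption: the 50 ms floated in the list above


def write_heartbeat(path: Path, now: float) -> None:
    """Atomically record `now` as the last moment the system was known alive."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(repr(now))
    os.replace(tmp, path)  # rename over the old file in one step


def mark_clean_shutdown(path: Path) -> None:
    """Remove the heartbeat so the next start does not report a crash."""
    path.unlink(missing_ok=True)


def detect_crash(path: Path) -> Optional[float]:
    """Return the last heartbeat if the previous run ended uncleanly, else None."""
    try:
        return float(path.read_text())
    except FileNotFoundError:
        return None  # clean shutdown (or first ever start)
    except ValueError:
        return 0.0  # unreadable marker: unclean shutdown, time unknown
```

A background loop would call `write_heartbeat(path, time.time())` every `HEARTBEAT_INTERVAL_S` seconds, and startup code would call `detect_crash(path)` before the loop begins. Whether writing this often is acceptable for the storage medium is an open question for this discovery.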

It should also try to determine the reason for the crash where possible (e.g. by checking kernel logs for a panic or similar) and let the user know.
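
As a rough illustration of the log check, the sketch below scans kernel log lines for a few common crash markers. The function name and the pattern list are assumptions for illustration, and how the lines are obtained is left open (for example the previous boot's kernel messages from `journalctl -k -b -1`, which requires a persistent journal). Logs may not survive a hard crash at all, so this can only ever be a best-effort hint.

```python
import re
from typing import Iterable, List

# Illustrative markers only; the real list would need to be catalogued.
CRASH_PATTERNS = re.compile(
    r"kernel panic|\boops\b|out of memory", re.IGNORECASE
)


def find_crash_hints(lines: Iterable[str]) -> List[str]:
    """Return the log lines that look related to a crash, in original order."""
    return [line.strip() for line in lines if CRASH_PATTERNS.search(line)]
```

The returned lines could be shown to the user as-is, or mapped to friendlier messages once the set of relevant crash causes is known.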

Additionally, we need to ensure that Whitebox accurately and proactively keeps track of which external files are missing and which operations are pending, and automatically resumes them when the device is reconnected or Whitebox is turned on, respectively. It should also notify the user of anything they might need to do, e.g. reconnect the camera that was used to record when its file is missing or broken. This should apply to Insta360 and to other devices in the future.
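
One possible shape for this tracking is a small on-disk journal of pending operations that survives restarts. The sketch below is an assumption about how it might look, not the current implementation: the class, the operation ids, and the `device`/`action` fields are all hypothetical. Operations that need an external device are keyed by that device, and operations that only need Whitebox itself use `None`.

```python
import json
import os
from pathlib import Path
from typing import Dict, List, Optional


class PendingOperations:
    """Persistent record of work that still has to be resumed."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.ops: Dict[str, dict] = {}
        if path.exists():
            # Reload whatever was still pending before the last stop or crash.
            self.ops = json.loads(path.read_text())

    def _save(self) -> None:
        # Write to a temp file and rename, so a crash never leaves partial JSON.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(self.ops))
        os.replace(tmp, self.path)

    def add(self, op_id: str, device: Optional[str], action: str) -> None:
        self.ops[op_id] = {"device": device, "action": action}
        self._save()

    def complete(self, op_id: str) -> None:
        self.ops.pop(op_id, None)
        self._save()

    def pending_for(self, device: Optional[str]) -> List[str]:
        """Ids of operations waiting on `device` (None = Whitebox only)."""
        return sorted(
            op_id for op_id, op in self.ops.items() if op["device"] == device
        )
```

On device reconnect, a handler could call `pending_for("insta360")` and resume each returned operation; on startup, `pending_for(None)` would list the post-processing steps to pick up again. The same list could drive user notifications about what needs to be reconnected.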

Scope

  • Catalogue all cases that would need to be covered by this mechanism
    • Backfilling missing videos from the camera on connect?
    • Make sure video recording is in sync with flight session status (in flight -> recording, not in flight -> not recording)
  • WIP
  • Create tasks to implement the WIP step-by-step
Edited by Milos