Remote-execution connection recovering (technical debt)
Background
The initial remote-execution implementation (!626 (merged)) does not handle gRPC connection failures during long-running Operation execution, nor does it try to catch up with Operation execution states after a potential reconnection.
The REAPI declares a WaitExecution() call that should help reopen an Operation stream given an Operation name.
Task description
Implementation should include:
- Handle network failures while polling on Operation status.
- Try to reconnect when such a failure happens.
- Resume Operation status polling if the reconnection succeeds.
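
The three points above could be sketched roughly as follows. This is only an illustration and not the code from !626: `follow_operation`, `StreamInterrupted`, the `Operation` dataclass and the `execute` / `wait_execution` callables are all hypothetical names. In a real client the two callables would wrap the REAPI Execute() and WaitExecution() stubs, and the caught exception would likely be grpc.RpcError. Calling Execute() again when no Operation name has been received yet is also an assumption, as are the retry count and backoff values.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


class StreamInterrupted(Exception):
    """Hypothetical stand-in for a transport error such as grpc.RpcError."""


@dataclass
class Operation:
    """Minimal stand-in for the long-running Operation message."""
    name: str
    done: bool = False


def follow_operation(
    execute: Callable[[], Iterable[Operation]],
    wait_execution: Callable[[str], Iterable[Operation]],
    max_retries: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Operation:
    """Follow an Operation stream to completion, reconnecting on failure."""
    last: Optional[Operation] = None
    failures = 0
    while True:
        try:
            # Without a known Operation name we can only start over with
            # Execute(); once we have one, WaitExecution() can resume it.
            stream = execute() if last is None else wait_execution(last.name)
            for operation in stream:
                last = operation
                failures = 0  # progress was made, reset the retry budget
                if operation.done:
                    return operation
            # The stream closed cleanly before the Operation completed.
            raise StreamInterrupted("stream ended before completion")
        except StreamInterrupted:
            failures += 1
            if failures > max_retries:
                raise
            sleep(delay * 2 ** (failures - 1))  # exponential backoff
```

Resetting the failure counter whenever an update arrives means the retry budget applies to consecutive failures only, so a long-running Operation is not abandoned because of a few well-spaced network drops.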
Edited by Martin Blanchard