Remote-execution connection recovery (technical debt)
Background
The initial remote-execution implementation (!626, merged) does not handle gRPC connection failures during long-running Operation execution, nor does it try to catch up with Operation execution states after a potential reconnection.
The REAPI declares a WaitExecution() call that should help reopen an Operation stream given the Operation's name.
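To make the intended WaitExecution() semantics concrete, here is a minimal in-memory sketch. FakeExecutionService, Operation, execute(), wait_execution() and advance() are hypothetical stand-ins, not the BuildGrid or generated gRPC API; only the stage names are taken from a subset of REAPI's ExecutionStage enum. The property it illustrates is that the server's first response to WaitExecution() carries the operation's current status, so a reconnecting client catches up from where the execution is now instead of replaying it from the start.

```python
from dataclasses import dataclass
from typing import Dict, Iterator

# Subset of REAPI's ExecutionStage values; the rest of this module is a
# hypothetical in-memory stand-in, not the BuildGrid or gRPC API.
STAGES = ["QUEUED", "EXECUTING", "COMPLETED"]


@dataclass(frozen=True)
class Operation:
    name: str
    stage: str

    @property
    def done(self) -> bool:
        return self.stage == "COMPLETED"


class FakeExecutionService:
    """Tracks the current stage of every operation, keyed by its name."""

    def __init__(self) -> None:
        self._stage_index: Dict[str, int] = {}

    def execute(self, name: str) -> Iterator[Operation]:
        # Register the operation eagerly, then stream it like WaitExecution.
        self._stage_index[name] = 0
        return self.wait_execution(name)

    def advance(self, name: str) -> None:
        """Simulate the worker making progress, e.g. while no client listens."""
        last = len(STAGES) - 1
        self._stage_index[name] = min(self._stage_index[name] + 1, last)

    def wait_execution(self, name: str) -> Iterator[Operation]:
        # Start from the operation's *current* stage (an unknown name raises
        # KeyError on the first read), then follow it until it is done.
        while True:
            operation = Operation(name, STAGES[self._stage_index[name]])
            yield operation
            if operation.done:
                return
            self.advance(name)
```

In this model, a client that reads QUEUED from execute("op-1"), loses its connection while the worker moves on, and then calls wait_execution("op-1") receives EXECUTING and COMPLETED only; the QUEUED update is not replayed.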
Task description
The implementation should include:
- Handle network failures while polling on Operation status.
- Try to reconnect when such a failure happens.
- Resume Operation status polling if reconnection succeeds.
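The three steps above could fit together in a client loop along these lines. This is a sketch under stated assumptions, not the BuildGrid implementation: execute and wait_execution are hypothetical callables returning iterators of operation objects with name and done attributes, and the built-in ConnectionError stands in for the transport failure. With grpcio, the failure would more likely surface as a grpc.RpcError whose code() is grpc.StatusCode.UNAVAILABLE. The retry limit and exponential backoff values are arbitrary illustrative choices.

```python
import functools
import time
from typing import Any, Callable, Iterator, Optional

# Hypothetical signature: a callable returning an iterator of objects that
# expose .name and .done, like REAPI's google.longrunning.Operation messages.
StreamFactory = Callable[[], Iterator[Any]]


def watch_operation(
    execute: StreamFactory,
    wait_execution: Callable[[str], Iterator[Any]],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Follow an operation until done, resuming the stream after failures."""
    open_stream: StreamFactory = execute
    name: Optional[str] = None
    retries = 0
    while True:
        try:
            for operation in open_stream():
                name = operation.name  # remember which operation to resume
                retries = 0  # any update counts as a healthy connection
                if operation.done:
                    return operation
            # Assumption: a stream ending before `done` is treated as a drop.
            raise ConnectionError("stream closed before the operation completed")
        except ConnectionError:
            # Without a name there is nothing to resume; also give up once
            # the retry budget is spent.
            if name is None or retries >= max_retries:
                raise
            sleep(base_delay * 2 ** retries)  # exponential backoff
            retries += 1
            # Reopen the stream for the same operation instead of re-executing.
            open_stream = functools.partial(wait_execution, name)
```

Resetting the retry counter on every received update means only consecutive failures count against the budget, so a long-running execution that suffers occasional, isolated disconnects is not abandoned.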
Edited by Martin Blanchard