
Improve error propagation from gRPC client calls

Jürg Billeter requested to merge juerg/exceptions into master

Description

If an error occurs when a BuildGrid server calls a remote gRPC server (e.g. the execution server fetching a message from the CAS server), either an RpcError (for UNAVAILABLE and ABORTED, after retries have been exhausted) or a ConnectionError is raised. service.py has no catch clause for either of these errors, so the client of the BuildGrid server receives an INTERNAL error, sometimes without any error details at all.
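The retry behaviour described above (retry on UNAVAILABLE and ABORTED, then give up and re-raise) can be sketched as follows. This is a minimal, hypothetical illustration, not BuildGrid's actual GrpcRetrier: `call_with_retries` and its parameters are assumed names, and `StatusCode`/`FakeRpcError` are local stand-ins for `grpc.StatusCode` and `grpc.RpcError` so the sketch runs without grpcio installed.

```python
import enum
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class StatusCode(enum.Enum):
    # Stand-in for a subset of grpc.StatusCode (same names and numeric codes).
    NOT_FOUND = 5
    ABORTED = 10
    INTERNAL = 13
    UNAVAILABLE = 14


class FakeRpcError(Exception):
    """Hypothetical stand-in for grpc.RpcError exposing code() and details()."""

    def __init__(self, code: StatusCode, details: str = "") -> None:
        super().__init__(details)
        self._code = code
        self._details = details

    def code(self) -> StatusCode:
        return self._code

    def details(self) -> str:
        return self._details


# Codes treated as transient, matching the two codes named in the description.
RETRYABLE = {StatusCode.UNAVAILABLE, StatusCode.ABORTED}


def call_with_retries(fn: Callable[[], T], max_attempts: int = 3,
                      base_delay: float = 0.0) -> T:
    """Call fn, retrying transient errors; re-raise once attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except FakeRpcError as err:
            # Non-retryable codes, or the final attempt, propagate unchanged.
            if err.code() not in RETRYABLE or attempt == max_attempts:
                raise
            # Exponential backoff between attempts (zero by default here).
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Because the last error is simply re-raised, a caller in service.py that does not catch it has no way to forward the original status code, which is the gap this merge request addresses.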

However, in many cases it would be preferable to propagate the gRPC status code from the remote server to the client, so that error information is not lost from client logs and the client can handle selected status codes differently (e.g. the status code may affect its retry decision).

Changes proposed in this merge request:

  • client: Move GrpcRetrier to a separate module
  • client/retrier.py: Map gRPC errors to suitable exceptions in GrpcRetrier
  • server (service.py): Map ConnectionError to UNAVAILABLE
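The two mappings proposed above can be sketched as a pair of lookup tables: one used on the client side (the retrier) to turn a remote status code into a Python exception, and one used by the service layer to turn that exception back into a status code for its own client. This is a hypothetical illustration: the exception classes, table contents and function names are assumptions rather than BuildGrid's actual code, and `StatusCode`/`FakeRpcError` are local stand-ins for `grpc.StatusCode` and `grpc.RpcError`. Python's built-in `ConnectionError` is assumed for the UNAVAILABLE case.

```python
import enum


class StatusCode(enum.Enum):
    # Stand-in for a subset of grpc.StatusCode (same names and numeric codes).
    INVALID_ARGUMENT = 3
    NOT_FOUND = 5
    PERMISSION_DENIED = 7
    INTERNAL = 13
    UNAVAILABLE = 14


class FakeRpcError(Exception):
    """Hypothetical stand-in for grpc.RpcError exposing code() and details()."""

    def __init__(self, code: StatusCode, details: str = "") -> None:
        super().__init__(details)
        self._code = code
        self._details = details

    def code(self) -> StatusCode:
        return self._code

    def details(self) -> str:
        return self._details


# Hypothetical exception types; the real exception module may differ.
class NotFoundError(Exception): ...
class InvalidArgumentError(Exception): ...
class PermissionDeniedError(Exception): ...


# Client side (retrier): remote status code -> exception type.
_CODE_TO_EXCEPTION = {
    StatusCode.NOT_FOUND: NotFoundError,
    StatusCode.INVALID_ARGUMENT: InvalidArgumentError,
    StatusCode.PERMISSION_DENIED: PermissionDeniedError,
    StatusCode.UNAVAILABLE: ConnectionError,
}


def map_rpc_error(err: FakeRpcError) -> Exception:
    """Convert an RPC error into a mapped exception, keeping its details."""
    exc_type = _CODE_TO_EXCEPTION.get(err.code())
    if exc_type is None:
        return err  # unmapped codes are left as they are
    return exc_type(err.details())


# Server side (service.py): exception type -> status code for our own client.
_EXCEPTION_TO_CODE = [
    (NotFoundError, StatusCode.NOT_FOUND),
    (InvalidArgumentError, StatusCode.INVALID_ARGUMENT),
    (PermissionDeniedError, StatusCode.PERMISSION_DENIED),
    (ConnectionError, StatusCode.UNAVAILABLE),
]


def status_for_exception(exc: BaseException) -> StatusCode:
    """Pick the status code to report; anything unmapped stays INTERNAL."""
    for exc_type, code in _EXCEPTION_TO_CODE:
        if isinstance(exc, exc_type):
            return code
    return StatusCode.INTERNAL
```

In a real servicer, the code returned by `status_for_exception()` would be passed to `context.set_code()` together with `context.set_details(str(exc))`, or to `context.abort()`. A NOT_FOUND from the CAS server then reaches the BuildGrid client as NOT_FOUND instead of INTERNAL.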

Validation

A few exception checks in the test suite have been updated and pass as expected. No integration tests have been added for this change.

Edited by Jürg Billeter
