Map step-runner errors to job failure reasons
Summary
When the step-runner panics and recovers via its gRPC interceptor, the resulting codes.Internal error was not being mapped to a specific failure reason. Additionally, the existing ClientStatusError handling used State instead of ErrorKind for classification, which didn't properly distinguish between different error types.
These issues caused failures to be misclassified, making it harder to distinguish runner infrastructure issues from user script errors in job metrics and debugging.
Changes
Error classification updates in common/build.go:
Maps gRPC errors from the step-runner to appropriate job failure reasons:
codes.Internal(gRPC handler panics) →ScriptFailureErrorInternal(step-runner internal errors) →ScriptFailureErrorStepFailure(standard step failures) →ScriptFailureErrorUnknown(unspecified errors) →UnknownFailureErrorCancelled→ leaves failure reason unset (handled elsewhere)
Note: Both
codes.InternalandErrorInternalare intentionally classified asScriptFailurerather thanRunnerSystemFailure. A malicious job could deliberately trigger either path to forge aRunnerSystemFailureand evade job-failure accounting.
New error type in steps/execute.go:
Introduces ClientInternalError to distinguish gRPC-level codes.Internal errors (e.g., panic recovery via step-runner's interceptor) from ClientStatusError which reports a Status returned by step-runner.
Resource cleanup fix:
Adds defer stdout.Close() to properly close the build logger stream in executeStepStage.
Step-runner upgrade (v0.32.x → v0.34.0)
This MR upgrades step-runner to v0.34.0, which includes the following changes:
v0.34.0
- Add structured error reporting to job status via
StatusErrorwithErrorKindenum, enabling clients to distinguish infrastructure failures from step failures without parsing error strings (!469 (closed))
v0.33.0
- Fix cloning steps when branch ref contains a forward slash (!481 (merged))
- Catch panics in step function execution and return them as errors (!438 (merged))
- Add Duo review rule to ensure MRs contain changelog entries (!448 (closed))
- Recover panics in gRPC handlers and return
codes.Internalto the caller instead of crashing the server goroutine (!457 (merged)) - Implement Graceful Process Termination for Exec Function (!463 (merged))
- Properly handle really long lines from subprocess output (!471 (closed))
- Isolate builtin functions from step results by introducing a
BuiltinContextinterface (!475 (closed)) - Add
docker/authbuiltin function for configuring Docker registry authentication (!484 (merged))
Screenshot
How errors appear when this MR is paired with Add panic recovery and structured error reporti... (step-runner!469 - merged).
| Before | After |
|---|---|
![]() |
![]() |
Reference
Relates to Recover panics from step-runner gRPC methods us... (step-runner#430 - closed)

