Rescue Gitaly errors across all API endpoints
What does this MR do and why?
Adds global error handling for Gitaly unavailability so that every API endpoint returns 503 Service Unavailable instead of 500 Internal Server Error or 502 Bad Gateway. Issue: #570220
Problem
When Gitaly or Praefect services are unavailable or unreachable, API endpoints return:
- 500 Internal Server Error - Incorrectly categorizes infrastructure issues as application bugs
- 502 Bad Gateway - Generic error without context
This happens because Gitaly connection failures can manifest as:
- Direct gRPC errors (GRPC::Unavailable, GRPC::DeadlineExceeded) - When the gRPC client can't connect or times out
- Wrapped Git errors (Gitlab::Git::CommandError with GRPC::Unavailable as cause) - When Git operations fail due to Gitaly being down
Solution
Added two global rescue handlers in lib/api/api.rb:
- rescue_from GRPC::Unavailable, GRPC::DeadlineExceeded - Catches direct gRPC errors (connection failures and timeouts)
- rescue_from Gitlab::Git::CommandError - Catches wrapped Gitaly errors (only when caused by GRPC::Unavailable or GRPC::DeadlineExceeded)
Both handlers now return:
- HTTP 503 Service Unavailable status
- Descriptive error message: "Gitaly service temporarily unavailable"
- Exception tracked in error monitoring
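As a rough illustration of the behaviour listed above, the following plain-Ruby sketch maps an exception to a status code and message. It is not the code in lib/api/api.rb: the GRPC and Gitlab::Git classes are minimal stand-ins defined here so the example runs without the grpc gem or GitLab, `response_for` and `GITALY_DOWN_ERRORS` are hypothetical names, and error tracking is left out.

```ruby
# Stand-ins for the real classes, defined only so this sketch runs on its own.
module GRPC
  class Unavailable < StandardError; end
  class DeadlineExceeded < StandardError; end
end

module Gitlab
  module Git
    class CommandError < StandardError; end
  end
end

# Errors that this sketch treats as "Gitaly is down or not answering in time".
GITALY_DOWN_ERRORS = [GRPC::Unavailable, GRPC::DeadlineExceeded].freeze

# Hypothetical helper: returns [status, body] for an exception raised
# while serving an API request.
def response_for(error)
  # Direct path: the gRPC error reached the API layer unwrapped.
  direct = GITALY_DOWN_ERRORS.any? { |klass| error.is_a?(klass) }

  # Wrapped path: a CommandError whose cause is one of the gRPC errors.
  wrapped = error.is_a?(Gitlab::Git::CommandError) &&
            GITALY_DOWN_ERRORS.any? { |klass| error.cause.is_a?(klass) }

  if direct || wrapped
    [503, { message: 'Gitaly service temporarily unavailable' }]
  else
    [500, { message: '500 Internal Server Error' }]
  end
end
```

A CommandError that was never raised from inside a `rescue` block has a nil cause, so it falls through to the 500 branch in this sketch.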
Benefits
- Consistent error responses - Same 503 response regardless of error path
- Global coverage - Applies to ALL API endpoints, not just pipelines
- Proper HTTP semantics - 503 indicates temporary service unavailability
- Better debugging - Clear error message identifies the root cause
- Reduced noise - Infrastructure issues no longer trigger application error alerts
- Complete coverage - Handles both connection failures AND timeouts
Example
Before:
$ curl -i /api/v4/projects/123/pipelines/456/jobs
HTTP/1.1 500 Internal Server Error
{"message":"500 Internal Server Error"}
After:
$ curl -i /api/v4/projects/123/pipelines/456/jobs
HTTP/1.1 503 Service Unavailable
{"message":"Gitaly service temporarily unavailable"}
How to test
- Stop Praefect/Gitaly: gdk stop praefect
- Make any API request that requires Git data:
  curl -H "PRIVATE-TOKEN: " http://localhost:3000/api/v4/projects//repository/commits
- Verify the response is 503 with the message "Gitaly service temporarily unavailable"
- Restart Praefect: gdk start praefect
- Verify same request now returns 200 OK
Technical Details
Why two handlers?
Gitaly errors can reach the API layer via two different paths:
- Direct path: Gitaly Client → GRPC::Unavailable → API
- Wrapped path: Git Operation → GRPC::Unavailable → WrapsGitalyErrors → Gitlab::Git::CommandError → API
The second handler inspects e.cause (for example, e.cause.is_a?(GRPC::Unavailable)) so that 503 is returned only when the CommandError was caused by GRPC::Unavailable or GRPC::DeadlineExceeded, not for other CommandError types (invalid refs, permissions, etc.).
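The wrapped path depends on a standard Ruby behaviour: when an exception is raised inside a `rescue` block, Ruby sets the new exception's `cause` to the one being rescued. The sketch below shows that mechanism in isolation. The classes are stand-ins so it runs without the grpc gem or GitLab, and `wrap_gitaly_errors`, `caused_by_gitaly_outage?` and `capture` are hypothetical helpers, not the real WrapsGitalyErrors implementation.

```ruby
# Stand-ins for the real classes, defined only so this sketch runs on its own.
module GRPC
  class Unavailable < StandardError; end
  class DeadlineExceeded < StandardError; end
  class InvalidArgument < StandardError; end
end

module Gitlab
  module Git
    class CommandError < StandardError; end
  end
end

# Hypothetical wrapper: re-raises the gRPC stand-in errors as CommandError.
def wrap_gitaly_errors
  yield
rescue GRPC::Unavailable, GRPC::DeadlineExceeded, GRPC::InvalidArgument => e
  # Raising inside `rescue` makes Ruby record `e` as the new exception's #cause.
  raise Gitlab::Git::CommandError, e.message
end

# True only for a CommandError whose cause is a connection failure or timeout.
def caused_by_gitaly_outage?(error)
  error.is_a?(Gitlab::Git::CommandError) &&
    (error.cause.is_a?(GRPC::Unavailable) || error.cause.is_a?(GRPC::DeadlineExceeded))
end

# Small helper that returns the raised exception instead of propagating it.
def capture
  yield
  nil
rescue StandardError => e
  e
end

down = capture { wrap_gitaly_errors { raise GRPC::Unavailable, 'connection refused' } }
bad  = capture { wrap_gitaly_errors { raise GRPC::InvalidArgument, 'invalid ref' } }

caused_by_gitaly_outage?(down) # true: the cause is GRPC::Unavailable
caused_by_gitaly_outage?(bad)  # false: same wrapper class, different cause
```

Both results are CommandError instances, so only the cause distinguishes an outage from an ordinary Git failure.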
MR acceptance checklist
- Code changes
  - Added global error handlers in lib/api/api.rb
  - Handlers return 503 for Gitaly unavailability and timeouts
  - Both GRPC::Unavailable and GRPC::DeadlineExceeded handled
  - Errors are tracked in error monitoring
- Tests
  - Added test for GRPC::Unavailable
  - Added test for GRPC::DeadlineExceeded
  - All 86 tests in spec/requests/api/api_spec.rb passing
  - No regressions
- Documentation (if needed)
  - N/A - Internal error handling change, no user-facing docs needed
- Manual testing
  - Tested with Praefect down → Returns 503
  - Tested with Praefect up → Returns 200

This MR changes the global API error handling to properly categorize Gitaly service unavailability as temporary infrastructure issues (503) rather than application errors (500).



