Fix X-Gitaly-Correlation-Id propagation to Gitaly gRPC calls
Context
Contributes to Eliminate Git-Related Infrastructure Failures i... (gitlab-org/quality/analytics/team#129 - closed).
Follow-up of Add the gitaly_correlation_id to most log entries (!203648 - merged) and Update Workhorse to log X-Gitaly-Correlation-Id... (!203087 - merged).
Runner sets the X-Gitaly-Correlation-Id header on Git requests so that job operations can be traced from Runner through to Gitaly. However, Workhorse only logged these correlation IDs and did not pass them on to its Gitaly gRPC calls, which broke end-to-end traceability of Git operations.
As shown in the Git traceability diagram, Runner configures Git to send correlation IDs that are meant to flow through the entire stack for debugging.
What's in this MR?
This MR enables end-to-end Git operation tracing by passing X-Gitaly-Correlation-Id from HTTP requests to Gitaly gRPC calls:
- Extract correlation ID in Git handlers: modified `handleGetInfoRefs` in `info-refs.go` to extract the `X-Gitaly-Correlation-Id` header and store it in the request context.
- Manual gRPC metadata injection: modified `withOutgoingMetadata` in `gitaly.go` to extract the correlation ID from the context and add it directly to the gRPC metadata sent to Gitaly.
This approach bypasses the existing labkit correlation interceptor (which was generating its own correlation IDs) and explicitly controls which correlation ID is sent to Gitaly.
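The precedence rule can be sketched as "use the Runner-provided ID when present". The MR text does not say what is sent when the header is absent, so the fallback to a freshly generated ID below is an assumption, and `chooseCorrelationID` and `newGeneratedID` are hypothetical names; labkit's real generator may use a different ID format.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newGeneratedID is a HYPOTHETICAL stand-in for an interceptor-generated ID:
// 8 random bytes, hex-encoded to 16 characters.
func newGeneratedID() string {
	b := make([]byte, 8)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// chooseCorrelationID prefers the Runner-provided ID and only falls back to a
// generated one when the request carried no X-Gitaly-Correlation-Id header.
func chooseCorrelationID(fromRunner string) string {
	if fromRunner != "" {
		return fromRunner
	}
	return newGeneratedID()
}

func main() {
	fmt.Println(chooseCorrelationID("runner-provided-id")) // runner-provided-id
	fmt.Println(len(chooseCorrelationID("")))              // 16
}
```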
Technical Details
Before this change:
- Workhorse logged: `"gitaly_correlation_id":"runner-provided-id"`
- Gitaly logged: `"correlation_id":"different-generated-id"`
After this change:
- Workhorse logged: `"gitaly_correlation_id":"runner-provided-id"`
- Gitaly logged: `"correlation_id":"runner-provided-id"` (same ID)
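The before/after comparison can be checked mechanically by comparing the two fields shown above. This sketch assumes both services emit one JSON object per log line; real lines carry many more fields, which `json.Unmarshal` simply ignores. `sameCorrelationID` is a hypothetical helper, not part of this MR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sameCorrelationID reports whether a Workhorse log line and a Gitaly log line
// carry the same (non-empty) correlation ID, using the field names above.
func sameCorrelationID(workhorseLine, gitalyLine string) (bool, error) {
	var wh struct {
		GitalyCorrelationID string `json:"gitaly_correlation_id"`
	}
	var gl struct {
		CorrelationID string `json:"correlation_id"`
	}
	if err := json.Unmarshal([]byte(workhorseLine), &wh); err != nil {
		return false, fmt.Errorf("workhorse line: %w", err)
	}
	if err := json.Unmarshal([]byte(gitalyLine), &gl); err != nil {
		return false, fmt.Errorf("gitaly line: %w", err)
	}
	return wh.GitalyCorrelationID != "" && wh.GitalyCorrelationID == gl.CorrelationID, nil
}

func main() {
	wh := `{"gitaly_correlation_id":"runner-provided-id"}`
	before := `{"correlation_id":"different-generated-id"}`
	after := `{"correlation_id":"runner-provided-id"}`

	ok, _ := sameCorrelationID(wh, before)
	fmt.Println("before:", ok) // before: false
	ok, _ = sameCorrelationID(wh, after)
	fmt.Println("after:", ok) // after: true
}
```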
Steps to reproduce locally
- Start GDK with a public project.
- Test correlation ID propagation:

  ```shell
  # Assign on its own line so the later commands expand the new value
  CORRELATION_ID="test-$(date +%s)"

  curl -H "X-Gitaly-Correlation-Id: $CORRELATION_ID" \
    "http://localhost:3000/your-project.git/info/refs?service=git-upload-pack" > /dev/null

  # Check Workhorse logs
  timeout 3 gdk tail gitlab-workhorse | grep "$CORRELATION_ID"

  # Check Gitaly logs
  timeout 3 gdk tail gitaly | grep "$CORRELATION_ID"
  ```

- Both logs should show the same correlation ID.
Below is an example from my local machine (a screenshot, so the correlation ID highlighted in red is visible):
