Log network response details on error

Problem

During an incident it was tricky to find the 429 logs that were the cause of runner update failures. We should have a way to know which runner logs correspond with which requests on the gitlab rails side.

@cmcfarland said:

idea that the runners should log the response codes it gets from talking to the API. Right now, we had to guess what response the API had returned that the runner manager had found to be an error.

gitlab-com/gl-infra/production#19438 (comment 2383859631)

In the runner logs we saw:

json.msg: Job trace termination "Success" failed

This log misses the path that was requested and the status code and the response body that was received back from GitLab. Ci developers are familiar with the status update corresponding to PUT /api/jobs/:id but other developers and operators might not be familiar with that.

This was exacerbated during the incident because filtering by json.method: PUT json.path:/api/jobs/:id on the rails side also did not return the responses since the method is logged inconsistently when the request fails on the rails side at the RackAttack layer. gitlab!183888 (merged)

Solution

When the runner fails because an endpoint returns a non 2xx response. Log:

path
status code
response body
correlation_id - in the response as X-Request-ID #38656 (comment 2391294283)

or any other details that will help associate the runners failure with the rails request type that caused the request to fail.

Edited Mar 18, 2025 by Allison Browne