Return different error code when job is not running

Overview

For api/v4/jobs/:id/trace we send a 403 when the job is not running. We return 403 in multiple places, for example when the token is not valid and when the job is not running, so it's quite hard to distinguish one from the other. The reason for the 403 is explained in the response body, but this is not enough: we don't log the body, and it wouldn't make sense to, since we would end up with very high cardinality in Kibana. If you look closely at the runner logs you see Submitting job to coordinator... aborted, but you have to know about this log line and you need to look at the correct logs; if you look at the nginx/rails logs it is not visible. I think we can improve the debuggability of this.

This was experienced in https://gitlab.com/gitlab-org/gitlab-ce/issues/63972, where it took me a while to work out whether the 403 was coming from an invalid token or from something else, just because I wasn't familiar with how the flow works.

Proposal

When a request is sent for a job that is already finished, we should return a different error code, for example 422 UNPROCESSABLE ENTITY. Having separate status codes will help distinguish one error from another, and users can set up monitoring/alerting on a high number of 422s or 403s. Right now things like that are hard to detect.
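
As a rough illustration of the proposal, here is a minimal Python sketch (not the actual Rails code). The function names `trace_update_status` and `count_rejections` are hypothetical, and the 202 success status is an assumption made only for the example. It shows the two failure cases mapped to different codes, and how a monitoring job could then tell them apart from the status code alone, without needing the response body.

```python
from collections import Counter
from http import HTTPStatus


def trace_update_status(token_valid: bool, job_running: bool) -> HTTPStatus:
    """Pick the response status for a trace update (hypothetical helper)."""
    if not token_valid:
        # Authentication problem: keep returning 403.
        return HTTPStatus.FORBIDDEN
    if not job_running:
        # Proposed change: a finished job gets its own status code.
        return HTTPStatus.UNPROCESSABLE_ENTITY
    # Assumed success status for this sketch.
    return HTTPStatus.ACCEPTED


def count_rejections(statuses):
    """Count 403s and 422s separately from a list of logged status codes."""
    return Counter(s for s in statuses if s in (403, 422))


if __name__ == "__main__":
    print(trace_update_status(token_valid=False, job_running=True))
    print(trace_update_status(token_valid=True, job_running=False))
    print(count_rejections([202, 403, 422, 422, 202]))
```

With separate codes, an alert on a spike of 403s points at token problems, while a spike of 422s points at runners sending updates for jobs that are no longer running.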