It is difficult to troubleshoot errors in API logs because the log messages contain no error information

Log entries for 500 errors from the API do not contain an error message or any context, which makes it very difficult to troubleshoot problems on GitLab.com when there are error spikes.

https://log.gitlab.net/goto/5c207effe4df8183827ecba6f58a9df7

This issue has come up in numerous incidents, most recently in gitlab-com/gl-infra/production#1294 (closed), where we saw a spike of errors on the API due to statement timeouts.

{"time":"2019-10-29T00:22:33.409Z","severity":"INFO","duration":15640.53,"db":15098.76,"view":541.7700000000004,"status":500,"method":"POST","path":"/api/v4/projects/278964/issues/30223/notes","params":[{"key":"body","value":"[FILTERED]"}],"host":"gitlab.com","remote_ip":"35.243.169.14, 10.216.1.34, 35.243.169.14","ua":null,"route":"/api/:version/projects/:id/issues/:noteable_id/notes","user_id":1786152,"username":"gitlab-bot","queue_duration":8.48,"correlation_id":"WQtOdxaKcz5","tag":"rails.api","environment":"gprd","hostname":"api-20-sv-gprd","fqdn":"api-20-sv-gprd.c.gitlab-production.internal","message":null}

We are not completely blind, because the full error and stack trace can be found in Sentry using the correlation ID. However, the log entries themselves should include some error information, since logs are the first place we look when troubleshooting.
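
To illustrate the current workaround, here is a minimal sketch in Python that scans JSON log lines for 500 responses with an empty message and collects the correlation IDs to look up in Sentry. The function name `failed_requests_without_message` is hypothetical, and the sample line is a trimmed copy of the entry above; the field names (`status`, `message`, `correlation_id`, `path`, `db`) are taken from that entry.

```python
import json


def failed_requests_without_message(lines):
    """Yield (correlation_id, path, db_ms) for 500 entries with no message.

    Hypothetical helper: it only reads fields that appear in the log
    entry quoted in this issue.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue  # skip JSON values that are not objects
        if entry.get("status") == 500 and not entry.get("message"):
            yield entry.get("correlation_id"), entry.get("path"), entry.get("db")


# Trimmed copy of the log entry quoted above.
sample = (
    '{"time":"2019-10-29T00:22:33.409Z","status":500,'
    '"path":"/api/v4/projects/278964/issues/30223/notes",'
    '"db":15098.76,"correlation_id":"WQtOdxaKcz5","message":null}'
)

for cid, path, db_ms in failed_requests_without_message([sample]):
    print(cid, path, db_ms)
```

Each correlation ID still has to be looked up in Sentry by hand, which is the extra step this issue is asking to avoid.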

Edited Oct 31, 2019 by John Jarvis