Workhorse not reporting HTTP 500 errors correctly (via Prometheus)
While debugging another, related GitLab issue (spurious HTTP 500 failure codes in the API), I set up some observability metrics for GitLab (Omnibus) using Prometheus and Grafana.
In the course of this I found discrepancies between the errors logged in the GitLab Workhorse logs and the errors reported by Prometheus.
Here's an example:
The Workhorse logs show several 500 responses to HTTP PUT requests:
2018-10-05_02:12:49.32077 gitlab.example.org @ - - [2018/10/05:02:12:49 +0000] "PUT /api/v4/projects/721/issues/16/notes/30048 HTTP/1.1" 500 39 "" "python-requests/2.18.4" 0.113
2018-10-05_02:12:49.48343 gitlab.example.org @ - - [2018/10/05:02:12:49 +0000] "PUT /api/v4/projects/721/issues/16 HTTP/1.1" 500 39 "" "python-requests/2.18.4" 0.139
2018-10-05_02:48:32.15574 gitlab.example.org @ - - [2018/10/05:02:48:32 +0000] "PUT /api/v4/projects/719/issues/7/notes/30057 HTTP/1.1" 500 39 "" "python-requests/2.18.4" 0.292
The corresponding Prometheus queries don't report any of them.
Here's one requested by @bjk-gitlab:
Note that it does not report any 500 errors for PUT requests at all. The vector is entirely missing, contradicting what the logs show us.
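To put a number on the log side of this comparison, the Workhorse access log lines above can be tallied by HTTP method and status code. This is only a rough sketch: it assumes the log format shown in the three sample lines, and the regular expression and the function name `count_by_method_and_status` are my own illustrative choices, not anything shipped with GitLab.

```python
import re
from collections import Counter

# Assumed format, taken from the sample lines in this report:
#   ... "PUT /api/v4/projects/721/issues/16 HTTP/1.1" 500 39 "" "python-requests/2.18.4" 0.139
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) '
)


def count_by_method_and_status(lines):
    """Count requests per (method, status) pair; hypothetical helper."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:  # silently skip lines that do not look like access log entries
            counts[(match.group("method"), int(match.group("status")))] += 1
    return counts


# The three log lines quoted earlier in this report.
sample = [
    '2018-10-05_02:12:49.32077 gitlab.example.org @ - - [2018/10/05:02:12:49 +0000] "PUT /api/v4/projects/721/issues/16/notes/30048 HTTP/1.1" 500 39 "" "python-requests/2.18.4" 0.113',
    '2018-10-05_02:12:49.48343 gitlab.example.org @ - - [2018/10/05:02:12:49 +0000] "PUT /api/v4/projects/721/issues/16 HTTP/1.1" 500 39 "" "python-requests/2.18.4" 0.139',
    '2018-10-05_02:48:32.15574 gitlab.example.org @ - - [2018/10/05:02:48:32 +0000] "PUT /api/v4/projects/719/issues/7/notes/30057 HTTP/1.1" 500 39 "" "python-requests/2.18.4" 0.292',
]

counts = count_by_method_and_status(sample)
print(counts[("PUT", 500)])  # 3
```

Running the same function over the full Workhorse log for a time window gives a per-method, per-status count that can be compared against whatever Prometheus reports for that window.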
Version Info:
GitLab 10.7.0
GitLab Shell 7.1.2
GitLab Workhorse v4.1.0
GitLab API v4
Ruby 2.3.6p384
Rails 4.2.10
postgresql 9.6.8
Edited by Anhad Jai Singh