
Ignore "not our ref" errors from gitlab-sshd error metrics

Stan Hu requested to merge sh-ignore-not-our-ref-errors into main

Previously, if a client requested a ref that could not be found in the repository, gitlab-sshd counted the failure in its service level indicator (SLI) error metric. This is really an application-level error between the client and the Git repository rather than a failure of gitlab-sshd itself, so we now exclude it from the metric.

Relates to https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15848

Testing

  1. On the main branch, clone a repository via SSH.
  2. Then attempt to fetch an arbitrary SHA, such as: git fetch master 8b166a7b73827065da4423c72f2d2f6c36fb0701
  3. Notice that the error counter has been incremented:
% curl -s http://localhost:9122/metrics | grep error
# HELP gitlab_sli:shell_sshd_sessions:errors_total Number of SSH sessions that have failed
# TYPE gitlab_sli:shell_sshd_sessions:errors_total counter
gitlab_sli:shell_sshd_sessions:errors_total 1
  4. Now check out this branch, build, and restart sshd: gdk restart sshd
  5. Repeat step 2.
  6. Observe that while the log message is still present, errors_total remains at 0.
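
The manual check in these steps could also be scripted. The following Go sketch parses Prometheus text-format output, such as the curl output shown earlier, and extracts the counter value. The helper name `counterValue` is hypothetical, and the sketch assumes the metric is exposed without labels, as in that output.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// counterValue scans Prometheus text-format output for a sample line whose
// metric name is exactly name (no labels) and returns its value.
// The second return value is false if the metric is absent or unparsable.
func counterValue(metrics, name string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Skip blank lines and "# HELP" / "# TYPE" comment lines.
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 || fields[0] != name {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return 0, false
		}
		return v, true
	}
	return 0, false
}

func main() {
	// Sample input copied from the curl output in the testing steps.
	sample := `# HELP gitlab_sli:shell_sshd_sessions:errors_total Number of SSH sessions that have failed
# TYPE gitlab_sli:shell_sshd_sessions:errors_total counter
gitlab_sli:shell_sshd_sessions:errors_total 1
`
	v, ok := counterValue(sample, "gitlab_sli:shell_sshd_sessions:errors_total")
	fmt.Println(v, ok)
}
```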
Edited by Stan Hu
