
Ensure prometheus counter has time to increment

It seems that, without a small delay, the metrics.SliSshdSessionsErrorsTotal.Inc() call in sshd.trackError() has not yet registered by the time the test checks the counter.

I used the following test rig code:

#!/bin/bash

counter=0

while true
do
  counter=$((counter+1))
  output=$(go test -v -timeout 5s -count 1 -run "^(TestSessionsMetrics)$" gitlab.com/gitlab-org/gitlab-shell/v14/internal/sshd)

  if [[ $? != 0 ]]; then
    printf "\ncounter=${counter}\n\n"
    printf '%s' "${output}"
    break
  else
    printf "\r${counter}"
  fi
done

Running it repeatedly reproduced the error:

$ ./test.sh
1
counter=2

=== RUN   TestSessionsMetrics
time="2023-06-30T19:04:51+10:00" level=info msg="connection: handle: new channel requested" channel_type=session correlation_id= remote_addr=
    connection_test.go:216:
        	Error Trace:	/Users/ash/src/gitlab/gitlab-shell/internal/sshd/connection_test.go:216
        	Error:      	Max difference between 1 and 0 allowed is 0.1, but difference was 1
        	Test:       	TestSessionsMetrics
--- FAIL: TestSessionsMetrics (0.00s)
FAIL
FAIL	gitlab.com/gitlab-org/gitlab-shell/v14/internal/sshd	0.345s
FAIL%

The failure sometimes occurs consistently on the second run, and at other times at random 😞

With the fix applied, I was able to repeat the test 100+ times without failure.

Closes #657 (closed)

Edited by Ash McKenzie
