Ensure Prometheus counter has time to increment
It seems that, without a small delay, the metrics.SliSshdSessionsErrorsTotal.Inc() call in sshd.trackError() has not yet registered by the time the test checks the counter.
I used the following test rig code:
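#!/bin/bash
# Re-run TestSessionsMetrics until it fails, then print the run count and the test output.
counter=0
while true
do
  counter=$((counter+1))
  # -count 1 disables test result caching so every iteration really runs the test.
  if ! output=$(go test -v -timeout 5s -count 1 -run "^(TestSessionsMetrics)$" gitlab.com/gitlab-org/gitlab-shell/v14/internal/sshd); then
    printf '\ncounter=%s\n\n' "${counter}"
    # Pass the output as an argument so any % in it is not treated as a format directive.
    printf '%s' "${output}"
    break
  else
    printf '\r%s' "${counter}"
  fi
done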
Running it repeats the test until the first failure, which initially reproduced the error:
$ ./test.sh
1
counter=2
=== RUN TestSessionsMetrics
time="2023-06-30T19:04:51+10:00" level=info msg="connection: handle: new channel requested" channel_type=session correlation_id= remote_addr=
connection_test.go:216:
Error Trace: /Users/ash/src/gitlab/gitlab-shell/internal/sshd/connection_test.go:216
Error: Max difference between 1 and 0 allowed is 0.1, but difference was 1
Test: TestSessionsMetrics
--- FAIL: TestSessionsMetrics (0.00s)
FAIL
FAIL gitlab.com/gitlab-org/gitlab-shell/v14/internal/sshd 0.345s
FAIL%
The failure sometimes occurs consistently on the second run, and at other times after a random number of runs.
With the fix applied, I was able to repeat the test 100+ times without failure.
Closes #657
Edited by Ash McKenzie