Runner registration token is not valid immediately after reset through API

Summary

When the instance runner registration token is reset through the API, the newly issued token is sometimes not valid immediately, and runner registration fails with a 403 Forbidden status code.

My use case is to register runners automatically using Ansible.

This issue might be related to #220832 (closed).

Steps to reproduce

  1. Reset the registration token using the API
  2. Immediately register a new runner using the generated token

I've written a small script with which I can consistently reproduce the issue.

#!/bin/bash

# GITLAB_USERNAME and GITLAB_PASSWORD must be exported or otherwise set before running this script.

base_url="https://gitlab.mydomain.com"

# Log in with the OAuth password grant and print the access token.
function gitlab_login() {
  local username=$1
  local password=$2
  local access_token

  access_token=$(curl -s -G --data-urlencode "grant_type=password" \
    --data-urlencode "username=${username}" \
    --data-urlencode "password=${password}" \
    --request POST "${base_url}/oauth/token" | jq -r ".access_token")

  echo "$access_token"
}

# Reset the instance runner registration token and print the new token.
function reset_registration_token() {
  local access_token=$1
  local registration_token

  registration_token=$(curl -s -H "Authorization: Bearer ${access_token}" \
    --request POST "${base_url}/api/v4/runners/reset_registration_token" | jq -r ".token")

  echo "$registration_token"
}

# Remove all runners registered on this machine so each attempt starts clean.
function unregister_runner() {
  gitlab-runner unregister --all-runners
}

# Register a runner non-interactively with the given registration token.
function register_runner() {
  local registration_token=$1

  gitlab-runner register --non-interactive --url "${base_url}" --registration-token "${registration_token}"
}

access_token=$(gitlab_login "${GITLAB_USERNAME}" "${GITLAB_PASSWORD}")
echo "Access token: ${access_token}"

for i in {1..10}; do
  echo "Attempt ${i}"

  echo "Resetting runner registration token"
  registration_token=$(reset_registration_token "${access_token}")
  echo "New token: ${registration_token}"

  unregister_runner
  register_runner "${registration_token}"

  sleep 1
done

What is the current bug behavior?

An instance runner registration token is not always immediately valid after reset.

What is the expected correct behavior?

An instance runner registration token is always immediately valid after reset.

Relevant logs and/or screenshots

The attached Nginx log shows the token reset followed by the failed registration with a 403 status code: nginx-log

Possible workaround

A possible workaround is to wait before using the newly issued registration token. My Ansible role waits 120 seconds.
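
Instead of a fixed wait, the registration could be retried until the new token is accepted. The following is only a minimal sketch: `retry` is a hypothetical helper that is not part of my script or of gitlab-runner, and it assumes that `gitlab-runner register` exits with a non-zero status when the registration is rejected.

```shell
#!/bin/bash

# Hypothetical helper: run a command until it succeeds, sleeping between
# attempts. Returns 0 on the first success, 1 if every attempt failed.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local max_attempts=$1
  local delay=$2
  shift 2

  local attempt
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    # "$@" is the command to run, with its arguments preserved.
    if "$@"; then
      return 0
    fi
    echo "Attempt ${attempt}/${max_attempts} failed" >&2
    # Do not sleep after the final attempt.
    if ((attempt < max_attempts)); then
      sleep "${delay}"
    fi
  done
  return 1
}

# Example, reusing register_runner from the reproduction script
# (assumed limits: 12 attempts, 10 seconds apart):
#   retry 12 10 register_runner "${registration_token}"
```

With limits like the ones in the example comment, the total wait stays close to the 120 seconds used in my Ansible role, but the registration returns as soon as the token becomes valid.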

Edited by Steven Wobser