Occasionally getting 401 errors when trying to perform git commands

Summary

We are hosting a GitLab instance using the GitLab Chart project on a Kubernetes cluster (version 1.20) in an air-gapped environment.

Recently, we upgraded from 14.10 to 15.11, following the recommended upgrade path. Since the first day after the upgrade, users have intermittently received 401 errors when running git commands (clone, push, pull, etc.). The failures are sporadic, and we have not found a consistent pattern.

Steps to reproduce

This one is hard to reproduce. We have another GitLab cluster that serves as a staging environment. We upgraded it to 15.11 in the same way and ran thousands of clone commands trying to trigger the same error, but we were unable to reproduce it.
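To make a staging test like this repeatable and comparable between clusters, the clone loop can be scripted. The following is a minimal Python sketch, not the exact procedure we used: `count_clone_failures` and `run_git` are hypothetical helper names, and it assumes `git` is on the PATH and that credentials for the repository are already configured (for example through a credential helper).

```python
import subprocess
import tempfile
from typing import Callable, List, Sequence, Tuple

# A runner takes a command line and returns the finished process.
Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def run_git(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run a git command, capturing stdout and stderr as text."""
    return subprocess.run(list(cmd), capture_output=True, text=True)


def count_clone_failures(
    repo_url: str, attempts: int, runner: Runner = run_git
) -> Tuple[int, List[str]]:
    """Clone repo_url `attempts` times.

    Returns the number of failed clones and the stderr of each failure,
    so the failures can be checked for 401 responses afterwards.
    """
    failures: List[str] = []
    for _ in range(attempts):
        # Clone into a fresh, empty temporary directory that is removed afterwards.
        with tempfile.TemporaryDirectory() as workdir:
            result = runner(["git", "clone", "--quiet", repo_url, workdir])
        if result.returncode != 0:
            failures.append(result.stderr)
    return len(failures), failures
```

For example, `count_clone_failures("https://gitlab.example.com/group/project.git", 1000)` returns the number of failed clones together with each failure's stderr, which can then be searched for "401"; the URL is a placeholder. The `runner` parameter exists only so the loop can be exercised without a live server.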

Going deeper into this issue, we discovered the following:

What is the current bug behavior?

Digging into the logs, we found that the gitlab-shell containers fail with an "Internal API error" when calling the webservice. In the webservice containers, we see a JWT::InvalidIatError exception with the message "Invalid iat".
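For context on the webservice error: our understanding, which we have not verified against the GitLab source, is that ruby-jwt raises JWT::InvalidIatError with the message "Invalid iat" when a token's `iat` (issued-at) claim is not numeric or lies in the future according to the verifying server's clock. The Python sketch below only models that rule; `InvalidIatError` and `verify_iat` are stand-ins, not ruby-jwt's API, and the absence of any clock leeway is an assumption.

```python
import time
from typing import Optional


class InvalidIatError(Exception):
    """Stand-in for ruby-jwt's JWT::InvalidIatError (not the real class)."""


def verify_iat(payload: dict, now: Optional[float] = None) -> None:
    """Reject a token whose iat is non-numeric or later than the verifier's clock.

    Assumption: models an iat check with no leeway, evaluated against the
    verifying host's (here: the webservice's) current time.
    """
    if "iat" not in payload:
        return  # No iat claim: nothing to check.
    iat = payload["iat"]
    if now is None:
        now = time.time()
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(iat, bool) or not isinstance(iat, (int, float)) or iat > now:
        raise InvalidIatError("Invalid iat")
```

Under that assumption, a token is rejected whenever the issuer's clock (presumably gitlab-shell's, if it sets `iat` when signing the internal API request) runs ahead of the webservice's by more than the request latency; for example, `verify_iat({"iat": 1_700_000_002}, now=1_700_000_000.0)` raises. This is a hypothesis about what the check is sensitive to, not a confirmed root cause.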

What is the expected correct behavior?

The gitlab-shell containers should always be able to reach the webservice, so git commands succeed consistently. Currently this works most of the time, but not always.

Relevant logs and/or screenshots

Since we are air-gapped, we can only share excerpts of the logs.

gitlab-shell example log:

{"component": "gitlab-shell", "level": "error", "message":"msg=\"Internal API error url=\"http://gitlab-webservice-default.gitlab.svc:8181/api/v4/internal/allowed\""}

webservice example log:

{"component": "gitlab", "subcomponent": "exceptions_json", "level":"error", "exception.class":"JWT::InvalidIatError", "exception.message":"Invalid iat"}
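To see how often the error occurs, the JWT::InvalidIatError entries can be counted across the webservice logs. A minimal sketch, assuming the logs are one JSON object per line with a flat "exception.class" key as in the excerpt above; `count_exception_classes` is a hypothetical helper name.

```python
import json
from collections import Counter
from typing import Iterable


def count_exception_classes(lines: Iterable[str]) -> Counter:
    """Count "exception.class" values in JSON-per-line log output.

    Blank lines, lines that are not valid JSON, and entries that are not
    objects or lack the key are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(entry, dict) and "exception.class" in entry:
            counts[entry["exception.class"]] += 1
    return counts
```

For example, passing an open file object for a saved copy of the log returns a `Counter`, and its "JWT::InvalidIatError" entry is the count of interest. The file name and location depend on where the logs were collected.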