Agent version on /-/clusters page not updating

Summary

After upgrading GitLab from 16.5.1 to 16.6.0 and the GitLab Agent for Kubernetes from 16.5.0 to 16.6.0, the agent's version as reported on the project's /-/clusters page does not update to 16.6.0.

Steps to reproduce

  • deploy a GitLab instance at 16.5.1 (I used the registry.service.easy/easy/services/gitlab:16.5.1 Docker image)
  • register an agent using version 1.20.0 of the Helm chart (uses v16.5.0 of the agent, see upstream documentation for details)
  • confirm that /-/clusters shows v16.5.0 for the agent
  • upgrade the GitLab instance to 16.6.0 (I used the registry.service.easy/easy/services/gitlab:16.6.0 Docker image)
  • upgrade the agent to version 1.21.0 of the Helm chart (uses v16.6.0 of the agent)
  • confirm with kubectl that the cluster now runs a v16.6.0 image of the agent (see the sketch after this list)
  • confirm that /-/clusters still shows v16.5.0
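
A minimal sketch of the last two steps, assuming the agent was installed from the gitlab Helm repository (charts.gitlab.io) into a gitlab-agent namespace under a release name like production; these names are assumptions, so adjust them to your setup:

```bash
# Upgrade the agent release to chart 1.21.0, which ships agent v16.6.0
# (release name "production" and namespace "gitlab-agent" are assumptions).
helm repo update
helm upgrade production gitlab/gitlab-agent \
  --namespace gitlab-agent \
  --version 1.21.0 \
  --reuse-values

# Confirm the pods that are actually running use a v16.6.0 agent image.
kubectl get pods -n gitlab-agent \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'
```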

Example Project

I cannot revert GitLab.com back to 16.5.1 😉, so no example project, sorry.

What is the current bug behavior?

The /-/clusters page reports 16.5.0 as the agent's version whereas 16.6.0 is deployed.

What is the expected correct behavior?

The /-/clusters page reports the deployed version, i.e. 16.6.0.

Relevant logs and/or screenshots

Output of checks
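
For reference, the standard Omnibus GitLab commands that produce this kind of output are:

```bash
# Environment summary (the "GitLab environment info" section below)
sudo gitlab-rake gitlab:env:info

# Application self-check with user output sanitized (the "GitLab application Check" section below)
sudo gitlab-rake gitlab:check SANITIZE=true
```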

Results of GitLab environment info

System information
System:		
Current User:	git
Using RVM:	no
Ruby Version:	3.0.6p216
Gem Version:	3.4.21
Bundler Version: 2.4.21
Rake Version:	13.0.6
Redis Version:	7.0.14
Sidekiq Version: 6.5.12
Go Version:	unknown

GitLab information
Version:	16.6.0
Revision:	6d558d71eba
Directory:	/opt/gitlab/embedded/service/gitlab-rails
DB Adapter:	PostgreSQL
DB Version:	13.11
URL:		https://server.example.com/gitlab
HTTP Clone URL:	https://server.example.com/gitlab/some-group/some-project.git
SSH Clone URL:	git@server.example.com:some-group/some-project.git
Using LDAP:	yes
Using Omniauth:	no

GitLab Shell
Version:	14.30.0
Repository storages:
- default: 	unix:/var/opt/gitlab/gitaly/gitaly.socket
GitLab Shell path:		/opt/gitlab/embedded/service/gitlab-shell

Gitaly
- default Address: 	unix:/var/opt/gitlab/gitaly/gitaly.socket
- default Version: 	16.6.0
- default Git Version: 	2.42.0

Results of GitLab application Check

Checking GitLab subtasks ...
Checking GitLab Shell ...
GitLab Shell: ... GitLab Shell version >= 14.30.0 ? ... OK (14.30.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful
Checking GitLab Shell ... Finished
Checking Gitaly ...
Gitaly: ... default ... OK
Checking Gitaly ... Finished
Checking Sidekiq ...
Sidekiq: ... Running? ... yes
Number of Sidekiq processes (cluster/worker) ... 1/1
Checking Sidekiq ... Finished
Checking Incoming Email ...
Incoming Email: ... Reply by email is disabled in config/gitlab.yml
Checking Incoming Email ... Finished
Checking LDAP ...
LDAP: ... Server: ldapmain
LDAP authentication... Success
LDAP users with access to your GitLab server (only showing the first 100 results)
	User output sanitized. Found 100 users of 100 limit.
Checking LDAP ... Finished
Checking GitLab App ...
Database config exists? ... yes
Tables are truncated? ... skipped
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Cable config exists? ... yes
Resque config exists? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... yes
Systemd unit files or init script exist? ... skipped (omnibus-gitlab has neither init script nor systemd units)
Systemd unit files or init script up-to-date? ... skipped (omnibus-gitlab has neither init script nor systemd units)
Projects have namespace: ... yes [REDACTED: removed 1663 lines matching "\d+/\d+ ... yes"]
Redis version >= 6.0.0? ... yes
Ruby version >= 3.0.6 ? ... yes (3.0.6)
Git user has default SSH configuration? ... yes
Active users: ... 1009
Is authorized keys file accessible? ... yes
GitLab configured to store new projects in hashed storage? ... yes
All projects are in hashed storage? ... yes
Checking GitLab App ... Finished
Checking GitLab subtasks ... Finished

Possible fixes

I have no idea about a possible fix, but I thought that !135803 (merged) for #430046 (closed) might be related to this.

Until 16.5.0, the /-/clusters page always updated the agent's version within a couple of minutes after I upgraded the agents. I have been doing these upgrades roughly monthly since at least version 15.8.0 of both GitLab and the agent. This is the first time the version has not updated (and it has already been three days).

When I run a GraphQL request based on the sample request in the MR, I get:

{
  "data": {
    "project": {
      "clusterAgents": {
        "edges": [
          {
            "node": {
              "name": "production",
              "connections": {
                "edges": [
                  {
                    "node": {
                      "connectedAt": "2023-11-27T07:01:19+00:00",
                      "connectionId": "2947957171345121652",
                      "metadata": {
                        "podName": "production-gitlab-agent-v1-66c84bdcd5-cnr9k",
                        "version": "v16.6.0"
                      }
                    }
                  },
                  {
                    "node": {
                      "connectedAt": "2023-11-24T00:13:39+00:00",
                      "connectionId": "5189043737184731110",
                      "metadata": {
                        "podName": "production-gitlab-agent-v1-66d667b6fd-7bvv8",
                        "version": "v16.5.0"
                      }
                    }
                  }
                ]
              }
            }
          },
          {
            "node": {
              "name": "staging",
              "connections": {
                "edges": [
                  {
                    "node": {
                      "connectedAt": "2023-11-24T00:12:55+00:00",
                      "connectionId": "4354110824440983342",
                      "metadata": {
                        "podName": "staging-gitlab-agent-v1-557b957597-qxq6f",
                        "version": "v16.5.0"
                      }
                    }
                  },
                  {
                    "node": {
                      "connectedAt": "2023-11-24T04:25:52+00:00",
                      "connectionId": "7941384448129443311",
                      "metadata": {
                        "podName": "staging-gitlab-agent-v1-765b846d94-djpzh",
                        "version": "v16.6.0"
                      }
                    }
                  },
                  {
                    "node": {
                      "connectedAt": "2023-11-27T07:02:08+00:00",
                      "connectionId": "8027988099868929257",
                      "metadata": {
                        "podName": "staging-gitlab-agent-v1-765b846d94-gqbg8",
                        "version": "v16.6.0"
                      }
                    }
                  }
                ]
              }
            }
          }
        ]
      }
    }
  }
}

That is, there are two connection entries for my production agent (versions 16.5.0 and 16.6.0) and three for my staging agent (versions 16.5.0, 16.6.0, and 16.6.0). The second 16.6.0 entry for the staging agent appeared shortly after I forcibly deleted the staging-gitlab-agent-v1-765b846d94-djpzh pod.
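
A quick way to cross-check which of these pods still exist (the gitlab-agent namespace is an assumption; the pod names come from the response above):

```bash
# List the agent pods that are actually running and compare their names
# against the podName values in the GraphQL response; the v16.5.0
# connections should point at pods that no longer exist.
kubectl get pods -n gitlab-agent --no-headers -o custom-columns=NAME:.metadata.name
```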

I was wondering whether the /-/clusters page has been updated to handle this kind of result, or whether an oversight somewhere caused multiple connection entries to start accumulating (note that none of the versions before 16.5.0 are listed, even though those were deployed at some point).
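
For completeness, a request along these lines reproduces the response above. The field selection simply mirrors that response (the exact sample query is in the MR), and the project path and token are placeholders:

```bash
# Query a project's agent connections via the GraphQL API.
# GITLAB_TOKEN and the fullPath value are placeholders.
curl --silent --request POST "https://server.example.com/gitlab/api/graphql" \
  --header "Authorization: Bearer ${GITLAB_TOKEN}" \
  --header "Content-Type: application/json" \
  --data '{"query": "query { project(fullPath: \"some-group/some-project\") { clusterAgents { edges { node { name connections { edges { node { connectedAt connectionId metadata { podName version } } } } } } } } }"}'
```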
