Agent version on /-/clusters page not updating
Summary
After upgrading GitLab from 16.5.1 to 16.6.0 and the GitLab Agent for Kubernetes from 16.5.0 to 16.6.0, the agent's version as reported on the project's /-/clusters page does not update to 16.6.0.
Steps to reproduce
- deploy a GitLab instance at 16.5.1 (I used the registry.service.easy/easy/services/gitlab:16.5.1 Docker image)
- register an agent using version 1.20.0 of the Helm chart (uses v16.5.0 of the agent, see upstream documentation for details)
- confirm that /-/clusters shows v16.5.0 for the agent
- upgrade the GitLab instance to 16.6.0 (I used the registry.service.easy/easy/services/gitlab:16.6.0 Docker image)
- upgrade the agent to version 1.21.0 of the Helm chart (uses v16.6.0 of the agent; see the sketch after this list)
- confirm the cluster runs a v16.6.0 image of the agent with kubectl
- confirm that /-/clusters still shows v16.5.0
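For reference, the agent upgrade and verification steps look roughly like this (a minimal sketch; the release name gitlab-agent and the namespace gitlab-agent are placeholders, not my actual values):

```shell
# Upgrade the agent to chart version 1.21.0 (ships agentk v16.6.0);
# release name and namespace are placeholders.
helm repo add gitlab https://charts.gitlab.io
helm repo update
helm upgrade gitlab-agent gitlab/gitlab-agent \
  --namespace gitlab-agent \
  --version 1.21.0 \
  --reuse-values

# Confirm the cluster actually runs a v16.6.0 agent image.
kubectl get pods -n gitlab-agent \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
```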
Example Project
No example project available: I cannot revert GitLab.com back to 16.5.1 to reproduce this there.
What is the current bug behavior?
The /-/clusters page reports 16.5.0 as the agent's version, whereas 16.6.0 is deployed.
What is the expected correct behavior?
The /-/clusters page reports the deployed version, i.e. 16.6.0.
Relevant logs and/or screenshots
Output of checks
Results of GitLab environment info
System information
System:
Current User:     git
Using RVM:        no
Ruby Version:     3.0.6p216
Gem Version:      3.4.21
Bundler Version:  2.4.21
Rake Version:     13.0.6
Redis Version:    7.0.14
Sidekiq Version:  6.5.12
Go Version:       unknown

GitLab information
Version:          16.6.0
Revision:         6d558d71eba
Directory:        /opt/gitlab/embedded/service/gitlab-rails
DB Adapter:       PostgreSQL
DB Version:       13.11
URL:              https://server.example.com/gitlab
HTTP Clone URL:   https://server.example.com/gitlab/some-group/some-project.git
SSH Clone URL:    git@server.example.com:some-group/some-project.git
Using LDAP:       yes
Using Omniauth:   no

GitLab Shell
Version:          14.30.0
Repository storages:
- default:        unix:/var/opt/gitlab/gitaly/gitaly.socket
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell

Gitaly
- default Address: unix:/var/opt/gitlab/gitaly/gitaly.socket
- default Version: 16.6.0
- default Git Version: 2.42.0
Results of GitLab application Check
Checking GitLab subtasks ...

Checking GitLab Shell ...

GitLab Shell: ... GitLab Shell version >= 14.30.0 ? ... OK (14.30.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful

Checking GitLab Shell ... Finished

Checking Gitaly ...

Gitaly: ... default ... OK

Checking Gitaly ... Finished

Checking Sidekiq ...

Sidekiq: ... Running? ... yes
Number of Sidekiq processes (cluster/worker) ... 1/1

Checking Sidekiq ... Finished

Checking Incoming Email ...

Incoming Email: ... Reply by email is disabled in config/gitlab.yml

Checking Incoming Email ... Finished

Checking LDAP ...

LDAP: ... Server: ldapmain
LDAP authentication... Success
LDAP users with access to your GitLab server (only showing the first 100 results)
User output sanitized. Found 100 users of 100 limit.

Checking LDAP ... Finished

Checking GitLab App ...

Database config exists? ... yes
Tables are truncated? ... skipped
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Cable config exists? ... yes
Resque config exists? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... yes
Systemd unit files or init script exist? ... skipped (omnibus-gitlab has neither init script nor systemd units)
Systemd unit files or init script up-to-date? ... skipped (omnibus-gitlab has neither init script nor systemd units)
Projects have namespace: ... yes
[REDACTED: removed 1663 lines matching "\d+/\d+ ... yes"]
Redis version >= 6.0.0? ... yes
Ruby version >= 3.0.6 ? ... yes (3.0.6)
Git user has default SSH configuration? ... yes
Active users: ... 1009
Is authorized keys file accessible? ... yes
GitLab configured to store new projects in hashed storage? ... yes
All projects are in hashed storage? ... yes

Checking GitLab App ... Finished

Checking GitLab subtasks ... Finished
Possible fixes
I have no idea about a possible fix, but I thought that !135803 (merged) for #430046 (closed) might be related to this.
Until 16.5.0, the /-/clusters page always updated the agent's version within a couple of minutes after I upgraded the agents. I have been doing these upgrades roughly monthly since at least version 15.8.0 of both GitLab and the agent. This is the first time the version has not updated (and it has been three days already).
When I run a GraphQL request based on the sample request in the MR (reconstructed below; the project path and token are placeholders, not my real values):
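```shell
# Rough reconstruction of the request I ran; the project path and token
# are placeholders. The query mirrors the shape of the response below.
curl --silent --request POST "https://server.example.com/gitlab/api/graphql" \
  --header "Authorization: Bearer $GITLAB_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{"query": "query { project(fullPath: \"some-group/some-project\") { clusterAgents { edges { node { name connections { edges { node { connectedAt connectionId metadata { podName version } } } } } } } } }"}'
```

I get: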
{
  "data": {
    "project": {
      "clusterAgents": {
        "edges": [
          {
            "node": {
              "name": "production",
              "connections": {
                "edges": [
                  {
                    "node": {
                      "connectedAt": "2023-11-27T07:01:19+00:00",
                      "connectionId": "2947957171345121652",
                      "metadata": {
                        "podName": "production-gitlab-agent-v1-66c84bdcd5-cnr9k",
                        "version": "v16.6.0"
                      }
                    }
                  },
                  {
                    "node": {
                      "connectedAt": "2023-11-24T00:13:39+00:00",
                      "connectionId": "5189043737184731110",
                      "metadata": {
                        "podName": "production-gitlab-agent-v1-66d667b6fd-7bvv8",
                        "version": "v16.5.0"
                      }
                    }
                  }
                ]
              }
            }
          },
          {
            "node": {
              "name": "staging",
              "connections": {
                "edges": [
                  {
                    "node": {
                      "connectedAt": "2023-11-24T00:12:55+00:00",
                      "connectionId": "4354110824440983342",
                      "metadata": {
                        "podName": "staging-gitlab-agent-v1-557b957597-qxq6f",
                        "version": "v16.5.0"
                      }
                    }
                  },
                  {
                    "node": {
                      "connectedAt": "2023-11-24T04:25:52+00:00",
                      "connectionId": "7941384448129443311",
                      "metadata": {
                        "podName": "staging-gitlab-agent-v1-765b846d94-djpzh",
                        "version": "v16.6.0"
                      }
                    }
                  },
                  {
                    "node": {
                      "connectedAt": "2023-11-27T07:02:08+00:00",
                      "connectionId": "8027988099868929257",
                      "metadata": {
                        "podName": "staging-gitlab-agent-v1-765b846d94-gqbg8",
                        "version": "v16.6.0"
                      }
                    }
                  }
                ]
              }
            }
          }
        ]
      }
    }
  }
}
That is, there are two agent entries for my production cluster (versions 16.5.0 and 16.6.0) and three for my staging cluster (versions 16.5.0, 16.6.0, and 16.6.0). The second 16.6.0 entry for the staging cluster appeared shortly after I forcibly deleted the staging-gitlab-agent-v1-765b846d94-djpzh pod.
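Flattening the response makes the duplicate entries easy to see; a minimal sketch, assuming the response above is saved as response.json (a filename I made up for illustration):

```shell
# Print agent name, pod name, and reported version for every connection.
jq -r '.data.project.clusterAgents.edges[].node
       | .name as $agent
       | .connections.edges[].node.metadata
       | [$agent, .podName, .version]
       | @tsv' response.json
```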
I was wondering whether the /-/clusters page has been updated to handle this kind of result, or whether an oversight somewhere allowed multiple connection entries to accumulate (note that none of the versions before 16.5.0 are listed, even though those were deployed at some point).