Previously connected agent reported as "Never connected" when no active token was used to connect it

changed milestone to %Backlog

added bug::ux group::environments workflow::refinement labels

added type::bug label

I think last connected time should probably be tracked on the agent object/DB record itself in addition to the token "last used" timestamp. This would resolve the issue.
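A minimal sketch of that idea, written in Python for brevity rather than as actual Rails code. The `Agent` and `Token` classes and the `last_connected_at` field are hypothetical names used for illustration, not the real schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Token:
    last_used_at: Optional[datetime] = None
    revoked: bool = False


@dataclass
class Agent:
    tokens: List[Token] = field(default_factory=list)
    # Hypothetical field: updated on every connection, independent of any token.
    last_connected_at: Optional[datetime] = None


def last_contact(agent: Agent) -> Optional[datetime]:
    """Latest known contact: the agent-level timestamp or any active token's last use."""
    candidates = [
        t.last_used_at
        for t in agent.tokens
        if not t.revoked and t.last_used_at is not None
    ]
    if agent.last_connected_at is not None:
        candidates.append(agent.last_connected_at)
    return max(candidates, default=None)
```

With an agent-level timestamp, an agent whose only token was later revoked still has a known last contact time instead of falling back to "Never connected".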

There were users complaining about this weirdness recently (in some other issue).

@ash2k hi, does this bug reproduce in the UI or only DB/BE?

@rayana I think we became aware of it specifically via the UI. Everything works normally, but the UI does not correctly report whether the agent is connected. I think I saw this happen even without a token revocation/deletion.

While it appears as a bug::ux, the fix will most likely be on the backend, including some database work to store a timestamp of when an agent was last connected, independent of the tokens. Not sure yet. It may also be enough to consider revoked tokens as well, in case they are still in the database.
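To illustrate the second option, here is a rough Python sketch (not the Rails implementation) of how the reported status changes depending on whether revoked tokens are included. `TokenRecord`, `last_used`, and `describe` are hypothetical names, and the sketch assumes revoked tokens keep their "last used" value in the database.

```python
from datetime import datetime, timezone
from typing import NamedTuple, Optional, Sequence


class TokenRecord(NamedTuple):
    last_used_at: Optional[datetime]
    revoked: bool


def last_used(tokens: Sequence[TokenRecord], include_revoked: bool) -> Optional[datetime]:
    """Most recent token use, optionally counting revoked tokens too."""
    times = [
        t.last_used_at
        for t in tokens
        if t.last_used_at is not None and (include_revoked or not t.revoked)
    ]
    return max(times, default=None)


def describe(ts: Optional[datetime]) -> str:
    return "Never connected" if ts is None else f"Last contact {ts.isoformat()}"


tokens = [TokenRecord(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), revoked=True)]
print(describe(last_used(tokens, include_revoked=False)))  # Never connected
print(describe(last_used(tokens, include_revoked=True)))   # Last contact 2024-01-01T12:00:00+00:00
```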

@rayana I wouldn't expect that we need any assistance from UX at this time. Sorry for the noise.

No worries! Thanks for clarifying @ash2k @timofurrer

I'll remove the bug::ux label in this case because the problem does not originate from the planned user experience. I'll still leave the UX label applied because it does impact the UX, and it reproduces in the UI.

All bug::ux issues need to have a proper severity label set. Please add a severity label, remove the automation:ux-missing-labels label, and then reply to this comment briefly explaining your reasoning for providing this severity.

If you are not the DRI for this area and would like help determining the best severity, please @ the appropriate person for assistance.

This message was generated automatically. You're welcome to improve it.

added automation:ux-missing-labels label

mentioned in issue gitlab-org/quality/triage-reports#20243 (closed)

It is trivial to reproduce. On staging my agent is connected, I can access the cluster via it from my laptop, but the UI shows "Not connected".

Could this be an interaction between kas caching the agent info for a token for 5 minutes, long-running connections, and the GitLab backend considering an agent disconnected if there was no activity for a token for several minutes? Actually, it shows that the last contact was 13 minutes ago.

While I was typing the above, it switched to "connected".
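The suspected interaction can be modeled with a small Python sketch. It assumes the backend only looks at the most recent recorded token activity. The 10-minute threshold and the activity times are made-up values for illustration, not the real settings, and `ui_status` is a hypothetical helper.

```python
from typing import Sequence


def ui_status(activity_minutes: Sequence[int], now_minute: int, threshold_minutes: int) -> str:
    """Status shown if only the latest recorded token activity is considered."""
    seen = [m for m in activity_minutes if m <= now_minute]
    if not seen:
        return "Never connected"
    idle = now_minute - max(seen)
    return "Connected" if idle <= threshold_minutes else "Not connected"


# The agent holds one long-running connection the whole time, but token
# activity is only recorded at minute 0 and again at minute 14 (assumed).
recorded = [0, 14]
print(ui_status(recorded, now_minute=13, threshold_minutes=10))  # Not connected
print(ui_status(recorded, now_minute=14, threshold_minutes=10))  # Connected
```

Under these assumptions the agent is reachable throughout, yet the displayed status flips depending on when token activity happened to be recorded.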

Perhaps the fix is not to store something in the DB/Redis but to get info for the connected agent pods by making a kas API call? Don't we do that already on this page?! How else do we know the versions?
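A rough Python sketch of deriving the status from live connection data instead of token timestamps. `KasClient`, `connected_pods`, `AgentPod`, and `FakeKasClient` are hypothetical stand-ins, not the actual kas API or Rails client.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class AgentPod:
    pod_name: str
    version: str


class KasClient(Protocol):
    # Hypothetical call: returns the agent pods currently connected to kas.
    def connected_pods(self, agent_id: int) -> List[AgentPod]: ...


class FakeKasClient:
    """In-memory stand-in so the sketch runs without a real kas."""

    def __init__(self, pods_by_agent: Dict[int, List[AgentPod]]):
        self._pods = pods_by_agent

    def connected_pods(self, agent_id: int) -> List[AgentPod]:
        return self._pods.get(agent_id, [])


def connection_status(client: KasClient, agent_id: int) -> str:
    """Connected if kas currently reports at least one pod for the agent."""
    pods = client.connected_pods(agent_id)
    if not pods:
        return "Not connected"
    versions = sorted({p.version for p in pods})
    return f"Connected ({', '.join(versions)})"


client = FakeKasClient({1: [AgentPod("agentk-abc", "v16.0.0")]})
print(connection_status(client, 1))  # Connected (v16.0.0)
print(connection_status(client, 2))  # Not connected
```

On its own this cannot tell "never connected" apart from "connected earlier", so a stored last-contact time would probably still be needed for that part of the display.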

This seems like legacy behavior. Initially we didn't have the kas API exposed to Rails and only had info about token activity, which is why we relied on it to show whether the agent is connected.