The last_used_at column is cached in Redis via cached_attr_reader :last_used_at. So, when loading multiple tokens on the "list" API endpoint, the presenter will hit Redis once per token record. This is considered a N+1 problem.
Proposal
Add a limit of two active tokens per agent, while allowing any existing agents with >2 tokens to continue uninterrupted.
Only fetch active tokens (non-revoked).
Add last_used_at to the agent tokens API response.
2 of 3 checklist items completed
· Edited by
Pam Artiaga
(@tuxtimo we use this board to plan and have a high-level tracking of the ongoing work. The workflowrefinement lists the issues waiting to be refined in priority order.)
Out of the two proposal, I would prefer to start with:
Trying to use a combination of the BatchLoader and Redis mget, for instance to preload all Redis values to the presenter with just one Redis command.
That being said: How does the GraphQL API used in the frontend work around this problem?
Alternative proposal
A combination of the following:
An upper bound of active (not revoked) tokens at a time. There is no reason to allow more than two. Having more active tokens than that is a security liability, and probably means the user is misunderstanding the purpose of multiple tokens.
When reading last_used_at, only read from redis if the token is not revoked. Otherwise, use the database.
When revoking a token, make sure to persist the value of last_used_at. That way, reading revoked tokens accurately shows the last time the token successfully authenticated an agent.
How does the GraphQL API used in the frontend work around this problem?
It doesn't This will also be an N+1.
An upper bound of active (not revoked) tokens at a time. There is no reason to allow more than two
This is a great point, and we should do this anyway. Even a generous limit of say, 10, would mean that even with an N+1 to Redis the load wouldn't be terrible.
IMO serving the potentially out of date database value isn't a problem, as long as we document the limitation.
IMO serving the potentially out of date database value isn't a problem, as long as we document the limitation.
The main risk I see is that we might leave it that way for a long time and end up responding to a lot of inquiries from confused users (especially internally on Slack). Such interruptions could, over time, end up costing us more than a solution that serves the up-to-date value, but in a hard-to-measure way.
Since we're ok with restricting the number of active tokens, we should be able to go with the alternative proposal in full:
Add a limit of two active tokens
Only use the cache for active tokens
Flush the cached value to the database when a token is revoked
And then, finally:
Add the field to the API response
This will still technically be an N+1 (to Redis), just with a low value for N.
There are also some agents that already have more than two active tokens (the most on GitLab.com currently is 10). This is ok - these tokens will continue to work, they will just need to revoke some/all before registering a new token.
I think we have enough info for this to be picked up now. I'll update the description and move to workflowready for development. @hfyngvason feel free to update or move back to refinement if you have any concerns
Removing the revoked tokens from the response can only be done in %16.0. But once it's removed, we won't need to flush the last_used_at anymore. So this feature becomes just about only filtering the results for active tokens, and limiting them to two.
That said, we have 2 options:
Implement the first version before %16.0 with revoked tokens flushed column.
Or, implement the simpler approach once revoked tokens are no longer in the response.
If there isn't urgent need for the last_used_at column to be present, I think 2 makes more sense.
João Alexandre Cunhachanged title from Consider a way for how to returnlast_used_atwhen listing agent tokens in REST API without N+1 problems to Addlast_used_atto the agent token API mitigating Redis N+1 issues
changed title from Consider a way for how to returnlast_used_atwhen listing agent tokens in REST API without N+1 problems to Addlast_used_atto the agent token API mitigating Redis N+1 issues
Noting that the last_used_at field is actually already returned in both the REST and GraphQL API when fetching the tokens of an agent (see comment below). So what we need to do is:
Add a limit of two active tokens per agent during creation, while allowing any existing agents with >2 tokens to continue uninterrupted.
The change for this is added behind a Feature Flag. It needs to be enabled in production through ChatOps, then enabled by default a few days later if users do not complain.