@bonnie-tsang Can we get a design for this? It'll be a button in the project's allowlist settings to run the auto-populate script for the project. I'm not sure if we should put it by the allowlist or the new authentication log which is underneath the allowlist. WDYT?
Thanks @shampton@jocelynjane! I have a few questions and would like some clarification and input:
Are the auto-populate and revert actions only available either when:
The instance-wide setting is disabled; or
Before the allowlist enforcement?
Once the auto-populate button is clicked and the migration is successful, will the authentication log stop populating?
After the auto-populate button is clicked and the migration is successful, if someone then makes changes to the allowlist, what should the revert action look like?
Are the auto-populate and revert actions only available either when the instance-wide setting is disabled or before the allowlist enforcement?
@jocelynjane correct me if I'm wrong, but I think the auto-populate should be available regardless of settings until we remove it.
Once the auto-populate button is clicked and the migration is successful, will the authentication log stop populating?
No. The authentication log will continue populating.
After the auto-populate button is clicked and the migration is successful, if someone then makes changes to the allowlist, what should the revert action look like?
The revert action only removes items that were automatically added. We have a flag in the database that says whether the project/group was added as part of our script. So if someone makes changes to the allowlist and then reverts, the revert will still behave the same.
@bonnie-tsang@shampton I agree; we should let auto populate buttons live until we remove the feature. I can see where it might be appealing to keep this feature up and running longer than proposed, but realistically, we do need to drive the mindset that this list needs to be curated (better security).
re: longevity of the auto long - Longer term, we should pivot the authentication log as part of long-term goals to support customer auditing/compliance needs around CI job token authentications (i.e. Enhance CI job token authorization log to retai... (#496638)).
For the revert action, we also need some clear wording that this is a do at your own risk, as it does remove access - in other words, authentication will fail and break pipelines. This is most critical when we perform the auto population and then push the deprecation through. Any customer on .com who we did this for, will see breakage.
Hi team, working on the development planning on Job Token and after discussing with Scott and Jocelyn I've assigned @darbyfrey to this issue given the linked issue above. Does that seem feasible given the other aligned work?
@bonnie-tsang, I saw you are out of office next week but am wondering would it be possible to have the design complete by 1/3 EOD? This can help inform the frontend development Darby will pick up once back in office.
Please let me know if there are any questions or concerns prior to the holiday!
Thanks for the ping @mseferian-jenkins! I've worked on the design draft today and shared it in #509883 (closed)+, requesting feedback from the team.
Given the holiday season, some team members may be away, but I'm hoping to receive feedback early to allow me to finalise the design in time for the FE issue kickoff.
@mseferian-jenkins Just a quick update on where we're at - Over the last two weeks, I've worked on and shared the initial design #509883 (closed) with the team. I'm currently waiting for additional input to move things forward and followed up yesterday (see #498125 (comment 2278389475)) to confirm the direction for the feature.
Some team members may still be out due to the holiday season, but I'll continue following up to finalize the design as soon as possible
Hi @bonnie-tsang thank you for the update! You are right on, @shampton and @jocelynjane are OOO. I'll circle back mid-next week with them if the comment #498125 (comment 2278389475) hasn't been resolved. Thanks again!
The allowlist limit would need to factor in existing records. So we could adjust the compaction process to compact to a limit of 200 - N, where N is the number of existing entries.
Duplicate detection would require adding another step to the compaction process to remove already included projects or groups from the list to autopopulate from. Deduplication also raises the question of expected behavior if, say, a user-created allowlist entry was removed for access that would have been added by the auto population, resulting in broken access.
Another approach to consider would be to allow duplicates so that user created and autopopulated entries could exist together, but there might be duplicate access allowed to a project or group.
This is a fairly complex issue, so I would suggest creating an issue to discuss adding this additional scope and to work through the edge cases.
Tagging myself @bonnie-tsang to monitor this thread for the direction on whether we support the auto-populate option if there are already user-created entries, as this could potentially impact the design #509883 (closed).
@shampton@jocelynjane@darbyfrey Adding to the conversation in #498125 (comment 2269896417), I'd like to confirm the direction on whether to allow or disallow autopopulation when there are existing entries in the allowlist. Here's a summary of the options and considerations:
Allowing Autopopulation with Existing Entries:
Maximizes the value of autopopulation provided to users.
Intoduces complexities, such as duplicate detection and compaction adjustments outlined in #498125 (comment 2269896417).
Disallowing Autopopulation if Entries Exist:
Simplifies the implementation by reducing the scope of the feature.
Feature is unavailable for projects that already have entries in the allowlists.
I believe it’s not uncommon for users to have projects with existing entries in their allowlist. From a UX perspective, disallowing autopopulation in these cases might make the feature less impactful than intended and could misalign with user expectations. While authentication logs are visible for all projects, the autopopulation function only available for some projects but not others. Limiting autopopulation to projects with empty or no manually added entries introduces inconsistency and may confuse users.
That said, given the engineering complexities and trade-offs, if allowing autopopulation when entries exist isn't feasible right now, the design can be adjusted to reflect this direction. Once the direction is confirmed, I’ll refine #509883 (closed)+ with any required UX/UI adjustments. Thanks team!!
@darbyfrey@shampton@bonnie-tsang - whether or not we should autopopulate the allowlist should be based on whether or not the enforcement toggle is on and not the number of items (if any) in the allowlist. This is because a project could have items in the allowlist (e.g. someone started to migrate, but stopped) but if it's not being enforced, the items in that allowlist could be incomplete. The purpose of this migration solution is to ensure when we turn enforcement on, customers aren't broken. By ignoring the population where enforcement is off and the allowlist has at least one entry is a big miss and does not meet the spirit of "shipping migrations, not breaking changes."
If validating entries on the existing allowlist in the event that enforcement is off AND at least one entry exists in the allowlist will take another milestone to implement, we should do that because it's the right thing to do. We can discuss alternatives such as removing all existing entries in the allowlist if enforcement is off, then autopopulating. But the reason we are tracking which entries are automated v. manually created was to ensure we introduced the least intrusive solution for migration and avoid modification of customer data.
Thanks for this clarification @jocelynjane! This direction makes sense and aligns well with "shipping migrations, not breaking changes."
Given this direction, the last part to clarify is how we want to treat existing user generated entries in the allowlist auto-population process. In the case where enforcement is off and there are one or more user created entries in the allowlist, we could:
Factor in the existing entries in the compaction process so that the resulting allowlist will use the existing entries as part of the migration.
Ignore the existing entries in the compaction process so that user generated entries aren't necessary for the migration, but they aren't removed. The could result in duplicates and would require the allowlist limit be decreased by the number of existing entries (200 - existing_entry_count).
Remove all existing before beginning the auto-population process. This would be the cleanest approach, but as you mentioned, it would delete user generated entries.
There may be other options to consider as well, but the options above are listed in order of effort, highest to lowest.
Factor in the existing entries in the compaction process so that the resulting allowlist will use the existing entries as part of the migration.
The downside of course is it may not be clear to the user that the item they added has been "compacted" into a higher group level, but I think that's something we can communicate in the documentation/UX.
The other side of this is that if they choose to "revert", we will no longer have the item they added manually. I'm not sure what to do about that
@darbyfrey I think the first option, which most effort, would be the most delightful for users:
Factor in the existing entries in the compaction process so that the resulting allowlist will use the existing entries as part of the migration.
If we factor in the existing entries, what do we do if there is a duplicate? I assume the manual entry takes precedence in this case, so if a user were to revert, we would keep the manual entry. @shampton do we know what happens if there is a duplicate? I'm wondering if the easiest thing to do here is to populate, then mark duplicates as user entries.
Then that leaves us down the compacting/200 item limit problem.
@shampton looks like we responded at the same time.
Let's go with the first option.
We should tag duplicate entries at manual - then when the customer reverts, the manual entry will stay. We already have the ability to track duplicates:
Perhaps when we do the automated entries, the automated duplicates can be kicked out.
@jocelynjane It sounds like you are advocating for option 2 where we don't compact entries that are already in the allowlist.
The downside to this is that the compaction wouldn't be optimal. I'm fine with this if we want to go that route.
Option 1 was including the manual entries in the compaction so that it was optimal, but like I said we would lose the ability to revert so maybe option 2 is better.
@shampton apologies, I got ahead of myself. I mean let's go with option 1 - and I'm wondering if we already have some of the parts here to reuse that could make this less tedious to implement.
My understanding is somewhere in the allowlist database we are tagging if the entry is automated. We also currently do detect and reject duplicate entries. I'm picturing a scenario where the automated list is fed in to the allowlist. If it's a new entry from the automation, that gets tagged as auto; else, it retains whatever identification we use to note manual/already existing entry. In this case, if a user then selects revert, we would keep all the manual entries and only remove the automated ones.
We would still though, have to figure out the compacting based on the new list.
Realistically, I'm not sure how we would do Option 2 without overriding our existing mechanism for detecting and rejecting duplicates. I'm not convinced it is a good idea to bypass that process.
@darbyfrey Do we have a number for how many projects have items in their allowlist but don't have the allowlist enforced? I'm wondering how low this edge case is.
Also, out of those projects, how many of them have auth log items over the 200 limit?
This makes sense, but I think we can do it without the user-added entries being lost. I've proposed a change in this MR which can accomplish this. This means the resulting allowlist will ensure access to all the projects listed in the authorization log, and that the user-added entries will remain untouched.
@bonnie-tsang In that case, I don't think we need to add any extra warning to the user since their already added items will not be lost and they can safely revert.
We should make sure we make this clear in the documentation - and customers can self-select to do their own, manual deletion if there is a group/project overlap (basically to free up limit if needed).
Thanks @darbyfrey@jocelynjane@shampton for the discussion and alignment on moving forward with Option 1.
I've updated the design in #509883 (closed)+ to address the feedback received, and it should reflect the direction we've chosen.
Hi Darby, wanted to check in. From what I understand you were taking on the work for the frontend development here. Do we think we will be able to complete prior to 17.9? Also linking out to this backend issue as well given they are connected.
Thanks @mgandres! I updated the issue description with a rough breakdown of work. Feel free to adjust to a breakdown that makes sense for you. I'll put my name next to any of the items I'm working on.
FYI I also split "Show warning when adding entries exceeds allowlist limit" and the feature flag rollout as a separate tasks in the feature breakdown in the description.
!177465 (merged) is done, just waiting for the maintainer to merge, and !177785 (merged) is in review now, so hopefully those blockers will be cleared soon.
@darbyfrey For the design for checking if adding entries will exceed the allowlist limit, do you know how I can get the value for the limit, or what I should be checking for it? I'd probably need to lower the number on my gdk for testing purposes as well.
@darbyfrey Is the 200 limit inclusive of both projects and groups, or is it a limit that's applied separately to both entities? I see that there's a GROUP_LINK_LIMIT in https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/models/ci/job_token/group_scope_link.rb#L13 (separate from PROJECT_LINK_DIRECTIONAL_LIMIT) so I wanted to know if I should be checking the combined projects and groups in the allowlist or if I should be doing a separate check for projects and groups.
Oh wait, I just realized. I guess the GROUP_LINK_LIMIT shouldn't matter since the auth log should only consist of projects, is that right? So we only need to check the project limit.
Hi @mgandres, good question! It's a tad confusing, but the short answer is that we should be using the limit in PROJECT_LINK_DIRECTIONAL_LIMIT and not combining it with GROUP_LINK_LIMIT.
The compaction process can produce a list of project and group links of 200 or less. It could be 200 projects, or 200 groups, or some combination. So we set the upper limit of each type to 200 to ensure that it will work correctly in any of these cases. Does that make sense?
Hey @mseferian-jenkins! I think this will carry over to %17.10. Frontend-wise, there are still two MRs that need to be merged. The description should be updated with the current status of these MRs.
!180827 (merged) is in review, though currently it's missing some documentation that still needs to be written (perhaps to be added in another MR if this one gets merged first). Edit: Ah we already have the docs MR in !181294 (merged)
!179212 (merged)was in review, but there are some UX discussions in !179212 (comment 2338051370) that require changes from the initial design. I'm currently working on these, then it can move forward with the review.