Step-up auth: Improve step-up auth recovery mechanism for group owners
Proposal
This proposal addresses the recovery challenges when group-level step-up authentication is misconfigured or accidentally enabled, which can lock out group owners and team members from accessing their groups and projects. Currently, we allow group owners to bypass step-up authentication for edit and update actions as a pragmatic solution, but this creates a security trade-off that weakens the overall protection model.
Related Issues and Resources
Issues and MRs:
- Parent Issue: #474650 - Step-up authentication with OIDC for group scope
- Implementation MR: !199800 (merged) - Step-up auth: Group protection (final integration and testing) [4/4]
- Review Discussion: MR comment discussing the chicken-and-egg problem
Problem Statement
When group-level step-up authentication is enabled (issue #474650), it adds an OAuth-based authentication layer that users must pass before they can access group resources. This creates potential lockout scenarios that currently require a security compromise to resolve:
Scenario 1: Accidental Activation
A group owner unintentionally enables step-up authentication for their group, immediately blocking both themselves and all team members from accessing:
- Group pages (issues, merge requests, settings)
- All projects within the group
- Group administration capabilities
Without a recovery mechanism, the entire team is locked out and must contact GitLab instance administrators to manually reset the configuration via database access or Rails console. This creates significant operational overhead and disrupts team productivity.
Scenario 2: Insufficient Authentication Assurance Levels
A group owner enables step-up authentication with specific OAuth provider requirements (e.g., acr: gold), but later discovers that:
- Team members only have lower authentication levels (e.g., acr: silver)
- The required security level is unattainable for some or all team members
- No upgrade path exists to reach the required authentication level
- The OAuth provider becomes unavailable or misconfigured
This blocks team members from working on their projects indefinitely, again requiring administrator intervention to resolve.
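The mismatch in Scenario 2 can be illustrated with a minimal sketch. It assumes ACR values can be ranked, and the ranking below (bronze, silver, gold) is hypothetical, since real identity providers define their own values and ordering. The helper names are illustrative and are not part of GitLab's implementation.

```python
# Hypothetical ranking of ACR values; real providers define their own.
ACR_RANK = {"bronze": 1, "silver": 2, "gold": 3}


def satisfies(required_acr: str, presented_acr: str) -> bool:
    """Return True if the presented ACR is at least as strong as required."""
    # Unknown presented values rank 0, so they never satisfy a requirement.
    return ACR_RANK.get(presented_acr, 0) >= ACR_RANK[required_acr]


def locked_out_members(required_acr: str, member_acrs: dict) -> list:
    """List members whose best attainable ACR falls short of the requirement."""
    return sorted(
        name for name, acr in member_acrs.items()
        if not satisfies(required_acr, acr)
    )
```

For example, `locked_out_members("gold", {"alice": "gold", "bob": "silver"})` returns `["bob"]`. A check of this kind, run before enforcement starts, might surface the lockout while it can still be avoided.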
Current Workaround and Its Security Trade-offs
To prevent complete lockout scenarios, the current implementation allows group owners (users with admin_group permission) to bypass step-up authentication when accessing the edit and update actions. See the skip_step_up_auth? implementation for details.
This creates significant security trade-offs:
- ⚠️ Weakened Protection: Group owners can access and modify sensitive group settings without completing step-up authentication, directly undermining the security feature's purpose
- ⚠️ Defeats Core Intent: The most privileged users (who have the greatest access to sensitive data and configurations) can bypass the very protection meant to secure those resources
- ⚠️ Incomplete Solution: While owners can reconfigure settings to unblock themselves, team members remain locked out until the issue is resolved, causing productivity loss and support burden
This is an intentional security compromise implemented to prevent worse outcomes (permanent lockout requiring database-level intervention). However, it represents a significant gap in the security model and highlights the urgent need for a more sophisticated recovery mechanism that maintains security while preventing lockouts.
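The bypass described in this section can be sketched roughly as follows. This is a simplified model of the behavior as stated in the text, not a copy of the actual skip_step_up_auth? method. The `Request` fields and the `access_allowed` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Actions on which owners currently skip step-up auth, per the description above.
BYPASS_ACTIONS = {"edit", "update"}


@dataclass
class Request:
    action: str              # controller action, e.g. "show" or "edit"
    can_admin_group: bool    # user holds the admin_group permission
    step_up_completed: bool  # user already passed step-up auth this session


def skip_step_up_auth(req: Request) -> bool:
    """Owners bypass step-up auth, but only for the settings actions."""
    return req.can_admin_group and req.action in BYPASS_ACTIONS


def access_allowed(req: Request, step_up_required: bool) -> bool:
    """Decide whether the request may proceed without a step-up challenge."""
    if not step_up_required:
        return True
    return req.step_up_completed or skip_step_up_auth(req)
```

In this model an owner without a step-up session can still reach `edit` and `update`, which is exactly the gap listed above, while non-owners stay blocked on every action until the owner fixes the configuration.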
Possible Recovery Approaches
Several approaches could be considered to address the recovery problem while maintaining security:
- Email-Based Recovery Flow: Time-limited recovery tokens sent via email, similar to password reset functionality
- Multi-Owner Approval System: Require approval from multiple group owners before allowing step-up auth reconfiguration
- Grace Period for New Configurations: Provide a 24-48 hour warning period before full enforcement begins
- Instance Administrator Override: Dedicated admin tools to temporarily disable step-up auth for specific groups with full audit logging
- Break-Glass Emergency Access: One-time emergency access codes generated when enabling step-up auth
Each approach has different trade-offs between security, usability, and implementation complexity. Further design work is needed to evaluate these options and potentially combine elements from multiple approaches.
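To make the token-based options more concrete (the email recovery flow and the break-glass codes), here is a minimal sketch of one-time, time-limited recovery codes. It is one possible design under stated assumptions, not a proposed implementation: the `RecoveryCodes` class, its in-memory storage, and the TTL value are all hypothetical, and a real version would need persistence, rate limiting, and audit logging.

```python
import hashlib
import secrets
import time
from typing import Optional


class RecoveryCodes:
    """One-time, time-limited recovery codes (hypothetical design)."""

    def __init__(self, ttl_seconds: int):
        self.ttl_seconds = ttl_seconds
        self._issued = {}  # sha256(code) -> issue timestamp

    def issue(self, now: Optional[float] = None) -> str:
        """Generate a code, store only its hash, and return the plaintext once."""
        now = time.time() if now is None else now
        code = secrets.token_urlsafe(32)
        self._issued[self._digest(code)] = now
        return code

    def redeem(self, code: str, now: Optional[float] = None) -> bool:
        """Consume a code; True only if it is known, unused, and unexpired."""
        now = time.time() if now is None else now
        # pop() removes the entry, so a code cannot be redeemed twice.
        issued_at = self._issued.pop(self._digest(code), None)
        if issued_at is None:
            return False
        return now - issued_at <= self.ttl_seconds

    @staticmethod
    def _digest(code: str) -> str:
        return hashlib.sha256(code.encode()).hexdigest()
```

Storing only the hash means a leaked table does not reveal usable codes, and consuming the entry on first use keeps each code single-use. Whether the code is delivered by email or shown once when step-up auth is enabled is the design question that separates the two approaches in the list.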
Open Questions
- Which recovery approach (or combination) provides the best balance of security and usability?
- Should recovery mechanisms differ based on GitLab tier or deployment type (self-managed vs. GitLab.com)?
- What are the appropriate time limits and thresholds for recovery mechanisms?
- How should we handle audit logging and administrator visibility for recovery events?
Next Steps
This issue requires further elaboration and design work before implementation. The next steps include:
- Security Review: Gather feedback from the security team on the proposed approaches and identify security requirements
- Design Documentation: Create detailed design documentation for the selected recovery approach, including technical specifications and user flows
- Implementation Planning: Break down the selected solution into concrete implementation issues with clear acceptance criteria