Investigate 500 errors on member-heavy pages for self-managed instances
Summary
A self-managed customer is experiencing 500 errors when attempting to access pages that contain large numbers of members (3000+). The errors occur after a timeout period and appear to affect both project member lists and group member lists. Additionally, a GraphQL API query for a specific project is consistently failing, impacting permission auditing workflows.
Similar to GitLab.com issue #459041 (closed), but affecting self-managed instances.
Example areas:
- The project members page (/project_members)
- The group members page (/groups/{group-path}/members)
Current Behavior
- 500 error after a waiting period when visiting from a browser
- Logs show a rack-timeout error (from puma.stderr):
source=rack-timeout timeout=60000ms service=60001ms state=timed_out at=error
- REST API requests work but are extremely slow
- API performance comparison:
  - REST API takes 12-16 hours (sometimes 24+) to complete permission auditing
  - GraphQL API completes in ~2.5 hours for most projects
  - One specific project consistently fails GraphQL queries after several seconds
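To make it easier to count and compare these timeouts across log files, the rack-timeout line quoted above can be split into its key=value fields. This is only a minimal sketch: the function names are hypothetical, and it assumes every entry follows the same space-separated key=value layout as the single line captured in this issue.

```python
import re

# Matches key=value pairs such as "timeout=60000ms" in a rack-timeout log line.
PAIR_RE = re.compile(r"(\w[\w-]*)=(\S+)")


def parse_rack_timeout(line: str) -> dict:
    """Return the key=value fields of a rack-timeout log line as a dict."""
    fields = dict(PAIR_RE.findall(line))
    # Convert millisecond durations like "60001ms" to integers where present.
    for key in ("timeout", "service"):
        value = fields.get(key, "")
        if value.endswith("ms") and value[:-2].isdigit():
            fields[key] = int(value[:-2])
    return fields


def hit_timeout(fields: dict) -> bool:
    """True if the request was cut off at or past the configured timeout."""
    service, timeout = fields.get("service"), fields.get("timeout")
    return (
        fields.get("state") == "timed_out"
        and isinstance(service, int)
        and isinstance(timeout, int)
        and service >= timeout
    )


# The log line reported in this issue.
line = "source=rack-timeout timeout=60000ms service=60001ms state=timed_out at=error"
fields = parse_rack_timeout(line)
print(fields["timeout"], fields["service"], hit_timeout(fields))
# prints: 60000 60001 True
```

For the reported line, the request ran for 60001 ms against a 60000 ms limit, so it was terminated at the timeout boundary rather than failing earlier for another reason.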
Expected Behavior
- Pages should load successfully with proper pagination
- Response times should stay below the 60-second rack-timeout limit shown in the logs
- No timeout errors should occur
- GraphQL API queries should complete successfully for all projects
Workarounds
- REST API access works but is extremely slow
- GraphQL API provides significantly better performance for most projects
- No known frontend workarounds at this time
- No current workaround for the failing project-specific GraphQL query
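To help narrow down where the project-specific GraphQL query fails, the member list can be fetched in small cursor-paginated pages so that the last successful cursor is known when an error occurs. The sketch below is for reproduction and diagnosis only and is not confirmed to avoid the failure. It assumes the `projectMembers` connection on `project` from GitLab's GraphQL schema (field availability may vary by version). The base URL, token, project path, helper names, and page size are all placeholders.

```python
import json
import urllib.request

# Query shape assumes GitLab's Project.projectMembers connection;
# adjust the selected fields to match the instance's schema version.
QUERY = """
query($fullPath: ID!, $first: Int!, $after: String) {
  project(fullPath: $fullPath) {
    projectMembers(first: $first, after: $after) {
      nodes { user { username } accessLevel { stringValue } }
      pageInfo { hasNextPage endCursor }
    }
  }
}
"""


def post_graphql(base_url, token, payload):
    """Send one GraphQL request. base_url and token are placeholders."""
    request = urllib.request.Request(
        f"{base_url}/api/graphql",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def iter_members(full_path, send, page_size=50):
    """Yield member nodes page by page.

    `send` takes a request payload (dict) and returns the decoded response,
    which keeps the pagination logic independent of the HTTP transport.
    """
    cursor = None
    while True:
        variables = {"fullPath": full_path, "first": page_size, "after": cursor}
        result = send({"query": QUERY, "variables": variables})
        if result.get("errors"):
            # Report the cursor so the failing page can be retried in isolation.
            raise RuntimeError(f"GraphQL errors after cursor {cursor!r}: {result['errors']}")
        project = (result.get("data") or {}).get("project")
        if project is None:
            raise LookupError(f"project {full_path!r} not found or not visible")
        connection = project["projectMembers"]
        yield from connection["nodes"]
        if not connection["pageInfo"]["hasNextPage"]:
            return
        cursor = connection["pageInfo"]["endCursor"]
```

A call might look like `iter_members("group/project", lambda p: post_graphql("https://gitlab.example.com", token, p), page_size=20)`, where every argument is hypothetical. If the failure always occurs after the same cursor, that points at specific member records rather than the overall member count.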
Impact
- Affects administrative functionality
- Prevents users from viewing member lists in large projects/groups
- Blocks effective user management for organizations with large teams
- Disrupts permission auditing workflows
- One project completely inaccessible via GraphQL API, blocking automated auditing