Investigate and optimize PostReceive worker performance bottlenecks based on instrumentation data
Part of Improve PostReceive Worker Performance to Meet ... (&19159)
Summary
Using data from the enhanced instrumentation (see Add granular instrumentation to PostReceive wor... (#566545)), identify the slowest components of the PostReceive
worker so that it can meet the 10-second performance target.
Kibana: https://log.gprd.gitlab.net/app/r/s/Gd0Pq+
Problem
Based on initial analysis and the parent issue investigation, several performance bottlenecks have been identified:
- Redis Cache Operations
  - 35,566 repository cache calls that took 9.7 seconds
  - Related to the Epic gitlab-org&17190 for cache optimization
- Lock Contention
  - Multiple PostReceive jobs competing for the same locks
  - Particularly problematic with pull mirroring scenarios
- Database Query Performance
  - Transaction overhead and query optimization opportunities
  - Potential N+1 query patterns
- CPU-Intensive Operations
  - Repository expiration logic
  - Event processing overhead
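To put the Redis figures above in perspective, the short calculation below derives the average latency per cache call and the share of the 10-second target that those calls consume. It is only a back-of-the-envelope sketch: it assumes the 9.7 seconds is the cumulative time across all 35,566 calls, and the variable names are illustrative.

```python
# Figures quoted in the "Redis Cache Operations" item above.
CACHE_CALLS = 35_566
CACHE_SECONDS = 9.7      # assumed to be the cumulative time across all calls
TARGET_SECONDS = 10.0    # performance target from the Summary

# Average cost of a single repository cache call, in milliseconds.
avg_ms_per_call = CACHE_SECONDS / CACHE_CALLS * 1000

# Fraction of the 10-second target consumed by cache calls alone.
share_of_target = CACHE_SECONDS / TARGET_SECONDS

print(f"average latency per call: {avg_ms_per_call:.3f} ms")  # 0.273 ms
print(f"share of the target:      {share_of_target:.0%}")     # 97%
```

Under that assumption each call is cheap (about 0.27 ms), so the cost comes from the number of calls rather than from slow individual calls.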
Approach
- Analyze the results of instrumentation from Add granular instrumentation to PostReceive wor... (#566545)
- Analyze patterns and identify top bottlenecks
- Create issues to optimize each bottleneck
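One possible way to carry out the "identify top bottlenecks" step is to sum the recorded duration per component and rank components by their share of the total. The sketch below assumes the instrumentation data has been exported as a list of records; the field names (`component`, `duration_s`), the component names, and the sample values are all hypothetical and do not reflect the real log schema or real measurements.

```python
from collections import defaultdict

# Synthetic sample records for illustration only. Field names and values
# are hypothetical, not taken from the actual instrumentation.
SAMPLE_RECORDS = [
    {"job_id": "a", "component": "repository_cache", "duration_s": 4.0},
    {"job_id": "a", "component": "lock_wait", "duration_s": 1.0},
    {"job_id": "a", "component": "db_queries", "duration_s": 0.5},
    {"job_id": "b", "component": "repository_cache", "duration_s": 2.0},
    {"job_id": "b", "component": "lock_wait", "duration_s": 0.5},
]


def rank_components(records):
    """Return (component, total_seconds, share_of_total), slowest first."""
    totals = defaultdict(float)
    for record in records:
        totals[record["component"]] += record["duration_s"]

    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, total, total / grand_total if grand_total else 0.0)
        for name, total in ranked
    ]


for name, total, share in rank_components(SAMPLE_RECORDS):
    print(f"{name:<18} {total:5.2f} s  {share:6.1%}")
```

Ranking by cumulative time gives a first ordering for which optimization issues to create; per-job percentiles may be a useful complement, since a component can be cheap on average but dominate the slowest jobs.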
Acceptance Criteria
- Performance analysis report based on instrumentation data
- Create issues for identified bottlenecks (ideally, with suggestions on how to fix them)
Related Issues
- Related to: #553426
Edited by Vasilii Iakliushin