Monitor Stage Performance Audit - Identify and address underperforming SQL queries
To address recent performance issues, we are conducting an audit of our backlog and of any known SQL queries or backend jobs that could impact availability. If you have ideas about where we could improve database performance, or issues that should be prioritized, add them to the comments here.
In light of recent database performance issues, we have created an epic to address the queries that rank in the top 10 by time taken or by call frequency during the monitoring period: Volunteer Research Database Performance Peak queries and create issues (&5652)
We are now asking teams to review their backlogs (and tap tribal knowledge) to proactively prioritize and address known query performance issues that are not captured in this epic. This is an urgent request to help reduce the load on our database.
If no queries immediately come to mind for your teams, please take some time to brainstorm areas where we could improve efficiency and reduce pressure on our primary database. What to look for:
- Known n+1 issues
- Read-only queries that can be redirected to read-only replicas
- Opportunities for caching data on frequent reads
- Lazy loading data (example: [Lazily load pipeline artifact in project pipelines index page (gitlab#324003)](gitlab-org/gitlab#324003 (comment 526293202)))
- Removing, or gating behind a feature flag, the computation of costly but non-essential information (so we can turn it off for .com only, for the time being). See comment below.
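To make the first bullet concrete, here is a minimal sketch of an N+1 pattern and its batched replacement. It uses plain Python with sqlite3 for illustration only; the schema and names are hypothetical, not taken from our codebase.

```python
import sqlite3

# Hypothetical schema: projects and their pipelines.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pipelines (id INTEGER PRIMARY KEY, project_id INTEGER, status TEXT);
    INSERT INTO projects VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO pipelines VALUES (1, 1, 'success'), (2, 1, 'failed'), (3, 2, 'success');
""")

def pipelines_n_plus_one():
    # N+1: one query for the projects, then one extra query per project.
    projects = conn.execute("SELECT id, name FROM projects ORDER BY id").fetchall()
    result = {}
    for pid, name in projects:
        rows = conn.execute(
            "SELECT status FROM pipelines WHERE project_id = ? ORDER BY id", (pid,)
        ).fetchall()
        result[name] = [r[0] for r in rows]
    return result

def pipelines_batched():
    # Fix: a single JOIN fetches everything in one round trip.
    rows = conn.execute("""
        SELECT p.name, pl.status
        FROM projects p JOIN pipelines pl ON pl.project_id = p.id
        ORDER BY pl.id
    """).fetchall()
    result = {}
    for name, status in rows:
        result.setdefault(name, []).append(status)
    return result

# Both return the same data; the batched version issues one query instead of N+1.
assert pipelines_n_plus_one() == pipelines_batched()
```

In Rails the same fix usually means replacing a per-record association access with `includes`/`preload`, but the shape of the problem is the same: a loop that issues one query per row.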
Unknown unknowns: please think creatively about ways to find and optimize queries, or otherwise relieve DB load!
- Use Elasticsearch together with the PostgreSQL slow logs to gather more information about suspicious queries.
- Look at this dashboard for endpoints that have a high number of SQL calls
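If you want to reproduce that kind of measurement locally, one cheap approach is to count the statements issued during a code path. This is a hypothetical sketch in plain Python with sqlite3, not the dashboard's actual instrumentation:

```python
import sqlite3

class CountingConnection:
    """Thin wrapper that counts SQL statements issued through it.

    A stand-in for per-request query instrumentation; a dashboard like
    the one above aggregates similar counters per endpoint.
    """
    def __init__(self, conn):
        self._conn = conn
        self.query_count = 0

    def execute(self, sql, params=()):
        self.query_count += 1
        return self._conn.execute(sql, params)

conn = CountingConnection(sqlite3.connect(":memory:"))
conn.execute("CREATE TABLE t (id INTEGER)")
for i in range(5):
    conn.execute("INSERT INTO t VALUES (?)", (i,))
print(conn.query_count)  # 6 statements: one DDL plus five inserts
```

A suspiciously high count for a single endpoint is usually the first hint of an N+1 or a loop that should be a batch query.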