Ecosystem BE - DB Performance Audit
This issue is meant as a place to track the work for the Ecosystem BE team to audit our DB usage and identify areas for improvement.
Issues
Note: Instead of listing individual issues here, I am going to use issue boards so we do not have to maintain the list in multiple places.
Webhook Logs Table
We are currently partitioning the webhooks logs table, which is growing so fast it is going to run out of possible ID values. That work is tracked here: gitlab-org&5558 (closed)
References
- Main epic: gitlab-org&5652 (closed)
- Audit epic: gitlab-org&5675 (closed)
- Grafana SQL queries per request
- Query optimization tips: gitlab-org/gitlab#325480 (closed)
- Epic for requests with >100 queries: gitlab-org&5670 (closed)
- Ecosystem dashboard: https://dashboards.gitlab.net/d/stage-groups-ecosystem/stage-groups-group-dashboard-create-ecosystem?orgId=1
- Slack channel: https://gitlab.slack.com/archives/C01P4L1TND7
Previous Actions
-
Review open issues with 100+ SQL queries for potential N+1 issues -
Review requests grafana dashboard for additional suspects. -
Review the Ecosystem grafana dashboard for any SQL queries that use > 100 queries per action -
Investigate our "mimic API" (used for older Jira integration) -
Investigate the Jira DVCS connector -
Review issue backlog for any existing DB issues -
Review this spreadsheet for queries. See gitlab-org&5652 (closed) for details.
Complete: Audit the codebase in your area
Please complete this audit. Things to look for:
- Known n+1 issues
- Read-only queries that can be redirected to read-only replicas
- Opportunities for caching data on frequent reads
- Lazy loading data (example [Lazily load pipeline artifact in project pipelines index page (#324003)] (gitlab-org/gitlab#324003 (comment 526293202)))
- Removing or not exact caching costly but not super needed information totally or through a feature flag (so we only turn it off for .com for the time being). See comment below.
-
Unknown unknowns: please think creatively about ways to find and optimize queries, or otherwise relieve DB load!
- Use Elasticsearch with PostgreSQL slow logs to collect more information when you have suspicious queries.
- Look at this dashboard for endpoints that have a high number of SQL calls
Complete: Audit your Current Backlog
You might have already identified some of the issues above and captured them as issues. Please assign those to team members and if possible add them to the overall Epic.
The following issues were identified from our backlog as potential DB issues needing attention. Please review these issues and update this table with next steps