[Error Budget] Improve `getNamespaceStorageStatistics` performance
Summary
The grouputilization error budget regularly sees apdex failures (5 second duration) for the `getNamespaceStorageStatistics`
GraphQL query, which is used on the Usage Quotas page.
This is negatively impacting our error budgets and customer experience.
Proposal
Identify and implement performance improvements for the page/query. Some initial thoughts:
- Split up the query
  Currently, the query fetches all projects and their statistics from every group in the namespace hierarchy in one request. Do we need to do it this way? Could we split it into more API calls that are each significantly quicker, or reduce how much we show on the page initially?
- Improve query performance
  Review each component of the query and see whether improvements (DB indexes, scope changes, etc.) can be made. For example, after Backfill null/empty root_namespace_id for proje... (#424730) is implemented, perhaps we can use it for a much faster query to fetch all project statistics.
- < insert more ideas here >
It's worth noting that the number of apdex failures is relatively low: at the time of writing, in the last 7 days we've seen 6 apdex failures from 2508 requests, approx. 0.2% of all requests.
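
To make the "split up the query" idea more concrete, here is a rough Python sketch of fetching project statistics in small, cursor-paginated requests instead of one large request. It is an illustration only, not how the Usage Quotas page works today: `iter_project_statistics`, `make_fake_fetcher`, the `Page` shape, and the `storage_size` field are all hypothetical names, and the in-memory fetcher stands in for whatever API call would actually return one page of results.

```python
from typing import Callable, Iterator, Optional, TypedDict


class Page(TypedDict):
    """One page of results (hypothetical shape, modelled on cursor pagination)."""
    nodes: list[dict]
    end_cursor: Optional[str]
    has_next_page: bool


def iter_project_statistics(
    fetch_page: Callable[[Optional[str], int], Page],
    page_size: int = 20,
) -> Iterator[dict]:
    """Yield project statistics page by page.

    Each call to fetch_page should be a small, quick request; the caller
    can render the first page immediately and load the rest lazily.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor, page_size)
        yield from page["nodes"]
        if not page["has_next_page"]:
            break
        cursor = page["end_cursor"]


def make_fake_fetcher(projects: list[dict]) -> Callable[[Optional[str], int], Page]:
    """In-memory stand-in for the real API call (assumption for this sketch)."""
    def fetch_page(cursor: Optional[str], first: int) -> Page:
        start = int(cursor) if cursor is not None else 0
        end = start + first
        return {
            "nodes": projects[start:end],
            "end_cursor": str(end),
            "has_next_page": end < len(projects),
        }
    return fetch_page


if __name__ == "__main__":
    fake_projects = [{"id": i, "storage_size": i * 10} for i in range(5)]
    fetcher = make_fake_fetcher(fake_projects)
    for stats in iter_project_statistics(fetcher, page_size=2):
        print(stats)
```

Because the generator is lazy, the page could show the first batch of projects as soon as the first request returns. The trade-off is more round trips overall, so whether this helps apdex depends on how much faster each smaller request is.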