Improve statistics calculation runtime
Description
A daily job calculates statistics for several entities. The parts for users and stores in particular are poorly implemented and issue far too many SQL queries, currently leading to the following runtimes:
- Foodsaver stats: 11 minutes for ~300,000 users. The list of users is queried, followed by 11 queries per user. At least 8 of those could be merged into queries that fetch the data for all users at once.
- Store stats: 50 minutes for ~15,000 stores. The list of all stores is queried, followed by one query per store for the store team, followed by 4 queries for each team member of the store. All of these could be replaced by top-level queries with proper grouping.
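The N+1 pattern described above, and the grouped replacement, can be sketched as follows. This is an illustrative toy, not the real foodsharing schema: the table names (`store`, `store_team`, `pickup`) and the "total weight picked up per store" statistic are assumptions, and it uses sqlite3 instead of the production MySQL setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE store_team (store_id INTEGER, user_id INTEGER);
    CREATE TABLE pickup (user_id INTEGER, store_id INTEGER, weight REAL);
    INSERT INTO store VALUES (1, 'A'), (2, 'B');
    INSERT INTO store_team VALUES (1, 10), (1, 11), (2, 10);
    INSERT INTO pickup VALUES (10, 1, 3.5), (11, 1, 2.0), (10, 2, 1.5);
""")

def stats_per_store_naive():
    """Current pattern: one query per store, then more per team member."""
    stats = {}
    for (store_id,) in conn.execute("SELECT id FROM store"):
        members = [uid for (uid,) in conn.execute(
            "SELECT user_id FROM store_team WHERE store_id = ?", (store_id,))]
        total = 0.0
        for uid in members:  # one query per team member -> query count explodes
            (w,) = conn.execute(
                "SELECT COALESCE(SUM(weight), 0) FROM pickup "
                "WHERE store_id = ? AND user_id = ?", (store_id, uid)).fetchone()
            total += w
        stats[store_id] = total
    return stats

def stats_per_store_grouped():
    """Proposed pattern: one grouped query for all stores at once."""
    return dict(conn.execute(
        "SELECT s.id, COALESCE(SUM(p.weight), 0) "
        "FROM store s "
        "LEFT JOIN store_team t ON t.store_id = s.id "
        "LEFT JOIN pickup p ON p.store_id = s.id AND p.user_id = t.user_id "
        "GROUP BY s.id"))

assert stats_per_store_naive() == stats_per_store_grouped()
print(stats_per_store_grouped())  # {1: 5.5, 2: 1.5}
```

The grouped version issues a constant number of queries regardless of the number of stores and team members, which is the whole point of the fix.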
Goal
Improve performance. The store stats in particular have approximately quadratic runtime: one hour is not a problem right now, but even a moderate increase in users/stores will make it considerably longer, so it may become a problem soon.
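A back-of-the-envelope check of the scaling, assuming the per-store query pattern described above (1 team query per store plus 4 queries per team member). The average team size of 10 and the 50% growth figures are illustrative assumptions, not measured values:

```python
def store_stats_query_count(stores: int, avg_team_size: int) -> int:
    # 1 query for the store list, then per store:
    # 1 team query + 4 queries per team member (as described above).
    return 1 + stores * (1 + 4 * avg_team_size)

# Illustrative current scale: ~15,000 stores, assumed 10 team members each.
now = store_stats_query_count(15_000, 10)
# If both the store count and the average team size grow by ~50%,
# the query count grows by roughly 2.2x -- queries scale with
# stores * team size, hence the approximately quadratic runtime.
later = store_stats_query_count(22_500, 15)
print(now, later, round(later / now, 2))
```

This is why the intermediate "only refresh active stores" idea helps but does not remove the underlying scaling problem.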
Evidence
See cron mails in #fs-ops Slack channel
Code: https://gitlab.com/foodsharing-dev/foodsharing/blob/master/src/Modules/Stats/StatsControl.php#L28 and https://gitlab.com/foodsharing-dev/foodsharing/blob/master/src/Modules/Stats/StatsControl.php#L99
Related
- #488 (closed) improve statistics : kg per slot
- Slack discussion about implementing it from scratch: https://yunity.slack.com/archives/C1T1ZM9A4/p1576327443055400
- Slack discussion about an intermediate solution that only refreshes active stores. Very low-hanging fruit in my opinion: https://yunity.slack.com/archives/C1T1ZM9A4/p1576594028138400