Solve quadratic nightly stats execution time by replacing inefficient model calculations with large SQL queries
This is a child issue of #622 (closed) and #9 (closed) addressing store calculation
This is a possible implementation suggestion for a partly replacement of our big goal of replacing models with gateways in #9 (closed). The idea out of #622 (closed) is, to replace current php query logic having many loops with a extremely large amount of queries and handling everything in ~"tech SQL / DB".
Current loops come with quadratic runtime!! See: #622 (closed)
This is one out of three nightly scripts with the largest execution time with currently 65 minutes. This can be checked by our logging-team over at #fs-ops (protected Slack channel).
To fix this use both fs_abholer
and fs_betrieb_team
to update values of existing entries in fs_betrieb_team
table. Use betrieb_id
and foodsaver_id
for grouping.
Maybe these queries could be built one for each column to keep them maintainable for non advanced sql people? ;-) Might be an anti-pattern. Not sure if it makes sense.
stat_fetchcount
...is a simple count with grouping over entries in fs_abholer
. Previously there was a complex addition in place with a very inefficient implementation using the only the last changes after stat_last_update
but thousands of times.
first and last fetches
...stat_first_fetch
and stat_last_fetch
maybe could be used in one update statement as above grouping might be problematic here anyways.
Field stat_last_update
in fs_betrieb_team
is not used by other scripts. Could be dropped.
Conditions for all of these queries:
-
confirmed
= 1 -
date
< last midnight (instead ofnow()
) - There might be entries in
fs_abholer
of deleted foodsavers or those who left the team. Please keep this in mind.
Our stats say there are roughly over 2 million entries in fs_abholer
table. I fear shrinking response times of the site having large queries like this. But I don't have deeper experience judging this. Won't matter much as it is being executed only one time per day after 02:15 AM. Execution time can be checked with our (aiming to grow) DB-admin-team.
Moving this from quadratic O(n²) to a linear O(n) would be a huge success defusing a ~"growth problem" here as all other nightly stats scripts are inefficient but at least linear.
This task saves CO2 ;-)
Later on we can take bets on how fast the new implementation will be ;-)