Skip to content

Improve Statistics Calculation Runtime

Closes #622 (closed) Closes #895 (closed)

What does this MR do?

This MR is contains a complete rewrite of the bulk of nightly statistics calculations. The php-controlled, loop based calculations we currently use has a tremendous amount of overhead. This can be reduced by rewriting all of it as single SQL statements.

This rewrite also made the StatsModel class obsolete, as well as some other legacy code.

Since the large SQL-queries might be hard to read and understand, I added an extensive explanation in the class description. Hopefully that convinces anyone of you to review this MR 😄

On top of that I added an option to the region calculation, which is currently done incrementally (based on previous stat calculations), to calculate the stats completely from scratch. Surprisingly this only takes about 5s longer (which probably means that the dates are not properly indexed).

How confident are you it won't break things if deployed?

I tested all queries thoroughly, and it all works locally. This is a big change however, so I'd say about 90%.

How to test

Run the stats scripts:

./scripts/docker-compose run --rm --no-deps app php -f run.php Stats foodsaver
./scripts/docker-compose run --rm --no-deps app php -f run.php Stats betriebe
./scripts/docker-compose run --rm --no-deps app php -f run.php Stats bezirke

./scripts/docker-compose run --rm --no-deps app php -f run.php Stats bezirke false

The last command runs the region stats non-incrementally. So if you want correct values in your local environment you would have to run that once.

Benchmark

Stat calculation Old runtime (on prod server) New runtime (on test server) Speedup
Foodsaver 25m 24s 22.3s 6834%
Store 2m 58s 16.3s 1092%
Region 2m 39s 48.4s 329%
Total 31m 1s 1m 27s 2139%

I think an even larger speedup can be expected, when the new queries are executed on the production server.

Further Optimization Potential

  • Incremental stat calculations for foodsavers and stores
  • Proper indexing of dates

Checklist

  • added a test, or explain why one is not needed/possible...
  • no unrelated changes
  • asked someone for a code review
  • set a "for:" label to indicate who will be affected by this change
  • used "state:" labels to track this MR's state until it was beta tested
  • added to the next milestone (see https://gitlab.com/foodsharing-dev/foodsharing/-/milestones, unless it has a "for:Dev" label)
  • added an entry to CHANGELOG.md
  • added a short text that can be used in the release notes
  • Once your MR has been merged, you are responsible to create a testing issue in the Beta Testing forum: https://foodsharing.de/?page=bezirk&bid=734&sub=forum. Please change the MRs label to "state:Beta testing".
    • Consider writing a detailed description in German.
    • Describe in a few sentences, what should be tested from a user perspective.
    • Also mention different settings (e.g. different browsers, roles, ...) how this change can be tested.
    • Be aware, that also non technical people should understand.

Release notes text in German

Nächtliche Statistikberechnungen wurden optimiert.

Edited by Anton Ballmaier

Merge request reports