Skip to content

Resolve "Auto generating database index per view"

What is in this MR

This MR introduces two important changes to speed up row listing for views/tables with a lot of rows.

The first one is the introduction of the django-cachalot library to cache slow and recurring queries like table.objects.count(). Counting objects with big tables is notoriously slow, but at the moment is strongly needed for the UI to work correctly. Because Cachalot caches all the tables or just the ones specified in a fixed list, this MR patches the library to add a way to introduce only the query we want for our dynamic tables.

The second important change is the introduction of a way to add a index per view, based on the view sorts (the most expensive operation of the query). In this way views sharing the same sorts can share the same index and every index can easily been deleted when not needed anymore.

A future MR could introduce a way to add conditions to the index based on the filters set for the view, but atm the most of the benefits are from the two changes described above, while introducing a way for every filter to contribute to the index can introduce quite a lot of changes that can be added in a separate MR if more performances are needed.

How to test this MR

  • Add 1M rows to a table
  • Go to the backend container terminal and do: BASEROW_BACKEND_DEBUG=off FEATURE_FLAGS= DJANGO_SETTINGS_MODULE=baserow.config.settings.base ./docker/docker-entrypoint.sh gunicorn-wsgi
  • Scroll a view without sorts (view A) for the table filled with data and notice the response time for every request in the browser developer tools
  • Create a new view with sorts (view B) and notice how the time needed for the requests increases considerably.

Now stop the previous command and run BASEROW_BACKEND_DEBUG=off FEATURE_FLAGS=auto-index-view DJANGO_SETTINGS_MODULE=baserow.config.settings.base ./docker/docker-entrypoint.sh gunicorn-wsgi

  • Scroll view A again and notice the difference caused by caching the count query.
  • Re-create (or remove and re-add sortings to) view B to see how the performance dramatically increases now when a sort is applied

In my tests with a table with 5M rows, the request time goes down from 4.5 secs to 0.3 secs (15x) for the view without sorting (view A) and from almost 10 secs to 0.4 secs (25x) for the view with sorts (view B) at the beginning of the table. Going to the end of the table, the request time goes down from 6 secs to 0.4 secs for view A and from >30 secs to 2 secs for view B.

Merge Request Checklist

  • changelog.md has been updated if required.
  • New/updated Premium/Enterprise features are separated correctly in the premium or enterprise folder
  • The latest Chrome and Firefox have been used to test any new frontend features
  • Documentation has been updated
  • Quality Standards are met
  • Performance: tables are still fast with 100k+ rows, 100+ field tables
  • The redoc API pages have been updated for any REST API changes
  • Our custom API docs are updated for changes to endpoints accessed via api tokens
  • The UI/UX has been updated following UI Style Guide

Closes #720 (closed)

Edited by Davide Silvestri

Merge request reports