Resolve "Auto generating database index per view" (!1383) · Merge requests · Baserow / baserow

Davide Silvestri requested to merge 720-auto-generating-database-index-per-view into develop Apr 18, 2023

What is in this MR

This MR introduces two important changes to speed up row listing for views/tables with a lot of rows.

The first one is the introduction of the django-cachalot library to cache slow and recurring queries like table.objects.count(). Counting objects with big tables is notoriously slow, but at the moment is strongly needed for the UI to work correctly. Because Cachalot caches all the tables or just the ones specified in a fixed list, this MR patches the library to add a way to introduce only the query we want for our dynamic tables.

The second important change is the introduction of a way to add a index per view, based on the view sorts (the most expensive operation of the query). In this way views sharing the same sorts can share the same index and every index can easily been deleted when not needed anymore.

A future MR could introduce a way to add conditions to the index based on the filters set for the view, but atm the most of the benefits are from the two changes described above, while introducing a way for every filter to contribute to the index can introduce quite a lot of changes that can be added in a separate MR if more performances are needed.

How to test this MR

Add 1M rows to a table
Go to the backend container terminal and do: BASEROW_BACKEND_DEBUG=off FEATURE_FLAGS= DJANGO_SETTINGS_MODULE=baserow.config.settings.base ./docker/docker-entrypoint.sh gunicorn-wsgi
Scroll a view without sorts (view A) for the table filled with data and notice the response time for every request in the browser developer tools
Create a new view with sorts (view B) and notice how the time needed for the requests increases considerably.

Now stop the previous command and run BASEROW_BACKEND_DEBUG=off FEATURE_FLAGS=auto-index-view DJANGO_SETTINGS_MODULE=baserow.config.settings.base ./docker/docker-entrypoint.sh gunicorn-wsgi

Scroll view A again and notice the difference caused by caching the count query.
Re-create (or remove and re-add sortings to) view B to see how the performance dramatically increases now when a sort is applied

In my tests with a table with 5M rows, the request time goes down from 4.5 secs to 0.3 secs (15x) for the view without sorting (view A) and from almost 10 secs to 0.4 secs (25x) for the view with sorts (view B) at the beginning of the table. Going to the end of the table, the request time goes down from 6 secs to 0.4 secs for view A and from >30 secs to 2 secs for view B.

Merge Request Checklist

changelog.md has been updated if required.
New/updated Premium/Enterprise features are separated correctly in the premium or enterprise folder
The latest Chrome and Firefox have been used to test any new frontend features
Documentation has been updated
Quality Standards are met
Performance: tables are still fast with 100k+ rows, 100+ field tables
The redoc API pages have been updated for any REST API changes
Our custom API docs are updated for changes to endpoints accessed via api tokens
The UI/UX has been updated following UI Style Guide

Closes #720 (closed)

Edited Apr 24, 2023 by Davide Silvestri

Resolve "Auto generating database index per view"

What is in this MR

How to test this MR

Merge Request Checklist

Merge request reports