Skip to content

Resolve "Multiple select field"

Ghost User requested to merge 319-multiple-select-field into develop

Merge Request Checklist

  • changelog.md has been updated if required
  • Documentation has been updated
  • Quality Standards are met
  • The UI/UX has been updated following UI Style Guide
  • The redoc API pages have been updated for any REST API changes
  • The per database user API docs page has been updated for any REST API changes made to endpoints which can be accessed via a user created token
  • Has performance been considered and tested when appropriate? Ideally Baserow should be performant when working with hundreds of thousands of rows
  • New/Changed Premium features are separated correctly in the premium folder
  • There are tests for and it has been checked that any changed/new django ORM code is sensible, doesn't perform N queries and that table models are not generated needlessly or generated with all columns when only a few are needed.

Closes #319 (closed)

How to store the multi relationship of a field/table to various select options?

There are numerous ways of how to store the relationship between a field and it’s various select options. Each comes with its own pros and cons.

Option 1 - Reuse LinkRowFieldType Behavior

Copy the behavior of the current implementation of the LinkRowFieldType. For this type, the relationship between a field, and the rows of source and target table is stored in a many_to_many table for each relationship. That means that for every new LinkRowFieldType there will be a new database table (database_relation_XX) which keeps track of which row_id of the source table is linked to which row_id of the target_table.

Pro

  • The behavior works and is (due to using djangos „prefetch_related“ method) fast when fetching all the rows of a table.
  • Probably less overhead when creating the new field_type, since a lot of stuff can be reused from the LinkRowFieldMethod.

Con

  • Additional tables for every new MultiSelectFieldType to keep track of.

Option 2 - Use ArrayField

Another possible scenario would be to use an ArrayField (with model.IntegerField as a base_field) as the underlying model for the new field type. That way all the added options to a multi select field could be stored in the array.

Pro

  • All the selected options are stored as close as possible to the relevant row. No additional tables are needed.

Con

  • Custom logic needs to be added in order to create/update/delete a single selected option in the array field. While it is possible to access each element of an array, you need to find out the position of the element, before you could update/delete it from the array. From the PG docs:

    • Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.
  • No referential integrity. PG currently does not seem to have the possibility to constraint an array column to have foreign keys to other tables. Django does not let one use „ForeignKey“ fields as base_fields for the ArrayField.

  • Since we can’t establish a foreign_key relationship, we can’t use prefetch_related and need a custom query in order to join the related select options.


Option 3 - Use one Many-to-many through table for all relations

Another possibility is to create a new db table which stores all combinations of a multi select field to their respective select options. This would be like the LinkRowFieldType works, except that we only use one table to store all the relations instead of one table for each new field_type.

Pro

  • Get almost exactly the same behavior as the LinkRowFieldType, which we know works really well.
  • We don’t have to create a new table for each new multi select relation.

Con

  • Since we only want to use one table for the relationships, we need to add a third entry to the new relationship table, which keeps track of which field each relationship belongs to. In the LinkRowField this is achieved by having separate tables. It’s not straight forward to make a many-to-many relationship work with an additional factor in the relationship table. While it is relatively easy to save the additional entity to the table (with „through_defaults“) there is currently no way to tell the Django ORM to filter based on two factors (we’d have to manipulate the behavior of Djangos „ManyToMany“ field, which currently only allows for one „core filter“ value).

Conclusion: Use Option 1

Option 2 and 3 would both end up requiring a lot of custom code which in the end adds a lot of complexity. Option 1 will add a lot of additional tables to the database but the behavior is already well established in Baserow (with the LinkRowField). Hence this is the preferred solution.


Technical Plan

Requirements

  • The MultipleSelectFieldType must be able to store its select options in the same database table as the SingleSelectFieldType.
  • The new MultipleSelectFieldType must be able to store more than one select option. A minimum of twenty select_options without performance implications.
  • The MultipleSelectFieldType is sortable (by the first element in the added select_options).
  • The MultipleSelectFieldType should work as a primary key.
  • The filters „contains“/„contains not“ and „is empty“/ „is not empty“ should work the same as they do with the „single_select“ field type. The „contains“ query text must be compared against all select_options that are currently added to the field.
  • It should be possible to search for a select option and if a MultipleSelectField contains the search text in any of it’s added select_options then it should be shown in the results.
  • It should be possible to filter based on whether the field „has“ a certain select option (analog to how the LinkRowHas filter works).
  • When fetching all the rows of a table which holds a MultipleSelectFieldType then all the „select options“ should be joined, so that a single database select query will fetch all the needed values.
  • The MultipleSelectFieldType must be usable in the form view.
  • When adding a select option value to the row, it must be removed from the available select options dropdown. This will make sure that users can only a select option value once.
  • The MultipleSelectFieldType must be compatible with the trash functionality. That means that when deleting a field and restoring it afterwards, all the previously added select options must still be there.
  • When converting a „multiple_select“ field type to a „single_select“ field type, the select options must be preserved. The first „selected option“ in the „multiple_select“ field type must be the selected option in the converted to „single_select“ field type.
  • When a select option is deleted from a "multiple_select" field type then all the relationships for that select_option must be correctly deleted from the through table.
  • When converting from a „single_select“ field type to a „multiple_select“ field type then the select options must be preserved. The currently „selected option“ must be added to the „multiple select“ field.
  • When converting from a "multiple_select" field type to a "text" field type then the added select_options must be stringified and added to the text field (separated by comma).
  • When converting from a "text" field to "multiple_select" field then the text value should be splitted by comma and each resulting entry should be added as a select_option to the field.
  • The "fill_table" command should work when the "multiple_select" field type is present in a table.
  • When exporting the grid view with a „multiple_select“ field type, then all the select options that are added to this field, should be in the exported file.
  • Importing/Exporting group applications must be possible when "multiple_select" field type is present on a table.
  • Copying a grid view cell of the „multiple_select“ field type that already has multiple selected values and pasting those into another cell, should make sure that every select option correctly gets added to the underlying MultipleSelectField.

Backend changes:

  • A new field model for the „multiple_select“ field type will be created. This extends the Field model.
    • Create the relevant database migration file.
  • An abstract "SelectOptionsField" class will be created, because an abstract class is needed in our field type, when dynamically generating the "many-to-many"-relationship between the table model and the select options table.
  • A new FieldType for the „multiple_select“ field type will be created.
    • Needs to accept „select_options“ when creating the new field type. Since the SingleSelectFieldType already has this functionality, the functionality can be split out into an abstract base class from which both the „SingleSelectFieldType“ as well as the „MultipleSelectFieldType“ can inherit from.
    • The Django model field for this field type will be a "ManyToMany"-Field.
      • This will be created in the "after_model_generation" function, because we need the already created table model to correctly establish a relationship between that model and the select options.
    • Implement the „enhance_queryset“ function, so that all the related select options can be joined in one database query.
    • Implement the contains_query in such a way that the relevant „value“ will be searched for in all the related select options.
    • Implement the export_serialized/import_serialized functions, so that tables with a "multiple_select" field type can be correctly export and imported.
    • Implement the "get_alter_column_prepare_old/new" functions.
    • Register this new field type with the field type registry.
  • Add a new view filter type, for the MultipleSelectFieldType, which creates a „is in“-filter. Check what can be reused from the LinkRowHas filter.
Edited by Ghost User

Merge request reports