
Resolve "Speed up LabelledDataGroup "all" selection"

This MR improves the speed of the LabelledDataGroup selection. In particular, shortcuts for the special cases (selecting a single member, selecting all members) are added or improved. Moreover, data preparation for the merge combination method is made more efficient. Along with that, the selection methods are restructured and separated to make the selection logic more transparent.
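The shortcut idea can be illustrated with a minimal, framework-free sketch. It assumes a hypothetical storage layout in which each group member is keyed by a tuple of coordinates (one per dimension); the names `select`, `select_all`, `select_single` and `select_general` are illustrative only and are not dantro's LabelledDataGroup API.

```python
from typing import Any, Dict, Tuple

# Hypothetical layout: each member is keyed by one coordinate per dimension.
DIMS: Tuple[str, ...] = ("seed", "param")
Members = Dict[Tuple[Any, ...], Any]


def _as_list(value: Any) -> list:
    """Treat a scalar selector as a one-element list."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def select_all(members: Members) -> Members:
    # Shortcut: iterate over the members directly; no member map is built.
    return dict(members)


def select_single(members: Members, sel: Dict[str, Any]) -> Members:
    # Shortcut: every dimension is pinned to a scalar, so exactly one
    # member is addressed and can be looked up directly.
    key = tuple(sel[dim] for dim in DIMS)
    return {key: members[key]}


def select_general(members: Members, sel: Dict[str, Any]) -> Members:
    # General path: keep members whose coordinates match every selector.
    allowed = {dim: _as_list(val) for dim, val in sel.items()}
    return {
        key: data
        for key, data in members.items()
        if all(key[DIMS.index(dim)] in vals for dim, vals in allowed.items())
    }


def select(members: Members, **sel: Any) -> Members:
    # Dispatch to the cheapest applicable path.
    if not sel:
        return select_all(members)
    if set(sel) == set(DIMS) and not any(
        isinstance(v, (list, tuple)) for v in sel.values()
    ):
        return select_single(members, sel)
    return select_general(members, sel)


members = {(0, 1.0): "a", (0, 2.0): "b", (1, 1.0): "c", (1, 2.0): "d"}
print(select(members, seed=0, param=2.0))  # prints {(0, 2.0): 'b'}
```

The design choice illustrated here is that the two special cases never touch the general filtering logic, so their cost does not depend on how the general path is implemented.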

A new combination method, auto, is added. It automatically chooses between merge and try_concat based on the data type.
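The selection rule for auto can be sketched in a few lines of NumPy. The helper name `choose_combination_method` is hypothetical; only the rule itself (integer dtype: try_concat, anything else: merge) comes from this MR, and the sketch does not model what try_concat does internally. The last lines show the underlying reason: integers cannot hold NaN, so inserting a NaN placeholder upcasts integer data to float.

```python
import numpy as np


def choose_combination_method(dtype) -> str:
    """Hypothetical helper: pick a combination method from the data's dtype.

    Integer arrays cannot represent NaN, so filling missing values during a
    merge would upcast them to float; concatenation is preferred instead.
    """
    if np.issubdtype(np.dtype(dtype), np.integer):
        return "try_concat"
    return "merge"


# Why it matters: adding a NaN placeholder to integer data yields floats.
ints = np.array([1, 2, 3])
padded = np.append(ints, np.nan)
print(ints.dtype.kind, padded.dtype)  # prints "i float64"
```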

Details

  • Implement the _select_all_merge shortcut:
    • Directly iterate over the group members and try to merge them instead of (building and) using the member map
  • Add the auto combination method (and make it the default option):
    • The merge combination changes the dtype to float when inserting NaNs for missing values, whereas concat does not. Therefore, auto chooses try_concat if the dtype is an integer type and merge otherwise.
  • Speed up selection when combining via merge:
    • Use .where on the pre-selected member map to get all coordinates belonging to each container, which avoids selecting the same group member multiple times.
  • Extend and restructure tests to cover as many selection scenarios as possible
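The idea behind the faster merge preparation (collect all coordinates belonging to each container in one pass, then access each container only once) can be sketched with NumPy. Here `np.argwhere` on a boolean mask stands in for the `.where` call on the member map; the member map contents and the `coords_per_member` name are hypothetical.

```python
import numpy as np

# Hypothetical pre-selected member map: each cell holds the name of the
# group member (container) that provides the data at that coordinate.
member_map = np.array([
    ["m0", "m0", "m1"],
    ["m2", "m2", "m1"],
])


def coords_per_member(mmap: np.ndarray) -> dict:
    """Map each member name to every index tuple where it occurs."""
    result = {}
    for name in np.unique(mmap):
        # Boolean mask of the cells belonging to this member, then the
        # indices of all True entries (works for any number of dimensions).
        idxs = np.argwhere(mmap == name)
        result[str(name)] = [tuple(idx) for idx in idxs.tolist()]
    return result


grouped = coords_per_member(member_map)
# Each member can now be accessed once, with all of its coordinates at hand,
# instead of once per coordinate.
for name, idxs in grouped.items():
    print(name, idxs)
```

With this grouping, the number of member accesses equals the number of distinct members rather than the number of selected coordinates.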

Anything to double-check?

  • Selection interface transparent and consistent?

Can this MR be accepted?

  • Implementation ready
  • Tests added or adjusted
  • Documentation extended or updated (adapted the docstrings)
  • Code quality
  • Ready for merging
    • Pipeline passes without warnings
    • History cleaned-up or squash option set
    • Changelog entry added
    • Version number bumped to v0.17.0a0
    • Reviewed & approved by @

Related issues

Closes #93 (closed)

Edited by Utopia Developers
