
Refactoring - Data and IndividualData QoL

Romain Girard requested to merge RomainGirard/leaspy:refact-data into dev

What does the code in the MR do?

This MR implements several quality-of-life changes: it makes minor refactorings to Data and IndividualData, enables ID-based membership tests for Data, updates the documentation and type hints, and improves tests and coverage.

New features
  • ID-based membership test for Data objects: 'SUB-ID-12' in data is now correctly implemented through __contains__, as illustrated (for IndividualParameters) in #42.
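
A minimal sketch of what an ID-based membership test can look like. The class MiniData, its internal dictionary, and the choice of TypeError for non-string keys are assumptions made for illustration; they are not the actual leaspy implementation or its exception types.

```python
from typing import Dict, List


class MiniData:
    """Hypothetical, simplified stand-in for Data (illustration only)."""

    def __init__(self, timepoints_by_id: Dict[str, List[float]]) -> None:
        # Assumed storage layout: individual ID -> list of timepoints.
        self._individuals = dict(timepoints_by_id)

    def __contains__(self, key: object) -> bool:
        # Membership is defined on IDs only; other key types are rejected
        # explicitly instead of silently returning False.
        if not isinstance(key, str):
            raise TypeError(
                f"Cannot test membership for key of type {type(key).__name__}"
            )
        return key in self._individuals


data = MiniData({"SUB-ID-12": [70.1, 71.3], "SUB-ID-13": [65.0]})
print("SUB-ID-12" in data)  # True
print("SUB-ID-99" in data)  # False
```

Without __contains__, Python's in operator falls back to iterating over the object (or to integer-indexed __getitem__), so the test compares against whatever iteration yields rather than against IDs.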

Refactoring

IndividualData

  • Refactor observation management to store observations in a single 2D numpy array instead of a list of lists
  • Remove .add_cofactor() and .add_observation() which were duplicates
  • Remove .individual_parameters, which I assume is legacy (IndividualParameters now exists for this purpose) and is not used anywhere
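
To illustrate the observation storage change, here is a small sketch of keeping one individual's visits in a single 2D numpy array of shape (n_visits, n_features). The helper add_visit and the variable names are hypothetical and only show the idea; they do not reproduce the IndividualData code.

```python
import numpy as np

# One individual's visits: timepoints plus one row of feature values per visit.
timepoints = np.array([70.1, 72.4])
observations = np.array([[0.2, 0.5],
                         [0.3, 0.6]])  # shape: (n_visits, n_features)


def add_visit(timepoints, observations, new_time, new_values):
    """Hypothetical helper: insert a visit while keeping rows sorted by time."""
    # Position at which the new timepoint keeps the array sorted.
    idx = int(np.searchsorted(timepoints, new_time))
    timepoints = np.insert(timepoints, idx, new_time)
    # Insert the matching row of feature values at the same position.
    observations = np.insert(observations, idx, new_values, axis=0)
    return timepoints, observations


timepoints, observations = add_visit(timepoints, observations, 71.0, [0.25, 0.55])
print(observations.shape)  # (3, 2)
```

Compared with a list of lists, the 2D array guarantees a rectangular shape and makes per-feature access a simple column slice such as observations[:, 0].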

Data

  • Rewrite the .to_dataframe() method to:
    • Check for cofactors presence
    • Remove the torch dependency for readability, given that it was purely internal and numpy was already needed
  • Remove .get_by_idx() which was legacy (replaced by __getitem__)
  • Turn computed attributes (e.g. .dimension) into properties for easier maintenance, assuming they are not accessed too frequently
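
The trade-off behind computed attributes as properties can be sketched as follows. The class DataWithProperties and its headers attribute are hypothetical; the real Data class may derive .dimension differently.

```python
from typing import List, Optional


class DataWithProperties:
    """Hypothetical stand-in illustrating a computed attribute as a property."""

    def __init__(self, headers: Optional[List[str]] = None) -> None:
        self.headers = headers  # feature names, or None if not set yet

    @property
    def dimension(self) -> Optional[int]:
        # Recomputed on every access, so it cannot go stale when `headers`
        # changes. Cheap here, but it would be worth caching if accessed
        # inside a hot loop.
        return None if self.headers is None else len(self.headers)


data = DataWithProperties(["memory", "attention"])
print(data.dimension)  # 2
data.headers.append("language")
print(data.dimension)  # 3
```

The property removes the need to update a stored attribute by hand in every method that modifies the underlying state, at the cost of recomputing the value on each access.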

Documentation

Updated documentation of most Data and IndividualData methods, including more accurate type hinting and appropriate Leaspy exceptions

Where should the reviewer start?

I would suggest IndividualData as a whole, then Data, focusing first on .to_dataframe() and .__contains__() which are the major changes.

How can the code be tested?

New tests have been implemented for the added / refactored features. All tests succeed.

When is the MR due for? (review deadline)

These are mostly quality-of-life / nice-to-have changes, so there is no priority.

What issues are linked to the MR?

This partly addresses #42 (comments)
