Refactoring

Romain Girard requested to merge RomainGirard/leaspy:refact-data into dev May 31, 2022

What does the code in the MR do?

This MR implements several quality-of-life changes: it refactors parts of Data and IndividualData, enables ID-based membership tests for Data, updates the documentation and type hints, and improves tests and coverage.

New features

ID-based membership test for Data objects, i.e. 'SUB-ID-12' in data now gives the correct result thanks to a dedicated __contains__, as illustrated (for IndividualParameters) in #42.
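To illustrate the idea, here is a minimal sketch of an ID-based membership test. DataSketch and its _id_to_position attribute are hypothetical stand-ins, not the actual leaspy classes or attributes, and raising TypeError on non-string keys is an assumption of this sketch. For context: when a class defines no __contains__, Python falls back to iterating over the object (through __iter__, or through integer-indexed __getitem__) and compares the key to each yielded item, which is generally not an ID lookup.

```python
from typing import Dict, Iterable


class DataSketch:
    """Hypothetical stand-in for Data, only meant to show the membership test."""

    def __init__(self, ids: Iterable[str]) -> None:
        # Assumption: individuals are identified by unique string IDs.
        self._id_to_position: Dict[str, int] = {
            id_: position for position, id_ in enumerate(ids)
        }

    def __contains__(self, key: object) -> bool:
        # Assumption: only string IDs are valid keys in this sketch.
        if not isinstance(key, str):
            raise TypeError(f"Expected a string ID, got {type(key).__name__}")
        # Dictionary lookup, so the test does not iterate over individuals.
        return key in self._id_to_position


data = DataSketch(["SUB-ID-11", "SUB-ID-12"])
print("SUB-ID-12" in data)
print("SUB-ID-99" in data)
```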

IndividualData

- Refactor observation management to use a single 2D numpy array instead of a list of lists
- Remove .add_cofactor() and .add_observation(), which were duplicates
- Remove .individual_parameters, which I assume is legacy (IndividualParameters now exists for this purpose) and which was not used anywhere
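As a rough illustration of the 2D-array approach, the sketch below stores one row per visit and one column per feature. IndividualDataSketch, its attributes, and the add_observations signature are assumptions made for this example and may differ from the real IndividualData.

```python
import numpy as np


class IndividualDataSketch:
    """Hypothetical stand-in for IndividualData (names are assumptions)."""

    def __init__(self, idx: str) -> None:
        self.idx = idx
        self.timepoints = None    # 1D array, shape (n_visits,)
        self.observations = None  # 2D array, shape (n_visits, n_features)

    def add_observations(self, timepoints, observations) -> None:
        new_t = np.asarray(timepoints, dtype=float)
        new_obs = np.asarray(observations, dtype=float)
        if new_obs.ndim != 2 or new_obs.shape[0] != new_t.shape[0]:
            raise ValueError("Expected one row of observations per timepoint")

        if self.timepoints is None:
            t, obs = new_t, new_obs
        else:
            t = np.concatenate([self.timepoints, new_t])
            # Raises ValueError if the number of features does not match.
            obs = np.concatenate([self.observations, new_obs], axis=0)

        # Keep visits sorted by time; rows follow their timepoints.
        order = np.argsort(t, kind="stable")
        self.timepoints = t[order]
        self.observations = obs[order]


ind = IndividualDataSketch("SUB-ID-12")
ind.add_observations([72.0, 70.0], [[0.3, 0.4], [0.1, 0.2]])
ind.add_observations([71.0], [[0.2, 0.3]])
print(ind.timepoints)
print(ind.observations.shape)
```

Compared with a list of lists, a single array makes shape checks explicit and lets sorting and slicing act on all features at once.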

Data

- Rewrite the .to_dataframe() method to:
  - Check for the presence of cofactors
  - Remove the torch dependency for readability, given that it was purely internal and numpy was already needed
- Remove .get_by_idx(), which was legacy (replaced by __getitem__)
- Set computed attributes as properties, e.g. .dimension, for easier maintenance, assuming they are not computed too frequently
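The two main points above (a torch-free .to_dataframe() that checks cofactors, and computed attributes exposed as properties) could look roughly like the sketch below. All class names, attribute names, the "ID" and "TIME" column names, and the choice of ValueError are assumptions for illustration. The actual implementation and exception types in the MR may differ.

```python
import numpy as np
import pandas as pd


class IndividualSketch:
    """Hypothetical individual: timepoints, 2D observations, cofactors."""

    def __init__(self, idx, timepoints, observations, cofactors=None):
        self.idx = idx
        self.timepoints = np.asarray(timepoints, dtype=float)
        self.observations = np.asarray(observations, dtype=float)
        self.cofactors = dict(cofactors or {})


class DataSketch:
    """Hypothetical stand-in for Data (not the actual leaspy class)."""

    def __init__(self, individuals, headers):
        self.individuals = {ind.idx: ind for ind in individuals}
        self.headers = list(headers)

    @property
    def dimension(self) -> int:
        # Recomputed on each access, so it cannot go out of sync with headers.
        return len(self.headers)

    @property
    def cofactors(self) -> list:
        # Union of the cofactor names found across individuals.
        names = set()
        for ind in self.individuals.values():
            names.update(ind.cofactors)
        return sorted(names)

    def to_dataframe(self, cofactors=None) -> pd.DataFrame:
        cofactors = list(cofactors or [])
        # Check that every requested cofactor exists before building anything.
        unknown = [name for name in cofactors if name not in self.cofactors]
        if unknown:
            raise ValueError(f"Unknown cofactors: {unknown}")

        frames = []
        for ind in self.individuals.values():
            # Pure numpy/pandas: no torch tensors involved.
            df = pd.DataFrame(ind.observations, columns=self.headers)
            df.insert(0, "TIME", ind.timepoints)
            df.insert(0, "ID", ind.idx)
            for name in cofactors:
                df[name] = ind.cofactors.get(name)
            frames.append(df)
        return pd.concat(frames, ignore_index=True)


data = DataSketch(
    [
        IndividualSketch("A", [70, 71], [[0.1, 0.2], [0.3, 0.4]], {"sex": "F"}),
        IndividualSketch("B", [65], [[0.5, 0.6]], {"sex": "M"}),
    ],
    headers=["Y0", "Y1"],
)
print(data.dimension)
print(data.to_dataframe(cofactors=["sex"]))
```

The property trades a small recomputation cost on each access for not having to keep a cached value in sync, which is why it only makes sense if the attribute is not read in a tight loop.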

Documentation

Updated documentation of most Data and IndividualData methods, including more accurate type hinting and appropriate Leaspy exceptions

Where should the reviewer start?

I would suggest starting with IndividualData as a whole, then Data, focusing first on .to_dataframe() and .__contains__(), which are the major changes.

How can the code be tested?

New tests have been implemented for the added and refactored features. All tests pass.

When is the MR due? (review deadline)

These are mostly quality-of-life / nice-to-have changes, so this is not a priority.

What issues are linked to the MR?

This partly addresses #42 (comments)

Refactoring - Data and IndividualData QoL