Add pandas DataFrame support to MDODiscipline

This is a specification proposal for adding pandas DataFrame support to the disciplines.

A discipline takes inputs and produces outputs, both of which are dicts of NumPy arrays. These data are stored in a single dict of NumPy arrays named local_data, which is widely used in GEMSEO.

The purpose of this issue is to allow some or all of the dict values of the inputs and/or the outputs to be pandas DataFrame objects instead of NumPy arrays. A DataFrame is a data structure with labeled axes (rows and columns) that can be thought of as a dict-like container of pseudo NumPy arrays. The validation of the inputs and outputs could eventually be extended to the contents of the DataFrame. For instance, one may need to validate the DataFrame columns (number, names, ...) and arrays (type, length, values, ...).
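To make the kind of validation mentioned above more concrete, here is a minimal sketch. The function check_frame and its arguments are hypothetical and not part of GEMSEO; it only checks column names, NumPy dtype kinds and the number of rows.

```python
import pandas as pd


def check_frame(frame, expected_kinds, length=None):
    """Return a list of error messages; the list is empty when the frame is valid.

    expected_kinds maps each expected column name to a NumPy dtype kind,
    e.g. "f" for floats or "i" for integers (hypothetical convention).
    """
    errors = []
    missing = set(expected_kinds) - set(frame.columns)
    extra = set(frame.columns) - set(expected_kinds)
    if missing:
        errors.append(f"missing columns: {sorted(missing)}")
    if extra:
        errors.append(f"unexpected columns: {sorted(extra)}")
    for name, kind in expected_kinds.items():
        # Only check the dtype of the columns that are actually present.
        if name in frame.columns and frame[name].dtype.kind != kind:
            errors.append(f"column {name!r} has dtype {frame[name].dtype}")
    if length is not None and len(frame) != length:
        errors.append(f"expected {length} rows, got {len(frame)}")
    return errors


valid_errors = check_frame(pd.DataFrame({"a": [0.0, 1.0]}), {"a": "f"}, length=2)
invalid_errors = check_frame(pd.DataFrame({"a": [0, 1]}), {"a": "f", "b": "f"})
```

Here valid_errors is empty, while invalid_errors reports the missing column "b" and the integer dtype of column "a". Validating the values themselves would need additional rules that this proposal does not define yet.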

Because of the nature of a DataFrame, and because changing the structure of local_data would require adapting the client code of the disciplines that use it, the best approach seems to be to expose the DataFrame content transparently via local_data without changing the nature of local_data.

This could be done by having local_data expose a DataFrame through compound names, each bound to a NumPy array that is an in-memory view of the corresponding DataFrame column. A compound name could be composed of the name of the input or output item, a separator, and the name of the DataFrame column. An in-memory view may not be possible for all kinds of DataFrame arrays; when it is not, an error shall be raised explicitly. We shall make sure that the use cases for DataFrame are compatible with this constraint.
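A minimal sketch of this mechanism could look as follows. The function expose_frame and the SEPARATOR constant are hypothetical names. The view check relies on the assumption that two successive calls to Series.to_numpy() share memory only when pandas did not have to copy or convert the data.

```python
import numpy as np
import pandas as pd

SEPARATOR = ":"  # assumed separator, not fixed by this proposal


def expose_frame(name, frame):
    """Map compound names to in-memory views of the columns of a DataFrame."""
    views = {}
    for column in frame.columns:
        series = frame[column]
        array = series.to_numpy()
        # When pandas must convert or copy the data, each call allocates a
        # new buffer, so the two arrays do not share memory.
        if not np.shares_memory(array, series.to_numpy()):
            raise ValueError(
                f"Column {column!r} of {name!r} cannot be exposed as a view."
            )
        views[f"{name}{SEPARATOR}{column}"] = array
    return views


views = expose_frame("x", pd.DataFrame(data={"a": [0.0, 1.0]}))
```

With this sketch, a column holding a nullable integer array with missing values raises the error, since pandas converts it on every call. Note that empty columns are also rejected, and that with the pandas Copy-on-Write mode enabled the returned views may be read-only; both points would need to be settled by the actual implementation.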

For generic queries through the local_data API, such as iterating, only the compound names would be exposed; the DataFrame itself would not be provided. To allow retrieving a DataFrame from its original name, a specific query through the local_data API, like .get() or [], could be used.

A DataFrame could be used for both the inputs and the outputs.

Example:

inputs = {'x': DataFrame(data={'a': [0]})}

would yield, with the separator ":",

local_data = {'x:a': <view of np.array([0])>}

but the DataFrame could also be retrieved with local_data['x'], while local_data['x:a'] would provide the in-memory NumPy array view. Iterating over local_data would only provide the 'x:a' entry.
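The lookup behavior of this example can be sketched with a read-only mapping. The class name LocalData and its internals are hypothetical and only illustrate the idea; the in-memory view check is left out for brevity.

```python
from collections.abc import Mapping

import numpy as np
import pandas as pd

SEPARATOR = ":"  # assumed separator, as in the example above


class LocalData(Mapping):
    """Expose DataFrame columns under compound names (illustrative only)."""

    def __init__(self, data):
        self._frames = {}
        self._arrays = {}
        for name, value in data.items():
            if isinstance(value, pd.DataFrame):
                self._frames[name] = value
                for column in value.columns:
                    key = f"{name}{SEPARATOR}{column}"
                    self._arrays[key] = value[column].to_numpy()
            else:
                self._arrays[name] = np.asarray(value)

    def __getitem__(self, name):
        # A specific query by the original name returns the DataFrame.
        if name in self._frames:
            return self._frames[name]
        return self._arrays[name]

    def __iter__(self):
        # Generic queries only see the compound names and plain arrays.
        return iter(self._arrays)

    def __len__(self):
        return len(self._arrays)


local_data = LocalData({"x": pd.DataFrame(data={"a": [0]}), "y": np.array([1.0])})
names = list(local_data)
```

In this sketch names is ['x:a', 'y'], while local_data['x'] and local_data.get('x') both return the DataFrame. One side effect to be aware of: because Mapping implements the in operator through item access, 'x' in local_data is true even though iterating does not yield 'x'.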

The behavior of the grammars would not be changed: the schemas could directly use 'x:a', but not 'x'.
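As an illustration, and assuming the grammar is described by a JSON schema, the compound name would appear as an ordinary array property. The schema below is a hypothetical fragment written as a Python dict, not an actual GEMSEO grammar file.

```python
# Hypothetical JSON-schema-style description of the inputs of the example.
schema = {
    "type": "object",
    "properties": {
        # The compound name is declared like any other array item.
        "x:a": {"type": "array", "items": {"type": "number"}},
    },
    "required": ["x:a"],
}

# Names that iterating over local_data would provide in the example.
exposed_names = ["x:a"]
unknown_names = [name for name in exposed_names if name not in schema["properties"]]
```

Here unknown_names is empty, and the original name 'x' does not need to appear in the schema at all.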

Edited Jan 14, 2022 by Antoine Dechaume