WIP: Add a primitive that flattens multidimensional ndarrays
This is different from data_preprocessing.flatten.DataFrameCommon, which takes the elements of the nested structure and spreads them across duplicated copies of the original row. Its docstring gives this illustration:
[
a, b, [w, x],
c, d, [y, z],
]
yields:
[
a, b, w,
a, b, x,
c, d, y,
c, d, z
]
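For comparison, the behaviour shown in this illustration resembles pandas' DataFrame.explode. The sketch below reproduces the illustration with it. The column names c0, c1 and nested are hypothetical, and this is only an analogy, not how DataFrameCommon is actually implemented.

```python
import pandas as pd

# Hypothetical input mirroring the illustration: two scalar columns
# plus one column that holds a nested list in every row.
df = pd.DataFrame({
    "c0": ["a", "c"],
    "c1": ["b", "d"],
    "nested": [["w", "x"], ["y", "z"]],
})

# explode() emits one row per nested element and repeats the scalar
# columns on each of those rows; reset_index renumbers them 0..n-1.
exploded = df.explode("nested").reset_index(drop=True)
print(exploded.values.tolist())
# [['a', 'b', 'w'], ['a', 'b', 'x'], ['c', 'd', 'y'], ['c', 'd', 'z']]
```

The number of rows grows with the number of nested elements, which is exactly what a per-sample classifier does not want.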
This new primitive instead flattens each multidimensional ndarray and places every element in its own new column (essentially turning each array into a feature vector), so the number of rows stays the same. Here's an illustration taken from its docstring:
   col_A col_B col_C ndarrays
0      9     a     d [[[9], [8], [7]], [[6], [5], [4]]]
1     29     b     e [[[6], [7], [8]], [[9], [0], [1]]]
2     49     c     f [[[2], [3], [4]], [[5], [6], [7]]]
yields:
   col_A col_B col_C  0  1  2  3  4  5
0      9     a     d  9  8  7  6  5  4
1     29     b     e  6  7  8  9  0  1
2     49     c     f  2  3  4  5  6  7
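The transformation in this illustration can be sketched with plain numpy and pandas. This is a minimal sketch, not the primitive's actual code: the function name flatten_ndarray_column is hypothetical, and it assumes every cell in the column holds an array of the same shape. Elements are read in C (row-major) order, which matches the ordering in the illustration above.

```python
import numpy as np
import pandas as pd


def flatten_ndarray_column(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: replace `column` (one ndarray per cell)
    with one new column per array element.

    Assumes all cells share the same shape and df is non-empty;
    np.stack raises ValueError otherwise.
    """
    # Stack the per-row arrays into one (n_rows, *cell_shape) array,
    # then collapse every axis after the row axis in row-major order.
    stacked = np.stack([np.asarray(cell) for cell in df[column]])
    flat = stacked.reshape(len(df), -1)
    # New columns are labelled 0..k-1 and aligned on the original index.
    features = pd.DataFrame(flat, index=df.index)
    # Keep the other columns and append the flattened features after them.
    return pd.concat([df.drop(columns=[column]), features], axis=1)


df = pd.DataFrame({
    "col_A": [9, 29, 49],
    "col_B": ["a", "b", "c"],
    "col_C": ["d", "e", "f"],
    "ndarrays": [
        np.array([[[9], [8], [7]], [[6], [5], [4]]]),
        np.array([[[6], [7], [8]], [[9], [0], [1]]]),
        np.array([[[2], [3], [4]], [[5], [6], [7]]]),
    ],
})
out = flatten_ndarray_column(df, "ndarrays")
print(out)
```

Each (2, 3, 1) array becomes six numeric columns, so the result has one row per sample and can be passed to a classifier that expects a flat feature matrix.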
The need for this arose when I tried to use an sklearn-wrap classifier (say, classification.logistic_regression.SKlearn) on an image dataset: the output of the preceding image reader primitive (data_preprocessing.image_reader.Common) was not in a form the classifier could consume, since each image arrives as a multidimensional ndarray in a single cell rather than as one numeric value per column.
Feedback on naming, formatting, or anything else is welcome.