WIP: Add a primitive that flattens multidimensional ndarrays (!100) · Merge requests · datadrivendiscovery / common-primitives

The source project of this merge request has been removed.

Mark Poscablo requested to merge mark-poscablo/common-primitives-old:ndarray-flatten into master Oct 29, 2019

This is different from data_preprocessing.flatten.DataFrameCommon, which takes each element of a nested structure and emits one row per element, duplicating the remaining values of the original row. Its docstring had this illustration:

    [
        a, b, [w, x],
        c, d, [y, z],
    ]

    yields:

    [
        a, b, w,
        a, b, x,
        c, d, y,
        c, d, z
    ]
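For comparison, the row-duplicating behavior illustrated above can be sketched with plain pandas. This is only an analogy using `DataFrame.explode`, not the implementation of data_preprocessing.flatten.DataFrameCommon, and the column names `first`, `second`, and `nested` are made up for the example:

```python
import pandas as pd

# Hypothetical frame mirroring the docstring illustration:
# two scalar columns plus one column holding a nested list per row.
df = pd.DataFrame({
    "first": ["a", "c"],
    "second": ["b", "d"],
    "nested": [["w", "x"], ["y", "z"]],
})

# explode() emits one row per list element and repeats the other columns.
exploded = df.explode("nested").reset_index(drop=True)
print(exploded)
```

The number of rows grows with the nesting, while the number of columns stays the same.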

This new primitive flattens each multidimensional ndarray and places every element in its own new column (essentially turning each array into a feature vector). Here's an illustration taken from the docstring:

       col_A  col_B  col_C                            ndarrays
    0      9      a      d  [[[9], [8], [7]], [[6], [5], [4]]]
    1     29      b      e  [[[6], [7], [8]], [[9], [0], [1]]]
    2     49      c      f  [[[2], [3], [4]], [[5], [6], [7]]]

    yields:

       col_A  col_B  col_C  0  1  2  3  4  5
    0      9      a      d  9  8  7  6  5  4
    1     29      b      e  6  7  8  9  0  1
    2     49      c      f  2  3  4  5  6  7
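The transformation in the illustration can be approximated outside of the primitive framework with numpy and pandas. The following is a rough sketch, not the code in this merge request: the helper name `flatten_ndarray_column` is hypothetical, it assumes every array in the column has the same number of elements, and it assumes row-major (C-order) flattening, which is what the illustration appears to show.

```python
import numpy as np
import pandas as pd


def flatten_ndarray_column(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Replace `column` (one ndarray per row) with one new column per element.

    Hypothetical helper for illustration; assumes equally sized arrays.
    """
    # Flatten each row's array in row-major order, then stack into a 2-D matrix
    # of shape (number of rows, number of elements per array).
    flat = np.stack([np.asarray(cell).ravel() for cell in df[column]])
    features = pd.DataFrame(flat, index=df.index, columns=range(flat.shape[1]))
    # Keep the other columns and append the new feature columns on the right.
    return pd.concat([df.drop(columns=[column]), features], axis=1)


df = pd.DataFrame({
    "col_A": [9, 29, 49],
    "col_B": ["a", "b", "c"],
    "col_C": ["d", "e", "f"],
    "ndarrays": [
        np.array([[[9], [8], [7]], [[6], [5], [4]]]),
        np.array([[[6], [7], [8]], [[9], [0], [1]]]),
        np.array([[[2], [3], [4]], [[5], [6], [7]]]),
    ],
})

result = flatten_ndarray_column(df, "ndarrays")
print(result)
```

Here the number of rows is unchanged and only the number of columns grows, which is the form a tabular classifier typically expects.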

The need for this arose when I tried to use an sklearn-wrap classifier (say classification.logistic_regression.SKlearn) on an image dataset, but the output of the preceding image reader primitive (data_preprocessing.image_reader.Common) was not in the right form for the classifier to consume.

Feedback on naming, formatting, or anything else is welcome.

Edited Oct 29, 2019 by Mark Poscablo
