Ability to create a list of arguments as input
The use case is glue primitives such as dataframe concatenation. Currently, the concatenate primitive takes two dataframes as arguments, so concatenating n dataframes takes n-1 calls. This is inconvenient, and for large dataframes it is inefficient: each call copies the accumulated result again, so the total copying is quadratic in n rather than linear. It would be better if the concatenate primitive could take a list of dataframes, e.g.
concat_primitive.produce(inputs=[dataframe0, dataframe1, dataframe2])
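To make the linear versus quadratic point concrete, here is a rough cost model in plain Python. It assumes that every concatenate call copies all rows of its inputs into a new dataframe; the function names are hypothetical and only count rows, they do not call any primitive.

```python
def rows_copied_pairwise(sizes):
    """Rows copied when concatenating left to right, two inputs per call.

    Assumption: each call builds a new dataframe holding every row
    accumulated so far plus the rows of the next input.
    """
    copied = 0
    accumulated = sizes[0]
    for size in sizes[1:]:
        accumulated += size
        copied += accumulated
    return copied


def rows_copied_single_call(sizes):
    """Rows copied when one call receives the whole list of inputs."""
    return sum(sizes)


# Ten dataframes of 1000 rows each (illustrative numbers only).
sizes = [1000] * 10
pairwise = rows_copied_pairwise(sizes)
single = rows_copied_single_call(sizes)
print(pairwise, single)  # 54000 10000
```

Under this model the pairwise approach copies roughly n/2 times as many rows as a single list-based call, which is where the quadratic growth comes from.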
However, the current pipeline schema does not support creating lists. The proposal is to update the schema to allow list creation, e.g.
concat_step.add_argument(name='inputs', data_reference=['step.0.produce', 'step.1.produce', 'step.2.produce'])
Currently, the value of data_reference can only be a single produce output, not a list of produce outputs.
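As a sketch of what the change could mean on the runtime side, the following shows one way to resolve a data_reference that is either a single string or a list of strings. This is not the reference runtime's code: resolve_argument and the step_outputs mapping are hypothetical names introduced for illustration, and plain strings stand in for dataframes.

```python
def resolve_argument(data_reference, step_outputs):
    """Resolve a data reference, or a list of them, to produced values.

    data_reference: a string such as 'step.0.produce', or a list of such strings.
    step_outputs: hypothetical mapping from data reference to the value
        produced by that step.
    """
    if isinstance(data_reference, str):
        # Existing behavior: a single reference resolves to a single value.
        return step_outputs[data_reference]
    # Proposed behavior: a list of references resolves to a list of values,
    # in the same order as given in the pipeline.
    return [step_outputs[ref] for ref in data_reference]


# Stand-in values; a real runtime would hold dataframes here.
step_outputs = {
    'step.0.produce': 'dataframe0',
    'step.1.produce': 'dataframe1',
    'step.2.produce': 'dataframe2',
}

single = resolve_argument('step.0.produce', step_outputs)
many = resolve_argument(
    ['step.0.produce', 'step.1.produce', 'step.2.produce'], step_outputs
)
print(single)  # dataframe0
print(many)    # ['dataframe0', 'dataframe1', 'dataframe2']
```

Keeping the single-string form unchanged would leave existing pipelines valid, while the list form preserves the order in which the references were written.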
The downside of this proposal is the potential difficulty for TA2 systems to automatically determine what should be in the list. One can imagine a TA1 primitive requiring a list of unrelated dataframes as input, e.g. inputs=[attribute_dataframe, target_dataframe, instance_weight_dataframe].