Make the denormalization process recursive and keep tables that are not joined in the denormalize primitive.
The primitive now has two additional features:
- If the hyperparameter `recursive` is set to True, the primitive joins tables recursively. For example, if table 1 (the main table) has a foreign key that points to table 2, and table 2 has a foreign key that points to table 3, then after table 2 is joined into table 1, table 1 has a key that points to table 3. The process then continues from there.
- It keeps the tables that are not joined during the denormalization process.
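The two features above can be sketched in plain Python. This is only an illustration under stated assumptions, not the primitive's code: tables are lists of dicts, foreign keys are a plain dict, and the names `denormalize`, `tables`, `foreign_keys`, and the `table.column` naming of joined columns are all hypothetical. The sketch also skips tables that were already joined so that it terminates, which is a choice of this example and not something the primitive is claimed to do.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]
Table = List[Row]
# (table, column) -> (referenced table, referenced column); hypothetical layout
ForeignKeys = Dict[Tuple[str, str], Tuple[str, str]]


def denormalize(tables: Dict[str, Table], foreign_keys: ForeignKeys,
                main: str, recursive: bool = False) -> Dict[str, Table]:
    """Join referenced tables into `main`; keep tables that were never joined."""
    rows = [dict(row) for row in tables[main]]
    keys = dict(foreign_keys)
    joined: Set[str] = set()
    while True:
        # Foreign keys that currently start at the main table.
        pending = [(col, ref) for (tbl, col), ref in keys.items()
                   if tbl == main and ref[0] != main and ref[0] not in joined]
        if not pending:
            break
        for column, (target, target_column) in pending:
            if target in joined:
                continue
            index = {r[target_column]: r for r in tables[target]}
            for row in rows:
                match = index.get(row.get(column), {})
                for name, value in match.items():
                    row[f"{target}.{name}"] = value
            # The joined table's own foreign keys now start at the main table.
            for (tbl, col), ref in list(keys.items()):
                if tbl == target:
                    keys[(main, f"{target}.{col}")] = ref
            joined.add(target)
        if not recursive:
            break
    result = {main: rows}
    for name, table in tables.items():
        if name != main and name not in joined:
            result[name] = table  # tables that were not joined are kept
    return result


tables = {
    "main": [{"id": 1, "author_id": 10}],
    "authors": [{"author_id": 10, "name": "Ann", "city_id": 7}],
    "cities": [{"city_id": 7, "city": "Oslo"}],
    "tags": [{"tag": "x"}],
}
foreign_keys = {
    ("main", "author_id"): ("authors", "author_id"),
    ("authors", "city_id"): ("cities", "city_id"),
}
flat = denormalize(tables, foreign_keys, "main", recursive=False)
deep = denormalize(tables, foreign_keys, "main", recursive=True)
```

With `recursive=False` only `authors` is joined, so `cities` and `tags` remain in the output; with `recursive=True` the `cities` columns are also merged into the main table and only `tags` is kept alongside it.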
Now for the implementation:
- To avoid repeating the join code, I moved it into a separate function, `_denormalize`. Whichever way the `recursive` hyperparameter is set, we can just call `_denormalize` instead of duplicating the code.
- The function `_prepare_metadata` was created because, in recursive denormalization, each round of denormalization starts from fresh metadata and adds the column metadata during the join.
- In the primitive, if a table other than the main resource is not joined, we keep it in the output. The key point is that if such a table contains a foreign key that points to a joined table, we move the target of that foreign key to the main resource.
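The foreign-key re-pointing described in the last item can be illustrated with a small sketch. It assumes foreign keys are stored as a plain dict and that joined columns are renamed to `table.column` in the main resource; the function name `repoint_foreign_keys` and this layout are hypothetical and do not reflect the primitive's metadata API.

```python
from typing import Dict, Set, Tuple

# (table, column) -> (referenced table, referenced column); hypothetical layout
ForeignKeys = Dict[Tuple[str, str], Tuple[str, str]]


def repoint_foreign_keys(foreign_keys: ForeignKeys, main: str,
                         joined: Set[str]) -> ForeignKeys:
    """Return the foreign keys of kept tables, re-pointed at `main` where needed."""
    updated: ForeignKeys = {}
    for (table, column), (target, target_column) in foreign_keys.items():
        if table == main or table in joined:
            continue  # only tables kept next to the main resource are handled here
        if target in joined:
            # The referenced column now lives in the main resource (assumed naming).
            updated[(table, column)] = (main, f"{target}.{target_column}")
        else:
            updated[(table, column)] = (target, target_column)
    return updated


foreign_keys = {
    ("main", "author_id"): ("authors", "author_id"),
    ("reviews", "author_id"): ("authors", "author_id"),
    ("reviews", "venue_id"): ("venues", "venue_id"),
}
kept_keys = repoint_foreign_keys(foreign_keys, "main", joined={"authors"})
```

Here `reviews` is kept because nothing joins it, and its key into the joined `authors` table ends up pointing at the main resource, while its key into `venues` is left unchanged.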
TODO:
- The `dataset_to_dataframe` primitive currently only allows the input dataset to contain one table. The CircleCI test for this merge request also fails, because the output may contain multiple tables. There are two solutions in mind:
  - Change the `dataset_to_dataframe` primitive to allow the input dataset to have multiple tables.
  - Make whether we keep the tables that are not joined a hyperparameter of the primitive.
- Consider the case where there are loops in foreign keys.
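As a possible starting point for the foreign-key loop case, the sketch below detects a loop with a depth-first search over the table graph. It is not part of this merge request; the function name `find_foreign_key_loop` and the dict layout of the foreign keys are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# (table, column) -> (referenced table, referenced column); hypothetical layout
ForeignKeys = Dict[Tuple[str, str], Tuple[str, str]]


def find_foreign_key_loop(foreign_keys: ForeignKeys) -> Optional[List[str]]:
    """Return one loop of table names (first == last), or None if there is none."""
    graph: Dict[str, Set[str]] = {}
    for (table, _), (target, _) in foreign_keys.items():
        graph.setdefault(table, set()).add(target)

    visiting: List[str] = []  # tables on the current search path
    done: Set[str] = set()    # tables fully explored without finding a loop

    def visit(table: str) -> Optional[List[str]]:
        if table in done:
            return None
        if table in visiting:
            # Reached a table already on the path: the tail of the path is a loop.
            return visiting[visiting.index(table):] + [table]
        visiting.append(table)
        for target in sorted(graph.get(table, ())):
            loop = visit(target)
            if loop:
                return loop
        visiting.pop()
        done.add(table)
        return None

    for table in sorted(graph):
        loop = visit(table)
        if loop:
            return loop
    return None


looped = find_foreign_key_loop({
    ("a", "x"): ("b", "x"),
    ("b", "y"): ("c", "y"),
    ("c", "z"): ("a", "z"),
})
straight = find_foreign_key_loop({
    ("a", "x"): ("b", "x"),
    ("b", "y"): ("c", "y"),
})
```

A check like this could run before the join starts, so the primitive can decide how to handle a loop instead of joining indefinitely.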