Allow for more kinds of grouping in joins
Currently, a join produces one row for each row in the left (= "input" = "original") dataset, and only the rows from the right (= "companion") dataset are aggregated.
For example, given these datasets:

date | L |
---|---|
2020-04-01 | 1 |
2020-04-01 | 2 |
2020-04-02 | 3 |

date | R |
---|---|
2020-04-01 | 7 |
2020-04-03 | 8 |

You will get this join:

date | L | R |
---|---|---|
---|---|---|
2020-04-01 | 1 | 7 |
2020-04-01 | 2 | 7 |
2020-04-02 | 3 | null |
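
For reference, the plain join above behaves like an ordinary left join. A minimal sketch in pandas (used here only for illustration, not necessarily how the join is implemented in this project):

```python
import pandas as pd

# The two example datasets from above
left = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-01", "2020-04-01", "2020-04-02"]),
    "L": [1, 2, 3],
})
right = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-01", "2020-04-03"]),
    "R": [7, 8],
})

# Left join on the exact date: every left row is kept,
# and right rows without a matching left date are dropped
joined = left.merge(right, on="date", how="left")
print(joined)
```

The 2020-04-03 row from the right has no match on the left, so it does not appear, and the 2020-04-02 row from the left gets a missing value for R.
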
If you request temporal aggregation at monthly resolution, you will get:

date | L | sum R |
---|---|---|
2020-04-01 | 1 | 15 |
2020-04-01 | 2 | 15 |
2020-04-02 | 3 | null |

(The R values are aggregated over the whole month and repeated for each row from the left.)

This is what needs to happen for machine learning augmentation: the original rows are what you are trying to classify, and they should not be altered, dropped, or combined. However, outside of that use case, this is probably unexpected, and the user would probably want the aggregation to happen on the whole table.
This should probably be controlled through a parameter and settable via the UI.
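
As a rough sketch of what aggregating the whole table could mean, the example below truncates both datasets to the month, aggregates each side, and then joins. This is only one possible interpretation: the helper name `aggregate_whole_table`, the choice of `sum` for the left column, and the use of pandas are all assumptions made for illustration, not existing code or an agreed design.

```python
import pandas as pd

left = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-01", "2020-04-01", "2020-04-02"]),
    "L": [1, 2, 3],
})
right = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-01", "2020-04-03"]),
    "R": [7, 8],
})


def aggregate_whole_table(left, right):
    """Hypothetical helper: aggregate BOTH sides per month, then join."""
    # Replace each date with its "YYYY-MM" month key
    left_m = left.assign(date=left["date"].dt.strftime("%Y-%m"))
    right_m = right.assign(date=right["date"].dt.strftime("%Y-%m"))
    # Sum each side per month (sum for L is an assumption; the issue does not specify it)
    left_agg = left_m.groupby("date", as_index=False)["L"].sum()
    right_agg = right_m.groupby("date", as_index=False)["R"].sum()
    # Left join the aggregated tables on the month key
    return left_agg.merge(right_agg, on="date", how="left")


result = aggregate_whole_table(left, right)
print(result)
```

With the example data, this yields a single row for 2020-04 instead of one row per original left row, which is the behavior a parameter would need to switch on or off.
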