Suggestion: tree-shakeable transformations
Created by: luucvanderzee
While our transformations currently have a pretty decent API, there is one big drawback to the current class-based piping/chaining approach: the transformations are not tree-shakeable. This means that you always send the code of all transformations to the client.
I think I just came up with an approach to circumvent this problem. Say that we want to do this with the current API:
import DataContainer from '@snlab/florence-datacontainer'
const data = new DataContainer({
  fruit: ['apple', 'apple', 'banana', 'banana'],
  price: [1, 2, 3, 4]
})
const meanPricePerFruit = data
  .groupBy('fruit')
  .summarise({ mean_price: { price: 'mean' } })
  .arrange({ mean_price: 'descending' })
In the new proposed API, this would become
import DataContainer, { groupBy, summarise, arrange } from '@snlab/florence-datacontainer'
const data = new DataContainer({
  fruit: ['apple', 'apple', 'banana', 'banana'],
  price: [1, 2, 3, 4]
})
const meanPricePerFruit = data.pipe(
  groupBy('fruit'),
  summarise({ mean_price: { price: 'mean' } }),
  arrange({ mean_price: 'descending' })
)
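To make the idea a bit more concrete, here is a rough sketch of how a pipe method could apply standalone transformations in order. Everything in it is hypothetical: MiniContainer is a stand-in for DataContainer, and select and scale are made-up transformations rather than anything the library exports. It only assumes that a transformation is a function from column-oriented data to column-oriented data.

```javascript
// Hypothetical stand-in for DataContainer; not the library's actual class.
class MiniContainer {
  constructor (data) {
    this._data = data
  }

  // Apply each transformation in order, feeding the result of one into the next.
  pipe (...transformations) {
    const result = transformations.reduce(
      (current, transform) => transform(current),
      this._data
    )
    return new MiniContainer(result)
  }

  column (name) {
    return this._data[name]
  }
}

// Hypothetical transformation: keep only the named columns.
const select = (...names) => data => {
  const result = {}
  for (const name of names) result[name] = data[name]
  return result
}

// Hypothetical transformation: multiply a numeric column by a factor.
// Returns a new object instead of modifying the input.
const scale = (columnName, factor) => data => ({
  ...data,
  [columnName]: data[columnName].map(value => value * factor)
})

const container = new MiniContainer({
  fruit: ['apple', 'apple', 'banana', 'banana'],
  price: [1, 2, 3, 4]
})

const priceInCents = container.pipe(
  scale('price', 100),
  select('price')
)

console.log(priceInCents.column('price')) // [ 100, 200, 300, 400 ]
```

Because select and scale are plain functions that the class never refers to, a bundler should be able to drop whichever ones an application does not import.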
The advantages of this method are
- Tree-shakeable transformations, as mentioned above
- Easy for users to write and use custom transformations (see below)
- Transformations can be used without a DataContainer (if you are using column-oriented data at least)
- Cleaner separation of code; tests can just focus purely on the transformations
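As an illustration of the last two points, a transformation written this way can be called directly on a plain column-oriented object, which is also all a unit test would need. The arrangeBy function below is a hypothetical sketch of an arrange-like transformation, not the library's implementation, and its signature is an assumption.

```javascript
// Hypothetical sketch of an arrange-like transformation.
// Sorts all columns by the values of one column, without mutating the input.
const arrangeBy = (columnName, order = 'ascending') => data => {
  const values = data[columnName]
  const direction = order === 'descending' ? -1 : 1

  // Sort row indices by the values in the chosen column.
  const indices = values.map((_, i) => i)
  indices.sort((a, b) => {
    if (values[a] < values[b]) return -1 * direction
    if (values[a] > values[b]) return 1 * direction
    return 0
  })

  // Reorder every column according to the sorted indices.
  const result = {}
  for (const name of Object.keys(data)) {
    result[name] = indices.map(i => data[name][i])
  }
  return result
}

// No container involved: just a plain object of columns.
const plainData = {
  fruit: ['apple', 'banana', 'cherry'],
  price: [2, 3, 1]
}

const sorted = arrangeBy('price', 'descending')(plainData)
console.log(sorted)
// { fruit: [ 'banana', 'apple', 'cherry' ], price: [ 3, 2, 1 ] }
```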
An example of how a user could write a custom toQuantitative transformation to convert categorical data to quantitative data:
import DataContainer from '@snlab/florence-datacontainer'
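// Returns a transformation that parses the string values of the given column into numbers
const toQuantitative = columnName => {
  return data => {
    const categoricalColumn = data[columnName]
    // Note: this overwrites the column on the data object that is passed in
    data[columnName] = categoricalColumn.map(value => parseFloat(value))
    return data
  }
}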
const dataContainer = new DataContainer({
  amount: [1, 2, 3, 4],
  price: ['1', '2', '3', '4']
}).pipe(toQuantitative('price'))
console.log(dataContainer.column('price')) // [1, 2, 3, 4]
Thoughts?