Support for out-of-memory joins/streaming joins
To do truly large-scale joins, this package should have intuitive support for out-of-memory joins, i.e. joins where the data arrives as an iterator of chunks instead of being loaded all at once.
Perhaps this could be accomplished by simply checking whether the data element passed to the class is of type <class 'pandas.io.parsers.TextFileReader'> or some similar chunked reader.
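A rough sketch of what that check might look like. It tests for the iterator protocol rather than for `TextFileReader` by name, on the assumption that any chunked reader (including what `pd.read_csv(..., chunksize=n)` returns) is an iterator, while an in-memory frame is not. The helper name `as_chunks` is made up for illustration and is not part of this package.

```python
from collections.abc import Iterator


def as_chunks(data):
    """Return an iterator of chunks regardless of how `data` was supplied.

    Hypothetical helper: a chunked reader is passed through untouched,
    anything else is treated as a single in-memory chunk.
    """
    if isinstance(data, Iterator):
        # Already a stream of chunks; consume it lazily as-is.
        return data
    # In-memory object (e.g. a DataFrame or a list): wrap it as one chunk.
    return iter([data])


# An in-memory object becomes a one-chunk stream.
in_memory = list(as_chunks(["row1", "row2"]))
# A chunk iterator is passed through unchanged.
streamed = list(as_chunks(iter([["row1"], ["row2"]])))
```

With this, the rest of the join code only ever has to deal with "an iterator of chunks", whichever way the caller supplied the data.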
The tricky bit algorithmically lies in what to do if both sides of a join are iterators. This is for at least two reasons: 1) how do you reliably produce a vectorizer, and 2) how do you reliably ensure that you've done all possible combinations of comparisons (i.e., m x n, where m = number of chunks in side A and n = number of chunks in side B)?
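For point 2, one possible approach is sketched below. It assumes side B can be restarted, since a plain iterator is exhausted after one pass, so side B is passed as a factory that returns a fresh iterator each time. The names `chunk_pairs` and `make_chunks_b` are hypothetical, and the sketch does not address the vectorizer question in point 1.

```python
def chunk_pairs(chunks_a, make_chunks_b):
    """Yield every (chunk of A, chunk of B) combination exactly once.

    chunks_a      -- iterator over the chunks of side A (consumed once)
    make_chunks_b -- zero-argument callable returning a fresh iterator
                     over the chunks of side B (an assumption: side B
                     must be re-readable, e.g. by reopening the file)
    """
    for i, chunk_a in enumerate(chunks_a):
        # Restart side B for each chunk of A so no combination is skipped.
        for j, chunk_b in enumerate(make_chunks_b()):
            yield (i, j), chunk_a, chunk_b


# Toy data: m = 2 chunks on side A, n = 3 chunks on side B.
side_a = iter([["a1", "a2"], ["a3"]])
side_b_source = [["b1"], ["b2"], ["b3"]]

visited = [idx for idx, _, _ in chunk_pairs(side_a, lambda: iter(side_b_source))]
```

Here `visited` holds m x n = 6 index pairs, one per chunk combination. The cost is that side B is read m times, so it would make sense to put the side with fewer chunks on the outside.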
Edited by John Stevenson