feature idea: Use "textract" lib to extract text from documents
Hi, I found this lib: http://textract.readthedocs.org/en/latest/ Maybe it is useful for Mayan.
It is mentioned here: http://pyvideo.org/video/3526/cleaning-confused-collections-of-characters BTW, it is worth the time to watch this :-)
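To give an idea of what using it could look like, here is a minimal sketch. It assumes textract's documented `textract.process(path)` call, which returns the extracted text as bytes. The `extract_text` helper and its `extractor` parameter are hypothetical names made up for this example; they are not part of Mayan or textract.

```python
from typing import Callable, Optional


def extract_text(path: str, extractor: Optional[Callable[[str], bytes]] = None) -> str:
    """Return the text content of the document at `path`.

    Hypothetical helper for illustration only. `extractor` is any callable
    that takes a file path and returns bytes; it defaults to textract.process.
    """
    if extractor is None:
        # Imported lazily so the sketch still loads where textract is not installed.
        import textract
        extractor = textract.process  # chooses a parser based on the file extension
    raw = extractor(path)
    # Assumes UTF-8 output (textract's default encoding); undecodable bytes are replaced.
    return raw.decode("utf-8", errors="replace").strip()
```

With textract installed, `extract_text("some_document.pdf")` would return the document text as a string (the file name is only a placeholder). Keeping the extractor swappable would let Mayan fall back to its existing parsers for file types textract does not handle.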
br Matthias
Edited by Roberto Rosario