Feature Request - Support parsing documents for metadata extraction
I'm thinking about digitizing all of my documents into Mayan EDMS. To do this effectively, I'd like standard documents such as invoices to be parsed automatically, so that metadata like invoice #, invoice date, supplier etc. is extracted and attached to each document. The results should then be presented to me so I can optionally correct the extracted data.
The process would run as a workflow applied to the document. The workflow would trigger something like invoice2data to extract information based on document templates and store it as metadata on the document. (It could even use something more sophisticated, like machine learning with object detection and contextual OCR extraction.)
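To make the idea a bit more concrete, here is a rough sketch of what template-based extraction could look like on OCR text. This is not invoice2data's actual API and not Mayan's workflow API. The template layout, the field names and the `extract_metadata` function are all made up for illustration, and it assumes the document text is already available as a plain string.

```python
import re
from typing import Dict, Optional

# Hypothetical template: keywords that identify the document type,
# plus one regex (with a single capture group) per metadata field.
INVOICE_TEMPLATE = {
    "keywords": ["Invoice"],
    "fields": {
        "invoice_number": r"Invoice\s*(?:#|No\.?)\s*:?\s*(\w[\w-]*)",
        "invoice_date": r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})",
        "supplier": r"Supplier\s*:?\s*(.+)",
    },
}


def extract_metadata(text: str, template: dict) -> Optional[Dict[str, Optional[str]]]:
    """Return extracted fields, or None if the template does not apply."""
    # Only apply the template when all of its keywords occur in the text.
    if not all(keyword in text for keyword in template["keywords"]):
        return None

    result: Dict[str, Optional[str]] = {}
    for name, pattern in template["fields"].items():
        match = re.search(pattern, text)
        # A missing field stays None so it can be filled in by hand later.
        result[name] = match.group(1).strip() if match else None
    return result


sample_text = "Invoice # INV-2024-001\nDate: 2024-03-15\nSupplier: ACME Corp\n"
metadata = extract_metadata(sample_text, INVOICE_TEMPLATE)
print(metadata)
```

Fields that could not be matched come back as `None`, which would be the natural place for the UI to ask for a manual correction.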
After the metadata is extracted, the document could be mailed to me together with the extracted metadata, and I could optionally correct it using the UI.
So this FR is to include something like invoice2data, as mentioned above, to get started.