Feature request: metadata from filename
In the same way that exiftool is used as a file metadata app to extract a number of properties from a file, I would like to have a filename metadata app that parses the filename itself for metadata.
This would solve the problem of getting metadata from files that cannot carry embedded metadata, e.g. .txt files. It would also allow for a custom scan-tag-and-upload script or for more elaborate bulk importing scenarios.
I see a couple of options here.
- Filename is a dictionary.
Unless I'm forgetting something, there's nothing preventing a user from using a filename like:
{'Property 1':'Value 1'},{'Property 2':'Value 2'}.pdf
You could even have {'FileName':''} here or add multi-level properties.
One open question is how property name clashes should be handled. In the example above, exiftool would also produce the FileName property. Maybe imposing a fixed order in which the filters are applied, and not overwriting existing properties (first one sticks), would be sufficient.
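To make the first option a bit more concrete, here is a rough sketch of how such a filename could be parsed and how "first one sticks" could work. This is not based on any existing Mayan code: the function names parse_dict_filename and merge_first_wins are hypothetical, and I'm assuming the part of the filename before the extension is a comma-separated sequence of Python dict literals, as in the example above.

```python
import ast
import os


def parse_dict_filename(filename):
    """Return the metadata encoded in a filename as dict literals.

    Hypothetical helper: assumes the name before the extension looks like
    "{'Property 1':'Value 1'},{'Property 2':'Value 2'}".
    """
    stem, _extension = os.path.splitext(filename)
    try:
        # Wrapping the stem in brackets turns "{...},{...}" into a list of
        # dicts. literal_eval only accepts literals, so no code is executed.
        parsed = ast.literal_eval(f"[{stem}]")
    except (ValueError, SyntaxError, TypeError):
        # Ordinary filenames such as "report.pdf" end up here.
        return {}

    metadata = {}
    for item in parsed:
        if isinstance(item, dict):
            for key, value in item.items():
                # First one sticks: keep the earliest value for a key.
                metadata.setdefault(key, value)
    return metadata


def merge_first_wins(*sources):
    """Merge property dicts in a fixed order without overwriting."""
    merged = {}
    for source in sources:
        for key, value in source.items():
            merged.setdefault(key, value)
    return merged


if __name__ == "__main__":
    name = "{'Property 1':'Value 1'},{'Property 2':'Value 2'}.pdf"
    print(parse_dict_filename(name))
```

With this approach the order in which the sources are passed to merge_first_wins decides which app wins a clash, so putting the exiftool result first would keep its FileName value.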
- Metadata type extension
In this case, each metadata type also has a field expression (regex) that extracts part of the filename as its value, plus a default value in case the parsing fails. The field expression could be scanf-like: '%s,' or '%*s,%s,%s' (skip the first value, capture the next two), but I bet that Python has much better ones. Alternatively, give one specification for all fields and create the metadata from those (numbered) elements. For example, with the field spec "A{:04}{}.pdf" and the filename 'A0344foo.pdf', selecting field 2 would give 'foo'.
This particular extension would give Mayan a similar feature as e.g. Paperless to extract a date, invoice number and title from the uploaded file name.
Tags could be extracted from filenames in the same way, as a tag type extension.
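As a sketch of the second option, the numbered-field idea maps fairly directly onto capture groups in Python's re module. The names extract_field and FIELD_PATTERN are hypothetical, and the pattern is my own regex translation of the "A{:04}{}.pdf" spec, assuming the first field is exactly four digits.

```python
import re

# Assumed regex equivalent of the field spec "A{:04}{}.pdf":
# group 1 is a four-digit number, group 2 is the remaining text.
FIELD_PATTERN = r"A(\d{4})(.+)\.pdf"


def extract_field(pattern, filename, field, default=None):
    """Return capture group `field` of `pattern`, or `default` on failure.

    `field` may be a group number (starting at 1) or a group name.
    """
    match = re.fullmatch(pattern, filename)
    if match is None:
        # The filename does not follow the spec: fall back to the default.
        return default
    value = match.group(field)
    return default if value is None else value


if __name__ == "__main__":
    print(extract_field(FIELD_PATTERN, "A0344foo.pdf", 2))
    print(extract_field(FIELD_PATTERN, "notes.txt", 2, default="untitled"))
```

Named groups such as (?P<title>.+) would work with the same function and might be easier to configure per metadata type than numbered fields.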
Notes:
- The two methods are not mutually exclusive.
- The big caveat here is that filesystems impose a limit on the filename length, usually 255 characters or bytes. So I would not propose this as a replacement for exiftool, but 255 characters will already go a long way.
I would like to try to implement something like this, but I'd need at least some high-level guidance, since I'm unfamiliar with the Mayan project structure (and I'm also not too familiar with either Django or Python, but I can work on that).
Thanks for considering.