Manage tesseract tessdata sources and languages

Currently both the tessdata directory and the language are hardcoded: the directory to the location where Debian places tesseract language data, and the language to English. This is not optimal, as other environments may have tessdata elsewhere.

This MR implements functionality to pick from the available tessdata sources. Additionally, since each tessdata directory may contain a different set of languages, the list of available languages is exposed to the rest of the code as well. Finally, language selection is implemented in both the CLI and GUI apps.

The tessdata directories are picked up from the following sources: <tesseract_bin_dir>/../share, the TESSDATA_PREFIX environment variable, and /usr/share/tesseract-ocr/4.00/tessdata/.
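
The lookup described above can be illustrated with a minimal sketch. This is not the code from this MR: the function names (`candidate_tessdata_dirs`, `available_languages`, `discover_tessdata`) are hypothetical, the candidate order simply follows the order of the list above, and it assumes that a language counts as available when a `<lang>.traineddata` file sits directly in the directory.

```python
import os
from pathlib import Path

# Debian default location named in the description above.
DEBIAN_TESSDATA = Path("/usr/share/tesseract-ocr/4.00/tessdata")


def candidate_tessdata_dirs(tesseract_bin_dir=None, environ=None):
    """Return candidate tessdata directories (hypothetical helper).

    The order mirrors the list in the description; whether the MR
    gives any source priority over another is not stated.
    """
    environ = os.environ if environ is None else environ
    candidates = []
    if tesseract_bin_dir is not None:
        # <tesseract_bin_dir>/../share, taken literally from the description.
        candidates.append(Path(tesseract_bin_dir).parent / "share")
    prefix = environ.get("TESSDATA_PREFIX")
    if prefix:
        candidates.append(Path(prefix))
    candidates.append(DEBIAN_TESSDATA)
    return candidates


def available_languages(tessdata_dir):
    """List language codes found as <lang>.traineddata files (assumption)."""
    directory = Path(tessdata_dir)
    if not directory.is_dir():
        return []
    return sorted(p.stem for p in directory.glob("*.traineddata"))


def discover_tessdata(tesseract_bin_dir=None, environ=None):
    """Map each usable tessdata directory to the languages it contains."""
    found = {}
    for directory in candidate_tessdata_dirs(tesseract_bin_dir, environ):
        languages = available_languages(directory)
        if languages:
            found[str(directory)] = languages
    return found
```

A caller could then offer the keys of `discover_tessdata()` as selectable sources and the matching value as the language choices for the selected source.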