
ocr: Use custom thresholding implementation

The default Tesseract thresholder is not adaptive, so it performs poorly on images whose background brightness varies. An adaptive algorithm (Sauvola) exists, but it either fails to ignore small, relatively abrupt changes in luminosity (e.g. a highlighter marker stroke) or erodes characters so much that the quality of the resulting image is compromised.
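For reference, the textbook Sauvola rule computes a per-pixel threshold from the local mean m(x,y) and local standard deviation s(x,y) over a window around the pixel. This is background only; the window size and constants Tesseract actually uses are not stated in this merge request.

$$
T(x,y) = m(x,y)\left[1 + k\left(\frac{s(x,y)}{R} - 1\right)\right]
$$

Here k is a positive tuning constant and R is the dynamic range of the standard deviation (commonly 128 for 8-bit images). Wherever s < R the threshold sits below the local mean, and a larger k pushes it further down, which thins dark strokes.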

A simple adaptive mean threshold appears to work in both cases.
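The merge request text does not include the implementation itself. As an illustration only, here is a minimal pure-Python sketch of adaptive mean thresholding: each pixel is compared against the mean of a square window around it minus a constant offset, and a summed-area table makes each local mean a constant-time lookup. The function names, the `radius` and `offset` parameters, the clipped-window border handling, and the synthetic demo page are all hypothetical choices for this sketch, not taken from the actual change.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, values 0..255


def integral_image(img: Image) -> List[List[int]]:
    """Summed-area table with an extra zero row/column: s[y][x] = sum of img rows <y, cols <x."""
    h, w = len(img), len(img[0])
    s = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            s[y + 1][x + 1] = s[y][x + 1] + row_sum
    return s


def adaptive_mean_threshold(img: Image, radius: int = 7, offset: int = 10) -> Image:
    """Return 255 (background) where a pixel is brighter than its local mean minus offset, else 0 (ink).

    The local mean is taken over a (2*radius+1)^2 window clipped to the image bounds.
    radius and offset are illustrative defaults (hypothetical, not from the merge request).
    """
    h, w = len(img), len(img[0])
    s = integral_image(img)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            # Window sum from four table lookups, then divide by the clipped window area.
            total = s[y1][x1] - s[y0][x1] - s[y1][x0] + s[y0][x0]
            local_mean = total / ((y1 - y0) * (x1 - x0))
            out[y][x] = 255 if img[y][x] > local_mean - offset else 0
    return out


if __name__ == "__main__":
    # Synthetic page: background ramps from 100 (left) to 217 (right),
    # with three dark vertical strokes 60 levels below the local background.
    h, w = 10, 40
    page = [[100 + 3 * x for x in range(w)] for _ in range(h)]
    for x in (5, 20, 35):
        for y in range(2, 8):
            page[y][x] -= 60
    binary = adaptive_mean_threshold(page, radius=3, offset=10)
    for row in binary:
        print("".join("#" if v == 0 else "." for v in row))
```

On that synthetic page the three strokes come out as ink and the whole ramp stays white. A single global cutoff of 128 would instead blacken the leftmost ten columns (background below 128) and miss the rightmost stroke (value 145).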
