hwscan: page-is-only-a-jpeg-image logic is wrong
I have a file that proves our detection of when to extract a JPEG as the page image is naive.
- handwritten on a tablet
- page contains a single jpeg figure
- page contains no text (but many handwritten strokes)
Because of this, Plom *thinks* it should take this single JPEG image as the page image. This is incorrect, because the handwritten strokes are not part of that image.
More info:
- `pdfinfo` says `Producer: Microsoft: Print To PDF`.
- In fact, in the particular case I have, we are saved by the minimum 800x600 size check.
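To make the failure concrete, here is a minimal sketch of the heuristic as described in this issue (one JPEG, no text, at least 800x600) next to a stricter variant that also refuses pages with vector drawing content. This is not Plom's actual code: `PageSummary`, `naive_is_only_jpeg` and `stricter_is_only_jpeg` are hypothetical names, and reading "800x600" as width >= 800 and height >= 600 is an assumption. The counts would presumably come from a PDF library; with PyMuPDF, for example, `len(page.get_drawings())` might serve as the vector-path count, but that is also an assumption here.

```python
from dataclasses import dataclass

# Hypothetical per-page summary; how these fields get filled in
# (e.g. from PyMuPDF) is an assumption, not Plom's real code.
@dataclass
class PageSummary:
    num_images: int        # embedded raster images on the page
    image_format: str      # format of the single image, e.g. "jpeg"
    image_width: int       # pixel width of that image
    image_height: int      # pixel height of that image
    text: str              # extracted text layer of the page
    num_vector_paths: int  # vector drawing operations (e.g. pen strokes)

# Assumed reading of the "minimum 800x600 size check".
MIN_WIDTH, MIN_HEIGHT = 800, 600

def naive_is_only_jpeg(p: PageSummary) -> bool:
    """The rule as described in the issue: one JPEG, no text, big enough."""
    return (
        p.num_images == 1
        and p.image_format == "jpeg"
        and not p.text.strip()
        and p.image_width >= MIN_WIDTH
        and p.image_height >= MIN_HEIGHT
    )

def stricter_is_only_jpeg(p: PageSummary) -> bool:
    """Also reject pages with vector content, such as tablet handwriting."""
    return naive_is_only_jpeg(p) and p.num_vector_paths == 0
```

For a tablet page with one large JPEG figure and hundreds of pen strokes, `naive_is_only_jpeg` returns `True` while `stricter_is_only_jpeg` returns `False`; a plain scanned page (one large JPEG, no text, no vector paths) passes both. In the reporter's file the figure is presumably below the size threshold, so even the naive rule rejects it.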
@arechnitzer I cannot post this file, but it's page 3 of a file whose name starts with the characters `du` in the Canvas example set. There is a diamond-shaped figure `A<|>B` on the page.