Pathological case demonstrating massive slowdown
pdfgrep takes only 7 seconds to search the original file. I then decompressed and recompressed the file to produce
after.pdf. pdfgrep now takes 80 seconds to search this new file. I also tested this procedure against some ebooks and found much worse results, such as an increase from 4s to 250s.
It looks like this might be poppler related, since timing pdftotext (which, like pdfgrep, is built on poppler) on the two files also shows a 10x difference in performance. However, none of the other PDF viewers (Mac OS X Preview and Skim, mupdf, PDF.js) or parsers (mutool, podofo, pdf-parser.py, pstotext/ghostscript) I tried shows any significant performance difference between these two files.
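For anyone who wants to reproduce the timings, here is a rough sketch of how the comparison could be scripted. It is not the exact procedure used above: the helper names (`time_command`, `slowdown`, `compare`), the file name `original.pdf`, and the search pattern are placeholders, and it assumes `pdfgrep` and `pdftotext` are on the PATH.

```python
import subprocess
import time


def time_command(argv):
    """Run argv once, discard its output, and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


def slowdown(before_seconds, after_seconds):
    """Return how many times slower the second run was than the first."""
    return after_seconds / before_seconds


def compare(make_argv, before_path, after_path):
    """Time the same tool on both files.

    make_argv is a callable that turns a file path into an argv list.
    Returns (seconds_before, seconds_after, slowdown_factor).
    """
    t_before = time_command(make_argv(before_path))
    t_after = time_command(make_argv(after_path))
    return t_before, t_after, slowdown(t_before, t_after)


# Example calls (placeholder file names and pattern; uncomment to run):
# print(compare(lambda p: ["pdfgrep", "pattern", p], "original.pdf", "after.pdf"))
# print(compare(lambda p: ["pdftotext", p, "-"], "original.pdf", "after.pdf"))
```

Each call runs the tool only once per file, so results will include disk-cache effects; running it a few times and taking the best time would give a steadier number.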