Search fails for a string spanning two or more lines in PDF articles with a column layout
I am trying to use pdfgrep to search for strings in academic papers, which are nearly always typeset in a two-column layout. If the search string is a phrase that spans multiple (2+) lines within the same column, pdfgrep fails to find a match.
My tests suggest that pdfgrep does not differentiate between columns, but rather treats the document as whole lines spanning the entire width of the page, with a large gap in the middle.
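To illustrate the behaviour I suspect, here is a minimal Python sketch. It does not use pdfgrep or poppler at all; the two-column "page" and the helper names (`rowwise_text`, `columnwise_text`, `contains_phrase`) are made up for the illustration. It only shows why a phrase that wraps inside one column cannot match if the text is read row by row across the full page width.

```python
import re

# Hypothetical two-column page: each list holds the lines of one column.
left = ["the quick brown", "fox jumps over", "the lazy dog"]
right = ["lorem ipsum dolor", "sit amet consectetur", "adipiscing elit"]

def rowwise_text(left, right, gap="      "):
    """Text as if each physical line spanned the full page width."""
    return "\n".join(l + gap + r for l, r in zip(left, right))

def columnwise_text(left, right):
    """Text in reading order: the whole left column, then the right one."""
    return "\n".join(left + right)

def contains_phrase(text, phrase):
    """Search for phrase, letting any whitespace run (newlines included) separate the words."""
    pattern = r"\s+".join(re.escape(word) for word in phrase.split())
    return re.search(pattern, text) is not None

# "brown" ends line 1 of the left column, "fox" starts line 2 of the same column.
phrase = "brown fox"
print(contains_phrase(rowwise_text(left, right), phrase))     # False
print(contains_phrase(columnwise_text(left, right), phrase))  # True
```

In the row-wise text, "brown" is followed by the first line of the right column rather than by "fox", so even a whitespace-tolerant pattern cannot match. This matches what I observe, although I have not confirmed that this is how pdfgrep actually extracts the text.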
A sample PDF containing an open-access article can be obtained here: https://www.nature.com/polopoly_fs/1.12676!/menu/main/topColumns/topLeftColumn/pdf/495426a.pdf.
$ pdfgrep -V
This is pdfgrep version 2.0.1.
Using poppler version 0.41.0
Using libpcre version 8.38 2015-11-23
$ uname -iporsv
Linux 4.10.0-32-generic #36~16.04.1-Ubuntu SMP Wed Aug 9 09:19:02 UTC 2017 x86_64 x86_64 GNU/Linux