Resolve "Fit Regions Around Character CCs"
Closes #37 (closed)
This MR introduces a slightly slower (1-2 s) but more robust method for finding the connected components (CCs) that intersect detected regions. The intent is to prevent symbols from being cut 'in half' at region boundaries.
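The MR does not spell out the fitting step itself. The following is a minimal sketch of the general idea, not the pipeline's or protables' implementation. It makes several assumptions: the page is a binarized image given as a list of 0/1 rows, CCs use 8-connectivity, and boxes are inclusive (x_min, y_min, x_max, y_max) pixel coordinates. The helper names `connected_component_boxes`, `intersects`, and `fit_region_to_ccs` are hypothetical.

```python
from collections import deque
from typing import List, Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max), inclusive pixel coords.
Box = Tuple[int, int, int, int]


def connected_component_boxes(image: List[List[int]]) -> List[Box]:
    """Label 8-connected foreground (non-zero) pixels; return one bounding box per CC."""
    height = len(image)
    width = len(image[0]) if image else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y in range(height):
        for x in range(width):
            if not image[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill starting from this unvisited foreground pixel.
            seen[y][x] = True
            queue = deque([(x, y)])
            x0, y0, x1, y1 = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def intersects(a: Box, b: Box) -> bool:
    """True if two inclusive boxes share at least one pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def fit_region_to_ccs(region: Box, cc_boxes: List[Box]) -> Box:
    """Grow a detected region so every CC it touches lies entirely inside it.

    Single pass against the ORIGINAL region (an assumption of this sketch),
    so the fitted box cannot chain outward into neighbouring text.
    """
    x0, y0, x1, y1 = region
    for box in cc_boxes:
        if intersects(region, box):
            x0, y0 = min(x0, box[0]), min(y0, box[1])
            x1, y1 = max(x1, box[2]), max(y1, box[3])
    return (x0, y0, x1, y1)


# Toy page: one CC spanning columns 0-4 (think of touching "w" and "2"),
# plus an isolated pixel further down that the detection does not touch.
image = [
    [1, 0, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
ccs = connected_component_boxes(image)
print(ccs)                                  # [(0, 0, 4, 2), (4, 5, 4, 5)]
print(fit_region_to_ccs((0, 0, 2, 2), ccs))  # (0, 0, 4, 2)
```

Here the detection box (0, 0, 2, 2) cuts through the first CC, so the fitted region grows to cover all of it, while the untouched CC at the bottom is left out. Testing each CC only against the original detection box, rather than repeating until no new CC is touched, is a design choice of this sketch; the MR does not say which variant it uses.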
To simplify testing, this MR addresses the math ACL examples first. Chemistry can be checked at a later time.
Testing
git clone https://gitlab.com/dprl/MathSeer-extraction-pipeline.git
cd MathSeer-extraction-pipeline; make; cd modules/protables; git checkout region-intersect
cd ../..
git checkout 37-detection-chemical-regions-not-fit-around-character-ccs
make; make example
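- If you are running this locally, skip the scp command below and open outputs/ACL/view/index.html in your MathSeer-extraction-pipeline checkout instead of ./ACL-test-view/index.html. Otherwise, issue the following from your local machine, substituting your user id and the remote machine's address where indicated: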
scp -r <your-id>@<remote-machine-address>:~/<path-to-'MathSeer-extraction-pipeline/outputs/ACL/view'> ./ACL-test-view
- Open ./ACL-test-view/index.html using the Chrome browser.
- Navigate in Chrome to the pages for document A00-3007.
- Page 1 should have no detected formulas.
- On page 2, the first detected formula should be w^2, where the w and the 2 are touching. Confirm that the ScanSSD detection box cuts through these touching characters, and that they have been merged into one red box (connected component) in the QD-GGA Input (TSV) field below the ScanSSD detection box for the first detected formula.
- Spot check the remaining pages of this document to confirm that the ScanSSD and QD-GGA Input (TSV) results are sane.
Once this MR is approved, the corresponding MR in the protables repository should also be accepted: protables!9 (merged)
Edited by R