Resolve "Fit Regions Around Character CCs"

Closes #37 (closed)

This MR introduces a slightly slower (1-2s) but more robust method for finding connected components (CCs) that intersect detected regions. The intent is to prevent symbols from being cut 'in half' near region boundaries.
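As a rough illustration of the idea (not the code in this MR), the sketch below grows a detected region so that it fully contains every CC bounding box it touches. The `Box` layout `(x_min, y_min, x_max, y_max)` and the function names `intersects` and `fit_region_to_ccs` are hypothetical, chosen only for this example. Whether the actual implementation also shrinks regions is not stated here.

```python
from typing import List, Tuple

# Hypothetical box layout for this sketch: (x_min, y_min, x_max, y_max), inclusive.
Box = Tuple[int, int, int, int]


def intersects(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def fit_region_to_ccs(region: Box, cc_boxes: List[Box]) -> Box:
    """Expand `region` to cover every CC box that intersects it.

    Assumption: the region only grows; CCs that do not touch the
    region are ignored, and a region with no hits is returned unchanged.
    """
    hits = [cc for cc in cc_boxes if intersects(region, cc)]
    if not hits:
        return region
    boxes = hits + [region]
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


# Example: a region whose right edge cuts through one CC and whose
# left edge cuts through another; a third CC lies far away.
region = (10, 10, 50, 30)
ccs = [(45, 12, 60, 28), (100, 100, 110, 110), (5, 15, 12, 20)]
fitted = fit_region_to_ccs(region, ccs)
```

With these inputs the fitted region widens on both sides to include the two cut CCs, while the distant CC has no effect.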

Testing is limited to the math ACL examples for now; chemistry examples can be checked at a later time.

Testing

  1. git clone https://gitlab.com/dprl/MathSeer-extraction-pipeline.git
  2. cd MathSeer-extraction-pipeline; make; cd modules/protables; git checkout region-intersect
  3. cd ../..
  4. git checkout 37-detection-chemical-regions-not-fit-around-character-ccs
  5. make; make example
  6. If you are running this locally, skip to step 8. Otherwise, from your local machine, issue the following command, substituting your user id and the remote machine address where indicated:
  • scp -r <your-id>@<remote-machine-address>:~/<path-to-'MathSeer-extraction-pipeline/outputs/ACL/view'> ./ACL-test-view
  7. Open ./ACL-test-view/index.html using the Chrome browser
  8. Navigate in Chrome to the pages for document A00-3007.
  • Page 1 should have no detected formulas.
  • On page 2, the first detected formula should be a w^2 where the 2 and w are touching. Confirm that the ScanSSD detection box cuts through these touching characters, and that they have been merged into one red box (connected component) in the QD-GGA Input (TSV) field below the ScanSSD detection box for the first detected formula results.
  • Spot check the remaining pages of this document for the ScanSSD and QD-GGA Input (TSV) results (checking that they are sane)

Once this MR is approved, the corresponding MR in the protables repository should also be accepted: protables!9 (merged)

Edited by R
