Resolve "Parser debugging and initial improvement"
This merge request covers the first part of the issue, parser debugging: cleaning up and debugging the parser code, including the loading and post-processing steps that were not being applied correctly.
Testing notes:
- Run `make example` first on the `master` branch, then look in the HTML for formulas with segmentation errors (the same symbol appearing repeatedly, or one symbol split into multiple symbols, in the rendered MathML (mml) output, e.g., for equals (=)).
  - Examples: A00-3007-P4-R2, K15-1002-P4-R12, K15-1002-P4-R23, K15-1002-P4-R26, K15-1002-P4-R30
- Switch to the `24-parser-debugging-and-initial-improvement` branch and run `make clean-out`.
- Run `make example` again on the `24-parser-debugging-and-initial-improvement` branch. Check the formulas that showed segmentation errors on the `master` branch and verify that those errors are now gone.

Note that you may notice other segmentation errors, e.g., in K15-1002-P4-R10, K15-1002-P4-R15, K15-1002-P4-R16, etc. These occur because the model in use was trained on higher-resolution (600 dpi) images, whereas the pipeline has been modified to use 256 dpi images for parsing. They were not seen on the `master` branch because there each symbol was explicitly treated as a non-merge relationship and the network outputs were not used.
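
For reviewers who prefer a quick automated check over inspecting the HTML by eye, the "same symbol appearing repeatedly" symptom can be approximated with a small script. This is only a sketch and is not part of this merge request: the helper names `local_name` and `repeated_symbols` are hypothetical, and it assumes the rendered MathML for a formula is available as a string. It is a heuristic, so it may flag formulas that legitimately repeat a symbol, and it does not detect a symbol split into different pieces.

```python
from itertools import groupby
import xml.etree.ElementTree as ET

# MathML token elements whose text is compared (identifier, operator, number).
TOKEN_TAGS = {"mi", "mo", "mn"}


def local_name(tag):
    """Drop an XML namespace prefix: '{http://...}mo' -> 'mo'."""
    return tag.rsplit("}", 1)[-1]


def repeated_symbols(mathml):
    """Return (symbol, run_length) for every run of identical adjacent tokens.

    Hypothetical helper for illustration only; tokens are read in document order.
    """
    root = ET.fromstring(mathml)
    tokens = [
        (el.text or "").strip()
        for el in root.iter()
        if local_name(el.tag) in TOKEN_TAGS
    ]
    runs = []
    for symbol, group in groupby(tokens):
        length = len(list(group))
        if length > 1:
            runs.append((symbol, length))
    return runs


# Made-up formula with a duplicated equals sign, mimicking the symptom.
broken = "<math><mrow><mi>x</mi><mo>=</mo><mo>=</mo><mn>1</mn></mrow></math>"
clean = "<math><mrow><mi>x</mi><mo>=</mo><mn>1</mn></mrow></math>"

print(repeated_symbols(broken))  # [('=', 2)]
print(repeated_symbols(clean))   # []
```

Running it on the MathML of the formulas listed above, once on `master` and once on this branch, would give a rough before/after comparison, assuming the MathML can be extracted from the generated HTML.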
Closes #24
Edited by Ayush Kumar Shah