
Resolve "Parser debugging and initial improvement"

This merge request covers the first part of the issue: parser debugging. It cleans up and debugs the parser code, including loading and post-processing steps that were not being applied correctly.

Testing notes:

  • Run make example on the master branch first, then look in the HTML for formulas with segmentation errors: in the rendered MML output, the same symbol appears repeatedly or is split into multiple symbols, e.g., for equals (=).
    • Examples: A00-3007-P4-R2, K15-1002-P4-R12, K15-1002-P4-R23, K15-1002-P4-R26, K15-1002-P4-R30
  • Switch to the 24-parser-debugging-and-initial-improvement branch and run make clean-out.
  • Run make example again on the 24-parser-debugging-and-initial-improvement branch. Check the formulas that showed segmentation errors on the master branch and verify that those errors are now gone.

    Note that you may notice other segmentation errors in, e.g., K15-1002-P4-R10, K15-1002-P4-R15, and K15-1002-P4-R16. These occur because the model in use was trained on higher-resolution (600 dpi) images, while the pipeline has been modified to parse 256 dpi images. They were not seen on the master branch because all symbols were explicitly treated as non-merge relationships and the network outputs were not used.
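
For convenience, the testing steps above can be sketched as a small shell script. This is only a sketch: it assumes a clean working tree with both branches available locally, and the RUN variable, the FIX_BRANCH variable, and the testing_steps function are hypothetical names introduced here, not part of the repository. By default it performs a dry run that prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the testing steps; not part of the repository.
# RUN defaults to "echo", so commands are printed rather than executed.
# Invoke with RUN= (set but empty) to actually run them.
RUN=${RUN-echo}
FIX_BRANCH=24-parser-debugging-and-initial-improvement

testing_steps() {
    # Step 1: baseline on master. Inspect the rendered HTML for the
    # listed formulas before moving on, since later steps regenerate it.
    $RUN git checkout master || return 1
    $RUN make example || return 1

    # Step 2: switch to the fix branch and run the clean-out target.
    $RUN git checkout "$FIX_BRANCH" || return 1
    $RUN make clean-out || return 1

    # Step 3: regenerate the examples and compare the same formulas.
    $RUN make example || return 1
}

testing_steps
```

Because the first step requires inspecting the HTML by hand, the dry-run output is probably most useful as a checklist of commands to run one at a time.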

Closes #24

Edited by Ayush Kumar Shah
