Evaluation code ICDAR/IJDAR

Bryan Amador requested to merge evaluation into containerize

From the graphics_recognition directory:

  • git pull; git checkout containerize
  • make rebuild && make chem-v2-all-test

Check SMILES

  • cp outputs/All/generated_smiles/or100.09.tables/smiles_out.txt ./smiles_test_NEW.txt
  • git checkout evaluation
  • make chem-v2-all-test
  • diff outputs/All/generated_smiles/or100.09.tables/smiles_out.txt ./smiles_test_NEW.txt

The diff must produce no output: the two SMILES files must be identical.
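The diff step can also be scripted so the regression check reports exactly where the outputs diverge. A minimal sketch in Python (the file paths are the ones from the steps above; the helper name is ours, not part of the repo):

```python
# Compare two SMILES output files line by line; the regression check
# passes only when every line is identical.
def first_smiles_diff(old_path, new_path):
    with open(old_path) as old_f, open(new_path) as new_f:
        old_lines = old_f.read().splitlines()
        new_lines = new_f.read().splitlines()
    if len(old_lines) != len(new_lines):
        return f"line counts differ: {len(old_lines)} vs {len(new_lines)}"
    for i, (a, b) in enumerate(zip(old_lines, new_lines), start=1):
        if a != b:
            return f"line {i}: {a!r} != {b!r}"
    return None  # identical -> regression check passes
```

Calling `first_smiles_diff("outputs/All/generated_smiles/or100.09.tables/smiles_out.txt", "smiles_test_NEW.txt")` should return `None` when the check passes.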

Check CDXML

  • Open outputs/All/generated_cdxmls/or100.09.tables_full_cdxml/or100.09.tables_allpages.cdxml in ChemDraw and compare the structures against the standard 24-page test file

Check ICDAR/IJDAR runs

  • make download-synthetic-chem-data
  • make download-real-chem-data
  • make chem-ijdar-USPTO-indigo
  • make chem-ijdar-UOB
  • make chem-ijdar-CLEF

NOTE: Stack traces for errors in some files are expected to appear in the terminal (don't panic).

Output for USPTO (Indigo) dataset:

count: 5704
Total molecules (files): 5719
Tanimoto: 0.994365411944743
Lev: 1.018359853121175
Lev normalized: 0.9921031031581111
Exact matches: 5599
Incorrectly parsed: 105
Fatal errors (files failed): 15
Memory usage: 200404 KB
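For reference on how to read these numbers: Tanimoto is fingerprint similarity between predicted and ground-truth molecules, and Lev is the average Levenshtein (edit) distance between predicted and ground-truth SMILES strings. A pure-Python sketch of the edit distance, with a hypothetical normalization (1 - dist / max length; the exact formula used by the evaluation script may differ):

```python
# Plain Levenshtein (edit) distance between two SMILES strings,
# computed with the standard rolling-row dynamic program.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# Assumed normalization: 1.0 means an exact string match.
def lev_normalized(a, b):
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))
```

So a dataset-level "Lev normalized" near 0.99, as above, means predicted SMILES are on average within about 1% edits of the ground truth.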

Output for UOB dataset:

count: 5740
Total molecules (files): 5740
Tanimoto: 0.9626597233095426
Lev: 0.6939024390243902
Lev normalized: 0.9733236108299292
Exact matches: 5479
Incorrectly parsed: 261
Fatal errors (files failed): 0
Memory usage: 197888 KB

Output for CLEF dataset:

count: 921
Total molecules (files): 992
Tanimoto: 0.9178132013139747
Lev: 2.310483870967742
Lev normalized: 0.8951772239760818
Exact matches: 836
Incorrectly parsed: 85
Fatal errors (files failed): 71
Memory usage: 186784 KB
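The counts in the three output blocks are internally consistent (exact + incorrectly parsed = count, and count + fatal errors = total files), which can be checked mechanically. A small sketch, with the exact-match rate computed over successfully parsed files (a choice of ours; the paper may report it over all files instead):

```python
# Sanity-check the reported counts and derive an exact-match rate
# (exact / count, i.e. over files that parsed without a fatal error).
def summarize(count, total, exact, incorrect, fatal):
    assert exact + incorrect == count
    assert count + fatal == total
    return exact / count

# Reported numbers from the three runs above:
uspto = summarize(5704, 5719, 5599, 105, 15)   # USPTO (Indigo)
uob = summarize(5740, 5740, 5479, 261, 0)      # UOB
clef = summarize(921, 992, 836, 85, 71)        # CLEF
```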