Evaluation code for ICDAR/IJDAR
From the graphics_recognition directory, run:
git pull; git checkout containerize
make rebuild && make chem-v2-all-test
Check SMILES
cp outputs/All/generated_smiles/or100.09.tables/smiles_out.txt ./smiles_test_NEW.txt
git checkout evaluation
make chem-v2-all-test
diff outputs/All/generated_smiles/or100.09.tables/smiles_out.txt ./smiles_test_NEW.txt
The diff must produce no output; the two files must be identical.
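The SMILES check above can be sketched as a small helper (the paths come from this doc; the function name is illustrative):

```python
def smiles_files_match(path_a: str, path_b: str) -> bool:
    """Return True iff the two SMILES output files are byte-identical,
    i.e. the diff in the step above would be empty."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return fa.read() == fb.read()
```

For this workflow, `smiles_files_match("outputs/All/generated_smiles/or100.09.tables/smiles_out.txt", "./smiles_test_NEW.txt")` must return True.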
Check CDXML
- Open the file
outputs/All/generated_cdxmls/or100.09.tables_full_cdxml/or100.09.tables_allpages.cdxml
in ChemDraw and compare the structures against the standard 24-page test file.
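Before the manual ChemDraw pass, a quick well-formedness check can catch a truncated or corrupt CDXML early. This is only a sketch: it validates XML syntax, not the chemistry, so the ChemDraw comparison is still required.

```python
import xml.etree.ElementTree as ET

def cdxml_is_well_formed(path: str) -> bool:
    """Return True if the CDXML file parses as XML; False on any parse error."""
    try:
        ET.parse(path)
        return True
    except ET.ParseError:
        return False
```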
Check ICDAR/IJDAR runs
make download-synthetic-chem-data
make download-real-chem-data
make chem-ijdar-USPTO-indigo
make chem-ijdar-UOB
make chem-ijdar-CLEF
NOTE: Stack traces for errors in some files are expected in the terminal output (don't panic).
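The three evaluation runs can be batched with a small runner; `run_eval` is a hypothetical helper, and the make targets are the ones listed above:

```python
import subprocess

def run_eval(cmd: list[str]) -> int:
    """Run one evaluation command, letting its output (including the
    expected stack traces) stream to the terminal; return the exit code."""
    return subprocess.run(cmd).returncode

# for target in ("chem-ijdar-USPTO-indigo", "chem-ijdar-UOB", "chem-ijdar-CLEF"):
#     run_eval(["make", target])
```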
Output for USPTO (Indigo) dataset:
count: 5704
Total molecules (files): 5719
Tanimoto 0.994365411944743
Lev 1.018359853121175
Lev normalized 0.9921031031581111
Exact matches: 5599
Incorrectly parsed: 105
Fatal errors (files failed): 15
Memory usage: 200404 KB
Output for UOB dataset:
count: 5740
Total molecules (files): 5740
Tanimoto 0.9626597233095426
Lev 0.6939024390243902
Lev normalized 0.9733236108299292
Exact matches: 5479
Incorrectly parsed: 261
Fatal errors (files failed): 0
Memory usage: 197888 KB
Output for CLEF dataset:
count: 921
Total molecules (files): 992
Tanimoto 0.9178132013139747
Lev 2.310483870967742
Lev normalized 0.8951772239760818
Exact matches: 836
Incorrectly parsed: 85
Fatal errors (files failed): 71
Memory usage: 186784 KB
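The reported counts above should be internally consistent: exact matches + incorrectly parsed + fatal errors should equal the total file count, and "count" (files actually scored) should equal total minus fatal errors. A quick sanity check, assuming that reading of the fields:

```python
# Figures copied from the three outputs above.
REPORTED = {
    "USPTO (Indigo)": dict(count=5704, total=5719, exact=5599, incorrect=105, fatal=15),
    "UOB":            dict(count=5740, total=5740, exact=5479, incorrect=261, fatal=0),
    "CLEF":           dict(count=921,  total=992,  exact=836,  incorrect=85,  fatal=71),
}

def counts_consistent(m: dict) -> bool:
    """Check the two accounting identities described above."""
    return (m["exact"] + m["incorrect"] + m["fatal"] == m["total"]
            and m["count"] == m["total"] - m["fatal"])
```

All three datasets satisfy both identities, so the reported numbers add up.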
Edited by Bryan Amador