CDXML Generation Corrections
This MR includes the following improvements to ChemDraw output files:
Clean-up of CDXML files and formatting changes
- Removal of unnecessary attributes on tags in CDXML files (NeedsClean, Justification, etc.)
- Formatting improvements: new font (Times New Roman), thinner bond lines, and line ends closer to text labels
- Made brackets for Markush structures shorter
- CreationProgram attribute at file top is now "ChemScraper v0.1"
- Minor code re-organization and commenting
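The attribute clean-up listed above can be illustrated with a minimal sketch. It assumes the CDXML is handled with Python's standard xml.etree.ElementTree. The function name strip_cdxml_attributes, the UNNEEDED_ATTRS tuple, and the sample document are hypothetical and are not taken from the ChemScraper code. Only NeedsClean and Justification are named in this MR, so the real list of removed attributes may be longer.

```python
import xml.etree.ElementTree as ET

# Attributes named in this MR description; the actual list may be longer.
UNNEEDED_ATTRS = ("NeedsClean", "Justification")


def strip_cdxml_attributes(cdxml_text, creation_program="ChemScraper v0.1"):
    """Return CDXML text with the unneeded attributes removed from every tag."""
    root = ET.fromstring(cdxml_text)
    # root.iter() visits the root element and all of its descendants.
    for element in root.iter():
        for name in UNNEEDED_ATTRS:
            element.attrib.pop(name, None)
    # Record the generating program on the root element.
    root.set("CreationProgram", creation_program)
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily simplified CDXML fragment for illustration only.
sample = (
    '<CDXML CreationProgram="other"><page><fragment>'
    '<n id="1" NeedsClean="yes"/>'
    '<t Justification="Left"><s>TsO</s></t>'
    '</fragment></page></CDXML>'
)
cleaned = strip_cdxml_attributes(sample)
print(cleaned)
```

Note that ElementTree does not preserve the XML declaration or DOCTYPE when serializing, so code that writes real CDXML files this way would need to re-emit them.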
Corrections
- Font sizes for labels are now estimated directly from character sizes on the page
- Corrected alignment of atoms with double bonds (atom positions now at center point)
- Corrected error with strings defined left-right on left side of a bond (e.g., "TsO" no longer appears as "OsT" with an error in ChemDraw)
Testing
1. Check out the changes, using: git pull; git checkout label_cdxml_files
2. From the graphics_extraction directory, issue: make chem-v2-all-test
3. Save a copy of the SMILES file here outside the repo (e.g., in your home directory): outputs/All/generated_smiles/or100.09.tables/smiles_out.txt
4. Check out the target branch, using: git checkout containerize
5. Repeat steps 2 and 3.
6. Run diff on the two versions of smiles_out.txt; there should be no difference.
7. Download some of the generated CDXML files to check the output.
8. If steps 6 and 7 pass, approve the merge.
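The comparison of the two saved copies of smiles_out.txt can also be scripted instead of running diff by hand. The sketch below is one possible way to do that with Python's standard difflib. It assumes the file is plain text with one entry per line. The function name smiles_diff and the temporary file names are hypothetical, and the demo writes its own small files rather than reading real pipeline output.

```python
import difflib
import tempfile
from pathlib import Path


def smiles_diff(path_a, path_b):
    """Return unified-diff lines for two text files (empty list if identical)."""
    lines_a = Path(path_a).read_text().splitlines()
    lines_b = Path(path_b).read_text().splitlines()
    return list(
        difflib.unified_diff(
            lines_a, lines_b,
            fromfile=str(path_a), tofile=str(path_b),
            lineterm="",
        )
    )


# Demo with made-up SMILES lines; real runs would pass the two saved copies.
with tempfile.TemporaryDirectory() as tmp:
    branch_copy = Path(tmp) / "smiles_out_branch.txt"
    target_copy = Path(tmp) / "smiles_out_target.txt"
    branch_copy.write_text("CCO\nc1ccccc1\n")
    target_copy.write_text("CCO\nc1ccccc1\n")
    identical = smiles_diff(branch_copy, target_copy)

    # Change one line to show what a failing comparison looks like.
    target_copy.write_text("CCO\nC1CCCCC1\n")
    changed = smiles_diff(branch_copy, target_copy)

print("identical files:", "no difference" if not identical else identical)
print("\n".join(changed))
```

An empty list corresponds to the "no difference" outcome expected by the test procedure. Any returned lines show which SMILES strings changed between the two branches.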