New error while parsing rst file with codeml

In a series of analyses, I have several egglib.wrappers.codeml() calls failing with an error message stating that either the NEB or the BEB results cannot be extracted. The bug is similar to one previously reported by Florent Marchal, and it is possible that the earlier fix wasn't generic enough. Indeed, the format of the rst output file of codeml is quite flexible.

Add `debug` option

The problem is that the analysis is very time-consuming. To help with debugging (now and in the future), I will add a `debug` option to the function egglib.wrappers.codeml() that saves all output files into an archive. Once this is done I will address the bug itself.
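As a rough illustration of the idea (not the actual egglib implementation), packing a run directory into an archive could look like the sketch below. The helper name `archive_outputs`, the directory layout, and the placeholder file names are assumptions made for the example only.

```python
import pathlib
import shutil
import tempfile
import zipfile

def archive_outputs(workdir, archive_base):
    """Pack every file found under workdir into <archive_base>.zip.

    Hypothetical helper: it only shows the general technique and is
    not part of egglib. Returns the path of the archive as a string.
    """
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(workdir))

# Small demonstration with placeholder files standing in for real outputs
with tempfile.TemporaryDirectory() as tmp:
    run_dir = pathlib.Path(tmp) / "run"
    run_dir.mkdir()
    (run_dir / "rst").write_text("placeholder\n")
    (run_dir / "results.txt").write_text("placeholder\n")

    archive_path = archive_outputs(run_dir, pathlib.Path(tmp) / "codeml_debug")

    # List the archived files to check that nothing was left out
    with zipfile.ZipFile(archive_path) as zf:
        archived_names = sorted(zf.namelist())
```

Keeping the archive outside the run directory avoids including the archive in itself.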

Fix bug

The problem occurred in the rst file when the number of sites (length of the codon alignment) reaches 1000 or more. In the table of site posterior probabilities, the parsing code was expecting one or more spaces before the position, but for positions of 1000 and above there is no leading space left, so the regular expression failed. At this point the fix is trivial: the expectation is changed to zero or more spaces.
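The failure and the fix can be reproduced with a small sketch. The two patterns and the line layout below are assumptions for illustration (a position right-aligned in a 4-character field, followed by the residue and the probabilities); the actual expressions used in egglib and the exact rst layout may differ.

```python
import re

# Hypothetical patterns, not copied from egglib
OLD_PATTERN = re.compile(r"^ +(\d+) ([A-Z*-]) +(.+)$")  # one or more leading spaces
NEW_PATTERN = re.compile(r"^ *(\d+) ([A-Z*-]) +(.+)$")  # zero or more leading spaces

def make_line(pos, aa="M", probs=(0.25, 0.75)):
    """Build a table line with the position in a 4-character field (assumed layout)."""
    return "%4d %s   %s" % (pos, aa, " ".join("%.5f" % p for p in probs))

def parse_site(line, pattern):
    """Return (position, residue, probabilities), or None if the line does not match."""
    m = pattern.match(line)
    if m is None:
        return None
    return int(m.group(1)), m.group(2), [float(x) for x in m.group(3).split()]

# Position 999 still has a leading space, position 1000 fills the whole field
old_999 = parse_site(make_line(999), OLD_PATTERN)
old_1000 = parse_site(make_line(1000), OLD_PATTERN)
new_1000 = parse_site(make_line(1000), NEW_PATTERN)
new_10000 = parse_site(make_line(10000), NEW_PATTERN)
```

Under these assumptions, `old_1000` is `None` while `new_1000` is parsed normally. If the field simply widens for longer alignments, as in `make_line(10000)`, the relaxed pattern still matches.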

Conclusion

No changes are made to the test suite because we won't routinely process 1000-aa alignments with the codeml function. The two datasets that led to the detection of the problem are now correctly processed. An additional run will be performed before releasing the final v3.5.0. It is unclear to me what CodeML will do with alignments of 10,000 aa or more, but I expect that the output will still match the regular expression. In any case, the debug option will help solve future problems.

bug egglib.wrappers

Edited May 04, 2025 by Stéphane De Mita
