Underlying AUGUSTUS predictions - Checking for errors, tweaking run parameters?
Background Info: I am using your BUSCO software for checking genome draft completeness.
I am also using it for another reason: to understand relationships across 66 Fusarium fungal strains. Not all 3725 Sordariomyceta genes were present in all 66 strains due to differences in sequencing depth and draft assembly quality. Across these 66 strains, 2356 genes were found in common. I aligned each of them individually using MAFFT, then concatenated the alignments to infer an ML tree-based phylogeny.
Since this was all automated, I used AMAS to look at the BUSCO genes whose multiple alignments had the highest proportion of parsimony-informative sites. Only 37 out of 2356 had a proportion > 40% of the aligned sequence length.
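For clarity on the metric, here is a minimal sketch of how the proportion of parsimony-informative sites can be computed for one alignment. It is not the AMAS code. It assumes the usual definition (a column is informative when at least two character states each occur in at least two sequences) and assumes that gaps and `?` are not counted as states. The function names and the toy alignment are made up for illustration.

```python
from collections import Counter

# Assumption: gap and missing-data symbols are not counted as character states.
MISSING = {"-", "?"}


def is_parsimony_informative(column):
    """True if at least two states each occur in at least two sequences."""
    counts = Counter(c.upper() for c in column if c not in MISSING)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def proportion_informative(seqs):
    """Fraction of alignment columns that are parsimony-informative."""
    if not seqs or len(seqs[0]) == 0:
        raise ValueError("alignment is empty")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    # zip(*seqs) walks the alignment column by column.
    informative = sum(is_parsimony_informative(col) for col in zip(*seqs))
    return informative / length


# Toy alignment (hypothetical data): columns 2 and 4 are informative.
toy = ["ACGT-A", "ACGTTA", "ATGATA", "ATGA-C"]
print(proportion_informative(toy))
```

In the toy alignment, 2 of the 6 columns qualify, so a 40% cutoff like the one above would not select it.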
For these, I visualized their MSAs using JalView, and I am a little surprised that several of them have very long gaps. One such example is attached; you can zoom in after downloading, since it is a PNG file.
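To find which strains carry such gaps without inspecting every alignment by eye, a check along the following lines could be run per gene. This is only a sketch: the function names, the 20% threshold, and the toy sequences are placeholders, and the alignment is assumed to be already loaded as a mapping from strain name to aligned sequence.

```python
import re


def longest_gap_run(seq):
    """Length of the longest run of consecutive '-' characters in a sequence."""
    runs = re.findall(r"-+", seq)
    return max((len(r) for r in runs), default=0)


def flag_gappy(alignment, min_fraction=0.2):
    """Return {name: longest_gap_run} for sequences whose longest gap run
    covers at least min_fraction of the alignment length.
    The 0.2 default is an arbitrary placeholder, not a recommended cutoff."""
    flagged = {}
    for name, seq in alignment.items():
        run = longest_gap_run(seq)
        if seq and run / len(seq) >= min_fraction:
            flagged[name] = run
    return flagged


# Toy alignment (hypothetical strain names and sequences).
toy = {
    "strain_a": "ATGCCGTTAGGA",
    "strain_b": "ATG------GGA",
    "strain_c": "ATGC-GTTAGGA",
}
print(flag_gappy(toy))
```

A long gap confined to one or a few strains would point to a truncated or mis-predicted gene model in those strains, rather than a genuinely variable region.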
With that as background, these are my questions:
- Is there any way to check for errors in the AUGUSTUS gene predictions for the conserved BUSCOs? All downstream inferences depend on the accuracy of this initial gene prediction step.
- Is there any way to customize the AUGUSTUS prediction parameters during the BUSCO analyses? Do you think such customization is ever necessary?