Performance of BUSCO v4 on draft genomes
What is your experience with running BUSCO v4 on draft genomes assembled from Oxford Nanopore (ONT) data only? With this technology we obviously expect a fair number of indels and basecalling errors, distributed more or less evenly over the entire assembly.
I got the following result for an ONT-based draft genome assembly of a fish (run with the actinopterygii lineage):
C:45.9%[S:44.1%,D:1.8%],F:8.9%,M:45.2%,n:3640
1671 Complete BUSCOs (C)
1605 Complete and single-copy BUSCOs (S)
66 Complete and duplicated BUSCOs (D)
323 Fragmented BUSCOs (F)
1646 Missing BUSCOs (M)
3640 Total BUSCO groups searched
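As a sanity check, the percentages in the one-line summary follow directly from the counts listed under it. A minimal sketch in Python, assuming the summary format shown above; the function names `parse_busco_summary` and `percentages_from_counts` are made up for illustration and are not part of BUSCO:

```python
import re

def parse_busco_summary(line):
    """Parse a BUSCO one-line summary string into percentages and n."""
    pattern = (
        r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
        r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
    )
    match = re.search(pattern, line)
    if match is None:
        raise ValueError("not a BUSCO summary line")
    result = {k: float(v) for k, v in match.groupdict().items() if k != "n"}
    result["n"] = int(match.group("n"))
    return result

def percentages_from_counts(counts, total):
    """Convert raw BUSCO counts to percentages rounded to one decimal."""
    return {k: round(100.0 * v / total, 1) for k, v in counts.items()}

# Values copied from the result reported in this post.
summary = parse_busco_summary("C:45.9%[S:44.1%,D:1.8%],F:8.9%,M:45.2%,n:3640")
counts = {"C": 1671, "S": 1605, "D": 66, "F": 323, "M": 1646}
computed = percentages_from_counts(counts, summary["n"])

# Complete = single-copy + duplicated; complete + fragmented + missing = total.
assert counts["C"] == counts["S"] + counts["D"]
assert counts["C"] + counts["F"] + counts["M"] == summary["n"]
for key, value in computed.items():
    print(key, value, "matches summary:", value == summary[key])
```

So fewer than half of the 3640 groups are recovered as complete, and roughly as many are reported missing.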
My questions:
- How many single-copy orthologs (SCOs) would you expect to be found in an assembly like this? I know this is a very vague question, but what are your thoughts? How much of a problem are frameshifts caused by indels?
- Would you even run BUSCO on such an assembly, or is there a better alternative for assessing assembly completeness in this case?
- Do you have suggestions for alternative parameter settings?
Edited by Michael Schmid