Lower BUSCO score in combined transcriptome vs. known transcriptome
Hi! I am trying to update an existing genome annotation using new RNA-seq evidence, and I've managed to get a new annotation GTF file using BRAKER. Comparing the known.gtf and the **new.gtf** with GffCompare shows that the new.gtf contains novel exons and introns. I then ran BUSCO on the two GTFs and on a combined.gtf, which is a non-redundant set of the transcripts from the new.gtf and the known.gtf (in each case first converting the GTF to a FASTA file using GffRead). However, the results show that the known.gtf, rather than the combined.gtf, has more complete BUSCOs than the other two datasets. Intuitively, the combined.gtf should have the highest BUSCO score because it contains the information from both the known and the new annotations. Why and how can this happen?
BUSCO version 4.1.4, lineage: actinopterygii, mode: transcriptome
BUSCO results:
- known.gtf C:89.3%[S:37.7%,D:51.6%],F:2.1%,M:8.6%,n:3640
- new.gtf C:83.6%[S:42.6%,D:41.0%],F:5.0%,M:11.4%,n:3640
- combined.gtf C:88.4%[S:21.4%,D:67.0%],F:2.6%,M:9.0%,n:3640
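To make the comparison concrete, here is a minimal Python sketch that parses the three summary strings above and converts the percentages into approximate BUSCO counts. The helper names `parse_busco` and `approx_count` are made up for illustration and are not part of BUSCO. The counts are rounded back from the reported percentages, so they may differ slightly from the exact numbers in BUSCO's own summary files.

```python
import re

# BUSCO one-line summaries copied from the results listed above.
SUMMARIES = {
    "known.gtf": "C:89.3%[S:37.7%,D:51.6%],F:2.1%,M:8.6%,n:3640",
    "new.gtf": "C:83.6%[S:42.6%,D:41.0%],F:5.0%,M:11.4%,n:3640",
    "combined.gtf": "C:88.4%[S:21.4%,D:67.0%],F:2.6%,M:9.0%,n:3640",
}

# C = complete, S = single-copy, D = duplicated, F = fragmented, M = missing.
PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary):
    """Parse one summary string into a dict of percentages plus n (hypothetical helper)."""
    match = PATTERN.fullmatch(summary.strip())
    if match is None:
        raise ValueError(f"unrecognised BUSCO summary: {summary!r}")
    fields = {key: float(value) for key, value in match.groupdict().items() if key != "n"}
    fields["n"] = int(match.group("n"))
    return fields


def approx_count(percent, n):
    """Approximate number of BUSCOs behind a percentage (rounded, not exact)."""
    return round(percent * n / 100)


def report():
    """Print approximate counts per dataset for a side-by-side comparison."""
    for name, summary in SUMMARIES.items():
        fields = parse_busco(summary)
        n = fields["n"]
        print(
            f"{name}: complete ~{approx_count(fields['C'], n)}, "
            f"single ~{approx_count(fields['S'], n)}, "
            f"duplicated ~{approx_count(fields['D'], n)}, "
            f"fragmented ~{approx_count(fields['F'], n)}, "
            f"missing ~{approx_count(fields['M'], n)} (n={n})"
        )


if __name__ == "__main__":
    report()
```

By this estimate, the gap between the known.gtf and the combined.gtf is only a few dozen complete BUSCOs out of 3640, while most of the change is a shift from single-copy to duplicated.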
Sincere thanks, Peiwen