Lower BUSCO score in combined transcriptome vs. known transcriptome

Hi! I am trying to update an existing genome annotation using new RNA-seq evidence, and I have generated a new annotation GTF file with BRAKER. Comparing known.gtf and **new.gtf** with GFFCOMPARE shows that new.gtf contains novel exons and introns. I then ran BUSCO on three transcript sets: known.gtf, new.gtf, and combined.gtf, which is a non-redundant set of the transcripts from new.gtf and known.gtf (each GTF was first converted to a FASTA file with GFFREAD). However, the results show that known.gtf, not combined.gtf, has the most complete BUSCOs of the three. Intuitively, combined.gtf should have the highest BUSCO score because it contains the information from both the known and the new annotations. Why and how can this happen?

BUSCO version 4.1.4, lineage: actinopterygii, mode: transcriptome

BUSCO results:

known.gtf C:89.3%[S:37.7%,D:51.6%],F:2.1%,M:8.6%,n:3640
new.gtf C:83.6%[S:42.6%,D:41.0%],F:5.0%,M:11.4%,n:3640
combined.gtf C:88.4%[S:21.4%,D:67.0%],F:2.6%,M:9.0%,n:3640
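
For scale, the percentages can be converted into approximate BUSCO counts. The following is a minimal Python sketch, assuming the summary strings keep exactly the format shown above. `parse_busco_summary` is a hypothetical helper, not part of BUSCO, and the counts are rounded estimates because the summary line reports only one decimal place.

```python
import re

# Summary strings copied from the BUSCO results above.
SUMMARIES = {
    "known.gtf": "C:89.3%[S:37.7%,D:51.6%],F:2.1%,M:8.6%,n:3640",
    "new.gtf": "C:83.6%[S:42.6%,D:41.0%],F:5.0%,M:11.4%,n:3640",
    "combined.gtf": "C:88.4%[S:21.4%,D:67.0%],F:2.6%,M:9.0%,n:3640",
}

# Assumed layout of a BUSCO one-line summary (C, S, D, F, M percentages, then n).
PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Hypothetical helper: parse one summary line into percentages and
    approximate counts (percentage times n, rounded to the nearest integer)."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"Unrecognized BUSCO summary: {line!r}")
    n = int(match.group("n"))
    parsed = {"n": n}
    for key in ("C", "S", "D", "F", "M"):
        pct = float(match.group(key))
        parsed[key] = {"pct": pct, "approx_count": round(pct / 100 * n)}
    return parsed


if __name__ == "__main__":
    for name, line in SUMMARIES.items():
        result = parse_busco_summary(line)
        counts = ", ".join(
            f"{key}~{result[key]['approx_count']}" for key in ("C", "S", "D", "F", "M")
        )
        print(f"{name}: {counts} (n={result['n']})")
```

On these numbers, the gap in complete BUSCOs between known.gtf and combined.gtf is 0.9 percentage points, or roughly 33 of the 3640 BUSCOs in the lineage set.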

Sincere thanks, Peiwen
