Very slow blastn on long, but not on short, sequences
Hello,
I am using BUSCO v4.1.2 (installed via conda). I used the following command to analyze a contig-level assembly:
busco -i contigs.fasta -o BUSCO_contigs -m genome -c 40 -l poales_odb10 --limit 3
Since the assembly size is ~4.6 Gbp, the run takes about 12 hours, of which blastn accounts for about 2 hours.
My assembly is rather fragmented (~5 kb N50), and I know that this could impair BUSCO results, so I ran a scaffolding procedure based on a reference genome (using RagTag), which resulted in pseudomolecules. I ran the same busco command on them:
busco -i pseudomolecules.fasta -o BUSCO_pseudomolecules -m genome -c 40 -l poales_odb10 --limit 3
but this run is taking much longer. Specifically, blastn is very slow: after ~6 hours it had completed only 150 queries!
This is a bit confusing to me, since both runs use the same contigs; in the second run they are just combined into larger sequences.
Is this slowdown expected? Any suggestions on how to improve performance? As it stands, running BUSCO on the scaffolded genome seems quite impractical.
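In case it helps to quantify how different the two inputs are, here is a minimal sketch that reports sequence count, total length, longest sequence and N50 for a FASTA file. It is only an illustration: the function names (fasta_lengths, n50, summarize) are my own and are not part of BUSCO or RagTag, and it assumes plain uncompressed FASTA files.

```python
import sys
from typing import Dict, Iterable, Iterator, List


def fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each record in a FASTA stream, in file order."""
    length = None  # stays None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths: List[int]) -> int:
    """Return the largest length L such that sequences of length >= L
    together cover at least half of the total; 0 for empty input."""
    total = sum(lengths)
    running = 0
    for size in sorted(lengths, reverse=True):
        running += size
        if 2 * running >= total:
            return size
    return 0


def summarize(path: str) -> Dict[str, int]:
    """Collect basic length statistics for one FASTA file."""
    with open(path) as handle:
        lengths = list(fasta_lengths(handle))
    return {
        "sequences": len(lengths),
        "total_bp": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # Example: python fasta_stats.py contigs.fasta pseudomolecules.fasta
    for fasta_path in sys.argv[1:]:
        print(fasta_path, summarize(fasta_path))
```

Running it on contigs.fasta and pseudomolecules.fasta should show roughly the same total length but far fewer, far longer sequences in the second file, which is the only difference between the two BUSCO runs that I am aware of.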
Thanks!