BUSCO 5.7.0 no longer outputs nucleotide sequences (.fna), only .faa and .gff. Is this intentional?
The 5.6.1 release gave me the nucleotide sequences (.fna) for the exons of each gene in the output folders, along with the .faa and .gff feature files. Release 5.7.0, run with the same options, only outputs .faa and .gff, with no .fna.
I run it like this:
docker run -u $(id -u) -v $(pwd):/busco_wd ezlabgva/busco:v5.7.0_cv1 busco -m geno -c 24 -l lepidoptera_odb10 -i contigs.fasta
In busco_sequences/single_copy_busco_sequences I no longer get the .fna files that were there previously.
The .gff feature file still seems to point to nucleotide coordinates in the input FASTA, so I can write a script that extracts the sequences as before. It would be good to clarify, though, whether this is the suggested workflow for detecting the genes in my input (I'm doing phylogenomics after this, so I intend to pass the detected sequences to IQ-TREE eventually). In other words, is the busco_sequences output considered tool-dependent (miniprot vs. MetaEuk) rather than a well-defined output?
For example, a .gff file could contain the following. In that case, can I go into my contig NODE_2389, extract each CDS region, and concatenate the sequences to get what was previously in the .fna file? (For the 5.6.1 release I did the same check, but there the .gff file had "exon" intervals which, when extracted from the input and concatenated, gave the same data as the .fna file.)
NODE_2389_length_8483_cov_3.384368 miniprot mRNA 529 7905 2775 - . ID=MP035653;Rank=1;Identity=0.6859;Positive=0.8113;Target=550at7088_2 657 1353
NODE_2389_length_8483_cov_3.384368 miniprot CDS 7710 7905 352 - 0 Parent=MP035653;Rank=1;Identity=0.8788;Target=550at7088_2 657 721
NODE_2389_length_8483_cov_3.384368 miniprot CDS 6963 7131 271 - 2 Parent=MP035653;Rank=1;Identity=0.8036;Target=550at7088_2 722 777
NODE_2389_length_8483_cov_3.384368 miniprot CDS 6430 6577 223 - 1 Parent=MP035653;Rank=1;Identity=0.6939;Target=550at7088_2 778 827
NODE_2389_length_8483_cov_3.384368 miniprot CDS 6030 6236 304 - 0 Parent=MP035653;Rank=1;Identity=0.6957;Target=550at7088_2 828 896
NODE_2389_length_8483_cov_3.384368 miniprot CDS 5448 5634 276 - 0 Parent=MP035653;Rank=1;Identity=0.7143;Target=550at7088_2 897 958
NODE_2389_length_8483_cov_3.384368 miniprot CDS 4893 5069 258 - 2 Parent=MP035653;Rank=1;Identity=0.7119;Target=550at7088_2 959 1017
NODE_2389_length_8483_cov_3.384368 miniprot CDS 3023 3151 202 - 2 Parent=MP035653;Rank=1;Identity=0.8605;Target=550at7088_2 1018 1060
NODE_2389_length_8483_cov_3.384368 miniprot CDS 2655 2795 222 - 2 Parent=MP035653;Rank=1;Identity=0.8085;Target=550at7088_2 1061 1107
NODE_2389_length_8483_cov_3.384368 miniprot CDS 2054 2329 164 - 2 Parent=MP035653;Rank=1;Identity=0.4362;Target=550at7088_2 1108 1192
NODE_2389_length_8483_cov_3.384368 miniprot CDS 1558 1701 145 - 2 Parent=MP035653;Rank=1;Identity=0.5833;Target=550at7088_2 1193 1236
NODE_2389_length_8483_cov_3.384368 miniprot CDS 958 1172 199 - 2 Parent=MP035653;Rank=1;Identity=0.5493;Target=550at7088_2 1237 1308
NODE_2389_length_8483_cov_3.384368 miniprot CDS 529 663 159 - 0 Parent=MP035653;Rank=1;Identity=0.7111;Target=550at7088_2 1309 1353
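The script I have in mind would look roughly like the sketch below. It is only a sketch under my own assumptions, not BUSCO's method: the function names (`read_fasta`, `cds_sequences`) are hypothetical, coordinates are taken as 1-based and inclusive as in standard GFF, CDS rows are grouped by their `Parent` attribute, and minus-strand genes are reverse-complemented after joining the segments in ascending coordinate order. I have not verified that this reproduces the old .fna byte for byte (for example with respect to the stop codon or frameshifts).

```python
from collections import defaultdict

# Complement table for plain nucleotide codes (assumption: no other IUPAC codes)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Return {sequence_id: sequence}, keyed by the first word of each header."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def cds_sequences(gff_lines, contigs):
    """Return {parent_id: spliced CDS sequence} for each Parent found on CDS rows."""
    parts = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Split on any whitespace, keeping the attribute column (9th) intact
        cols = line.rstrip("\n").split(None, 8)
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        parent = attrs.get("Parent")
        if parent is None:
            continue
        parts[parent].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))

    spliced = {}
    for parent, segs in parts.items():
        segs.sort(key=lambda s: s[1])  # ascending start coordinate
        seq = "".join(contigs[c][start - 1:end] for c, start, end, _ in segs)
        if segs[0][3] == "-":
            seq = seq.translate(_COMPLEMENT)[::-1]  # reverse complement
        spliced[parent] = seq
    return spliced
```

Usage would be something like `cds_sequences(open("some.gff"), read_fasta(open("contigs.fasta")))`, where `some.gff` stands for one of the per-gene .gff files; for the example above this should return one entry keyed by `MP035653`.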