BUSCO 4.1.0 glires_odb10 run gives incorrect full_table.tsv and missing_busco_list.tsv results
I ran BUSCO with the glires_odb10 lineage on two different genomes and hit the same problem with both.
The short_summary.txt below looks good, but I don't know whether it can be trusted given the contents of full_table.tsv and missing_busco_list.tsv (I would like to think so).
short_summary.txt reports 233 Missing BUSCOs, but missing_busco_list.tsv contains 3,786 entries.
In addition, some BUSCOs are listed as both Complete and Missing in full_table.tsv, which also contains 3,786 Missing rows.
Here is some of the data:
busco_4.1.0_Nfusc_glires/run_glires_odb10$ \
> cat short_summary.txt
# BUSCO version is: 4.1.0
# The lineage dataset is: glires_odb10 (Creation date: 2019-12-17, number of species: 24, number of BUSCOs: 13798)
# Summarized benchmarking in BUSCO notation for file Nmacr_curated_v2.fa
# BUSCO was run in mode: genome
***** Results: *****
C:97.5%[S:94.0%,D:3.5%],F:0.8%,M:1.7%,n:13798
13458 Complete BUSCOs (C)
12977 Complete and single-copy BUSCOs (S)
481 Complete and duplicated BUSCOs (D)
107 Fragmented BUSCOs (F)
233 Missing BUSCOs (M)
13798 Total BUSCO groups searched
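For what it is worth, the numbers in short_summary.txt are internally consistent: the single-copy and duplicated counts add up to the Complete count, and the four categories add up to the number of BUSCO groups searched.

$$
\begin{aligned}
C &= S + D = 12977 + 481 = 13458\\
S + D + F + M &= 12977 + 481 + 107 + 233 = 13798 = n
\end{aligned}
$$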
busco_4.1.0_Nfusc_glires/run_glires_odb10$ \
> grep -v "^#" missing_busco_list.tsv |wc -l
3786
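One way to check whether the extra entries are BUSCOs that were actually found is to cross-reference the two files. The sketch below defines a hypothetical shell function, `missing_but_found` (my own helper, not part of BUSCO), that prints every ID in missing_busco_list.tsv that also has a Complete, Duplicated or Fragmented row in full_table.tsv. It assumes both files are tab-separated with `#` comment lines, as in the 4.1.0 output here.

```bash
# Hypothetical helper (not part of BUSCO): list IDs from missing_busco_list.tsv
# that also have a non-Missing row in full_table.tsv.
# Assumption: tab-separated files, '#' lines are comments, ID in column 1.
missing_but_found() {
    local missing_list=$1 full_table=$2
    awk -F'\t' '
        # First file (full_table.tsv): remember IDs with any non-Missing row
        NR == FNR {
            if (NF && $0 !~ /^#/ && $2 != "Missing") found[$1] = 1
            next
        }
        # Second file (missing list): print IDs that were in fact found
        NF && $0 !~ /^#/ && ($1 in found) { print $1 }
    ' "$full_table" "$missing_list" | sort -u
}
```

Usage: `missing_but_found missing_busco_list.tsv full_table.tsv | wc -l`. If only the 233 BUSCOs in the summary are genuinely missing (and the list has no repeated IDs), this would be expected to print 3553, i.e. 3786 minus 233.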
busco_4.1.0_Nfusc_glires/run_glires_odb10$ \
> awk 'BEGIN{FS="\t"}/^#/{next}{ar[$2]++}END{for(v in ar)print v,ar[v]}' full_table.tsv
Missing 3786
Complete 12977
Duplicated 1021
Fragmented 107
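These per-status row counts are not directly comparable with short_summary.txt: full_table.tsv appears to write one row per copy for Duplicated BUSCOs (1021 rows against 481 in the summary), and some IDs carry an extra Missing row. The hypothetical helper `summarize_unique` below (a diagnostic sketch, not BUSCO's own counting logic) collapses the table to one status per BUSCO ID, ranking Missing below any found status, so the result can be compared with S, D, F and M.

```bash
# Hypothetical helper: one status per BUSCO ID, then count IDs per status.
# Assumption: Missing is the lowest-ranked status, so an ID with both a
# Complete row and a Missing row is counted once, as Complete.
summarize_unique() {
    awk -F'\t' '
        BEGIN { rank["Missing"] = 1; rank["Fragmented"] = 2
                rank["Complete"] = 3; rank["Duplicated"] = 3 }
        /^#/ || NF == 0 { next }   # skip comments and blank lines
        # keep the highest-ranked status seen so far for this ID
        !($1 in best) || rank[$2] > rank[best[$1]] { best[$1] = $2 }
        END {
            for (id in best) n[best[id]]++
            printf "S\t%d\nD\t%d\nF\t%d\nM\t%d\n",
                n["Complete"], n["Duplicated"], n["Fragmented"], n["Missing"]
        }
    ' "$1"
}
```

Usage: `summarize_unique full_table.tsv`. If the extra Missing rows are only a reporting problem, the output should match the 12977 / 481 / 107 / 233 in short_summary.txt.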
# You can see, for example, 21at314147, 23at314147 and 51at314147 listed as both Complete and Missing:
busco_4.1.0_Nfusc_glires/run_glires_odb10$ \
> head -50 full_table.tsv
# BUSCO version is: 4.1.0
# The lineage dataset is: glires_odb10 (Creation date: 2019-12-17, number of species: 24, number of BUSCOs: 13798)
# Busco id Status Sequence Gene Start Gene End Score Length OrthoDB url Description
0at314147 Complete tig00011670 444341 642513 15142.9 7048 https://www.orthodb.org/v10?query=0at314147 nebulin
1at314147 Fragmented tig00008818 3823427 4098695 63166.4 32243 https://www.orthodb.org/v10?query=1at314147 titin
3at314147 Complete tig00000025 33989839 34242330 9433.3 6296 https://www.orthodb.org/v10?query=3at314147 dystonin isoform X1
9at314147 Complete tig00002106 18976622 19111962 13310.0 7001 https://www.orthodb.org/v10?query=9at314147 obscurin
10at314147 Complete tig00001094 7225616 7445013 12431.9 6238 https://www.orthodb.org/v10?query=10at314147 microtubule-actin cross-linking factor 1 isoform X1
14at314147 Complete tig00004200 7836213 8098006 10047.3 5348 https://www.orthodb.org/v10?query=14at314147 G-protein coupled receptor 98
18at314147 Complete tig00010535 1997370 2066543 7826.3 5269 https://www.orthodb.org/v10?query=18at314147 fibrous sheath-interacting protein 2
21at314147 Complete tig00002853 3788171 4061372 9825.0 5775 https://www.orthodb.org/v10?query=21at314147 spectrin repeat containing, nuclear envelope 2
21at314147 Missing
22at314147 Complete tig00003909 2182065 2328876 9424.0 4852 https://www.orthodb.org/v10?query=22at314147 midasin
23at314147 Complete tig00001713 17912199 18154780 4308.9 2340 https://www.orthodb.org/v10?query=23at314147 usherin
23at314147 Missing
27at314147 Complete tig00001625 16929479 17067695 9262.5 4511 https://www.orthodb.org/v10?query=27at314147 protocadherin Fat 4
33at314147 Complete tig00000948 29489132 29866606 10845.3 4645 https://www.orthodb.org/v10?query=33at314147 ryanodine receptor 2
37at314147 Complete tig00006824 2703283 2785080 10748.3 4430 https://www.orthodb.org/v10?query=37at314147 prolow-density lipoprotein receptor-related protein 1
38at314147 Complete tig00005932 11858282 11974679 11460.4 4910 https://www.orthodb.org/v10?query=38at314147 E3 ubiquitin-protein ligase UBR4
42at314147 Complete tig00007600 5231431 5368763 8486.6 4220 https://www.orthodb.org/v10?query=42at314147 low-density lipoprotein receptor-related protein 2
44at314147 Complete tig00001255 17757600 17879845 7373.9 5081 https://www.orthodb.org/v10?query=44at314147 zonadhesin
45at314147 Complete tig00009195 5506682 5575249 9991.5 4309 https://www.orthodb.org/v10?query=45at314147 cytoplasmic dynein 1 heavy chain 1
47at314147 Complete tig00001625 18885118 19064729 9892.9 4471 https://www.orthodb.org/v10?query=47at314147 uncharacterized protein KIAA1109 homolog
49at314147 Complete tig00040058 6838426 7186469 8844.9 4451 https://www.orthodb.org/v10?query=49at314147 hydrocephalus-inducing protein homolog
51at314147 Complete tig00003602 1962470 2271850 8767.2 3992 https://www.orthodb.org/v10?query=51at314147 ryanodine receptor 3
51at314147 Missing
56at314147 Complete tig00000804 25397609 25438073 8668.1 4997 https://www.orthodb.org/v10?query=56at314147 sacsin isoform X1
57at314147 Complete tig00040180 3386869 3561539 10117.8 4438 https://www.orthodb.org/v10?query=57at314147 E3 ubiquitin-protein ligase HERC2
59at314147 Complete tig00002728 1709006 1824865 8456.6 4016 https://www.orthodb.org/v10?query=59at314147 dynein heavy chain 10, axonemal
60at314147 Complete tig00000969 3645671 3766026 6635.5 3062 https://www.orthodb.org/v10?query=60at314147 protocadherin Fat 3
62at314147 Complete tig00009790 1110940 1237528 8787.1 4024 https://www.orthodb.org/v10?query=62at314147 ryanodine receptor 1
68at314147 Complete tig00000789 11025431 11278357 9928.0 4303 https://www.orthodb.org/v10?query=68at314147 E3 ubiquitin-protein ligase MYCBP2 isoform X1
...
# Here is another way to look at the results (all statuses for each BUSCO ID):
~/neotoma/canu_Nmarc/busco_4.1.0_Nfusc_glires/run_glires_odb10$ \
> awk '/^#/{next}{ar[$1]=ar[$1]" "$2}END{for(v in ar){print v ar[v]}}' full_table.tsv |sort -V | head -100
0at314147 Complete
1at314147 Fragmented
3at314147 Complete
9at314147 Complete
10at314147 Complete
14at314147 Complete
18at314147 Complete
21at314147 Complete Missing
22at314147 Complete
23at314147 Complete Missing
27at314147 Complete
33at314147 Complete
37at314147 Complete
38at314147 Complete
42at314147 Complete
44at314147 Complete
45at314147 Complete
47at314147 Complete
49at314147 Complete
51at314147 Complete Missing
56at314147 Complete
57at314147 Complete
59at314147 Complete
60at314147 Complete
62at314147 Complete
68at314147 Complete
69at314147 Complete
71at314147 Complete Missing
77at314147 Fragmented
83at314147 Complete Missing
85at314147 Complete
91at314147 Complete Missing
95at314147 Complete
103at314147 Complete Missing
104at314147 Complete
106at314147 Complete
107at314147 Complete
108at314147 Complete
116at314147 Complete
118at314147 Complete
119at314147 Complete
121at314147 Complete Missing
...
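Until the cause is known, a possible workaround is to filter out the conflicting rows. The hypothetical function `drop_conflicting_missing` below (my own sketch, not a BUSCO option) reads full_table.tsv twice: the first pass records every ID with a non-Missing row, and the second prints every line except the Missing rows for those IDs. Comment lines and IDs that are only Missing are kept unchanged.

```bash
# Hypothetical workaround: drop "Missing" rows whose BUSCO ID also has a
# Complete, Duplicated or Fragmented row; keep everything else as-is.
# Assumption: tab-separated full_table.tsv with '#' comment lines.
drop_conflicting_missing() {
    awk -F'\t' '
        NR == FNR {   # pass 1: collect IDs with at least one non-Missing row
            if (NF && $0 !~ /^#/ && $2 != "Missing") found[$1] = 1
            next
        }
        # pass 2: print the line unless it is a Missing row for a found ID
        !($2 == "Missing" && ($1 in found))
    ' "$1" "$1"
}
```

Usage: `drop_conflicting_missing full_table.tsv > full_table.dedup.tsv`.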