Commit 16ea619a authored by Matthew Berkeley

BUSCO 4.0.4

parent 1bd7f8e3
4.0.4
- Fix inefficiency introduced in 4.0.3
4.0.3
- Issue #190 fixed
- Issue #191 fixed
- Issue #196 fixed
- Issue #200 fixed
- Reintroduce full retraining for all eukaryote runs
- Fix retraining bug
4.0.2
- Issue #182 partially fixed
......
@@ -29,9 +29,6 @@ To get help on BUSCO use: ``busco -h`` and ``python3 scripts/generate_plot.py -h
**!!!** Don't use "odb9" datasets with BUSCOv4. If you need to reproduce previous analyses, use BUSCOv3 (https://gitlab.com/ezlab/busco/-/tags/3.0.2)
Note: While preparing the release of v4.0.3, we found and fixed a bug in the genome mode for the eukaryote pipeline. We recommend repeating
any affected runs done using previous 4.x versions with the updated version of the software.
Note: For v4.0.2 and before, when running auto-lineage, the initial results for eukaryotes were incomplete. This was
deliberate, as these initial results are used merely to determine whether the genome scores highest against the
bacteria, archaea or eukaryota datasets. If the eukaryota dataset was selected, BUSCO then attempts to place the input
......
@@ -195,7 +195,9 @@ class AutoSelectLineage:
if self.selected_runner.domain == "prokaryota":
protein_seqs = self.selected_runner.analysis.prodigal_runner.output_faa
elif self.selected_runner.domain == "eukaryota":
- protein_seqs = self.selected_runner.analysis.augustus_runner.output_sequences
+ protein_seqs_dir = self.selected_runner.analysis.augustus_runner.extracted_prot_dir
+ protein_seqs = [os.path.join(protein_seqs_dir, f) for f in os.listdir(protein_seqs_dir)
+ if f.split(".")[-2] == "faa"]
else:
protein_seqs = self.selected_runner.config.get("busco_run", "in")
out_path = self.config.get("busco_run", "main_out")
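In the eukaryota branch added here, protein files are selected from `extracted_prot_dir` by name: `f.split(".")[-2] == "faa"` keeps names whose second-to-last dot-separated field is `faa`, which fits output names built with `filename.replace("out", "faa")` (a hypothetical `g1.out.1` becomes `g1.faa.1`). A bare `name.faa` does not pass, and a name with no dot at all would raise `IndexError`. Below is a minimal standalone sketch of the same filter with a length guard; `list_faa_files` is a hypothetical helper, not part of BUSCO, and the sorting is only there because `os.listdir` returns names in arbitrary order.

```python
import os
import tempfile


def list_faa_files(directory):
    """Return paths of files whose second-to-last dot-separated field is "faa".

    Same test as f.split(".")[-2] == "faa", but names without a dot are
    skipped instead of raising IndexError.
    """
    selected = []
    for name in sorted(os.listdir(directory)):  # listdir order is arbitrary
        parts = name.split(".")
        if len(parts) >= 2 and parts[-2] == "faa":
            selected.append(os.path.join(directory, name))
    return selected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical file names, for illustration only.
        for n in ["g1.faa.1", "g1.fna.1", "g2.faa", "README"]:
            open(os.path.join(d, n), "w").close()
        print([os.path.basename(p) for p in list_faa_files(d)])  # ['g1.faa.1']
```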
......
@@ -389,7 +389,7 @@ class BuscoAnalysis(metaclass=ABCMeta):
self.hmmer_runner.load_buscos()
self.hmmer_runner.run()
self.hmmer_runner.process_output()
- self.all_single_copy_buscos.update(self.hmmer_runner.single_copy_buscos)
+ # self.all_single_copy_buscos.update(self.hmmer_runner.single_copy_buscos)
self._write_hmmer_results()
self._produce_hmmer_summary()
return
......
@@ -1304,9 +1304,6 @@ class AugustusRunner:
for filename in files:
self._extract_genes_from_augustus_output(filename)
- self.output_sequences = [os.path.join(self.extracted_prot_dir, f) for f in
- os.listdir(self.extracted_prot_dir) if f.split(".")[-2] == "faa"]
if not self.any_gene_found:
raise NoGenesError("Augustus")
@@ -1470,7 +1467,7 @@ class AugustusRunner:
output_fna = os.path.join(self.extracted_prot_dir, filename.replace("out", "fna"))
output_faa = os.path.join(self.extracted_prot_dir, filename.replace("out", "faa"))
- # self.output_sequences.append(output_faa)
+ self.output_sequences.append(output_faa)
with open(output_fna, "w") as out_fna:
SeqIO.write(sequences_nt, out_fna, "fasta")
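Taken together, the `AugustusRunner` hunks drop the rebuild of `self.output_sequences` from a directory listing and restore the per-file `self.output_sequences.append(output_faa)`, and the log diffs further down show fewer hmmsearch tasks on Augustus-predicted genes afterwards. One plausible reading, assuming several Augustus rounds write into the same `extracted_prot_dir`: a directory listing also returns `.faa` files left by earlier rounds, whereas appending records only what the current round wrote. The sketch below illustrates that difference with a hypothetical `ToyExtractor` class; resetting the list at the start of each round is an assumption of the sketch, not something this diff shows.

```python
import os
import tempfile


class ToyExtractor:
    """Hypothetical runner that writes one .faa file per Augustus-style output name."""

    def __init__(self, out_dir):
        self.out_dir = out_dir
        self.output_sequences = []

    def extract_round(self, names):
        # Assumption for this sketch: the list is reset at the start of each round.
        self.output_sequences = []
        for name in names:
            path = os.path.join(self.out_dir, name.replace("out", "faa"))
            with open(path, "w") as fh:
                fh.write(">placeholder\nM\n")
            # Record exactly the file this round wrote (the append approach).
            self.output_sequences.append(path)

    def listed_sequences(self):
        # Directory-listing approach: also returns files from earlier rounds
        # that share the same output directory.
        return sorted(
            os.path.join(self.out_dir, f)
            for f in os.listdir(self.out_dir)
            if f.split(".")[-2] == "faa"
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        runner = ToyExtractor(d)
        runner.extract_round(["g1.out.1", "g2.out.1"])  # first round
        runner.extract_round(["g3.out.2"])              # second round, same directory
        print(len(runner.output_sequences))    # 1
        print(len(runner.listed_sequences()))  # 3
```

With the listing, a later search step would be handed three files instead of one, which is the kind of repeated work the "Fix inefficiency introduced in 4.0.3" entry suggests was removed.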
......
@@ -6,4 +6,4 @@ Copyright (c) 2016-2020, Evgeny Zdobnov ([email protected])
Licensed under the MIT license. See LICENSE.md file.
"""
- __version__ = "4.0.3"
+ __version__ = "4.0.4"
- INFO: ***** Start a BUSCO v4.0.3 analysis, current time: 02/11/2020 14:22:28 *****
+ INFO: ***** Start a BUSCO v4.0.4 analysis, current time: 02/12/2020 16:38:00 *****
INFO: Configuring BUSCO with /busco/config/config.ini
INFO: Mode is genome
INFO: Input file is genome.fna
@@ -24,6 +24,7 @@ INFO: [hmmsearch] 20 of 194 task(s) completed
INFO: [hmmsearch] 39 of 194 task(s) completed
INFO: [hmmsearch] 59 of 194 task(s) completed
INFO: [hmmsearch] 78 of 194 task(s) completed
+ INFO: [hmmsearch] 97 of 194 task(s) completed
INFO: [hmmsearch] 117 of 194 task(s) completed
INFO: [hmmsearch] 136 of 194 task(s) completed
INFO: [hmmsearch] 156 of 194 task(s) completed
@@ -41,8 +42,9 @@ INFO: Running 124 job(s) on hmmsearch
INFO: [hmmsearch] 13 of 124 task(s) completed
INFO: [hmmsearch] 25 of 124 task(s) completed
INFO: [hmmsearch] 38 of 124 task(s) completed
INFO: [hmmsearch] 50 of 124 task(s) completed
INFO: [hmmsearch] 63 of 124 task(s) completed
INFO: [hmmsearch] 75 of 124 task(s) completed
INFO: [hmmsearch] 87 of 124 task(s) completed
INFO: [hmmsearch] 100 of 124 task(s) completed
INFO: [hmmsearch] 112 of 124 task(s) completed
INFO: [hmmsearch] 124 of 124 task(s) completed
@@ -106,10 +108,9 @@ INFO: [augustus] 13 of 14 task(s) completed
INFO: [augustus] 14 of 14 task(s) completed
INFO: Extracting predicted proteins...
INFO: ***** Run HMMER on gene sequences *****
- INFO: [hmmsearch] 1 of 4 task(s) completed
- INFO: [hmmsearch] 2 of 4 task(s) completed
- INFO: [hmmsearch] 3 of 4 task(s) completed
- INFO: [hmmsearch] 4 of 4 task(s) completed
+ INFO: [hmmsearch] 1 of 3 task(s) completed
+ INFO: [hmmsearch] 2 of 3 task(s) completed
+ INFO: [hmmsearch] 3 of 3 task(s) completed
WARNING: BUSCO did not find any match. Make sure to check the log files if this is unexpected.
INFO: Results: C:0.0%[S:0.0%,D:0.0%],F:0.0%,M:100.0%,n:255
@@ -147,7 +148,7 @@ INFO:
|97 Missing BUSCOs (M) |
|124 Total BUSCO groups searched |
--------------------------------------------------
- INFO: BUSCO analysis done with WARNING(s). Total running time: 80 seconds
+ INFO: BUSCO analysis done with WARNING(s). Total running time: 81 seconds
***** Summary of warnings: *****
WARNING:busco.ConfigManager Running Auto Lineage Selector as no lineage dataset was specified. This will take a little longer than normal. If you know what lineage dataset you want to use, please specify this in the config file or using the -l (--lineage-dataset) flag in the command line.
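The `Results:` lines use BUSCO's one-line notation: C is complete (split into S, single-copy, and D, duplicated), F fragmented, M missing, and n the number of BUSCO groups searched; C, F and M add up to 100% up to rounding. A minimal parser for that line, assuming exactly the layout printed in these logs (`parse_summary` and `SUMMARY_RE` are hypothetical names, not part of BUSCO):

```python
import re

# BUSCO one-line summary, e.g. "C:1.0%[S:1.0%,D:0.0%],F:0.5%,M:98.5%,n:194"
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_summary(line):
    """Return the C, S, D, F, M percentages and the group count n from a log line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"no BUSCO summary in {line!r}")
    values = {key: float(text) for key, text in match.groupdict().items() if key != "n"}
    values["n"] = int(match.group("n"))
    return values


if __name__ == "__main__":
    summary = parse_summary("INFO: Results: C:0.0%[S:0.0%,D:0.0%],F:0.0%,M:100.0%,n:255")
    print(summary["M"], summary["n"])  # 100.0 255
```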
......
- INFO: ***** Start a BUSCO v4.0.3 analysis, current time: 02/11/2020 10:15:18 *****
+ INFO: ***** Start a BUSCO v4.0.4 analysis, current time: 02/12/2020 16:40:52 *****
INFO: Configuring BUSCO with /busco/config/config.ini
INFO: Mode is genome
INFO: Input file is genome.fna
@@ -27,7 +27,6 @@ INFO: [hmmsearch] 78 of 194 task(s) completed
INFO: [hmmsearch] 97 of 194 task(s) completed
INFO: [hmmsearch] 117 of 194 task(s) completed
INFO: [hmmsearch] 136 of 194 task(s) completed
- INFO: [hmmsearch] 156 of 194 task(s) completed
INFO: [hmmsearch] 175 of 194 task(s) completed
INFO: [hmmsearch] 194 of 194 task(s) completed
INFO: Results: C:1.0%[S:1.0%,D:0.0%],F:0.5%,M:98.5%,n:194
@@ -39,6 +38,7 @@ INFO: ***** Run Prodigal on input to predict and extract genes *****
INFO: Genetic code 11 selected as optimal
INFO: ***** Run HMMER on gene sequences *****
INFO: Running 124 job(s) on hmmsearch
+ INFO: [hmmsearch] 13 of 124 task(s) completed
INFO: [hmmsearch] 25 of 124 task(s) completed
INFO: [hmmsearch] 38 of 124 task(s) completed
INFO: [hmmsearch] 50 of 124 task(s) completed
@@ -125,15 +125,16 @@ INFO: [augustus] 36 of 39 task(s) completed
INFO: [augustus] 39 of 39 task(s) completed
INFO: Extracting predicted proteins...
INFO: ***** Run HMMER on gene sequences *****
- INFO: [hmmsearch] 8 of 76 task(s) completed
- INFO: [hmmsearch] 16 of 76 task(s) completed
- INFO: [hmmsearch] 23 of 76 task(s) completed
- INFO: [hmmsearch] 31 of 76 task(s) completed
- INFO: [hmmsearch] 38 of 76 task(s) completed
- INFO: [hmmsearch] 54 of 76 task(s) completed
- INFO: [hmmsearch] 61 of 76 task(s) completed
- INFO: [hmmsearch] 69 of 76 task(s) completed
- INFO: [hmmsearch] 76 of 76 task(s) completed
+ INFO: [hmmsearch] 4 of 37 task(s) completed
+ INFO: [hmmsearch] 8 of 37 task(s) completed
+ INFO: [hmmsearch] 12 of 37 task(s) completed
+ INFO: [hmmsearch] 15 of 37 task(s) completed
+ INFO: [hmmsearch] 19 of 37 task(s) completed
+ INFO: [hmmsearch] 23 of 37 task(s) completed
+ INFO: [hmmsearch] 26 of 37 task(s) completed
+ INFO: [hmmsearch] 30 of 37 task(s) completed
+ INFO: [hmmsearch] 34 of 37 task(s) completed
+ INFO: [hmmsearch] 37 of 37 task(s) completed
INFO: Results: C:18.8%[S:18.8%,D:0.0%],F:0.4%,M:80.8%,n:255
INFO: eukaryota_odb10 selected
@@ -197,6 +198,7 @@ INFO: Training Augustus using Single-Copy Complete BUSCOs:
INFO: Converting predicted genes to short genbank files
INFO: Running 29 job(s) on gff2gbSmallDNA.pl
INFO: [gff2gbSmallDNA.pl] 3 of 29 task(s) completed
+ INFO: [gff2gbSmallDNA.pl] 6 of 29 task(s) completed
INFO: [gff2gbSmallDNA.pl] 9 of 29 task(s) completed
INFO: [gff2gbSmallDNA.pl] 12 of 29 task(s) completed
INFO: [gff2gbSmallDNA.pl] 15 of 29 task(s) completed
@@ -225,16 +227,14 @@ INFO: [augustus] 133 of 147 task(s) completed
INFO: [augustus] 147 of 147 task(s) completed
INFO: Extracting predicted proteins...
INFO: ***** Run HMMER on gene sequences *****
- INFO: [hmmsearch] 17 of 169 task(s) completed
- INFO: [hmmsearch] 34 of 169 task(s) completed
- INFO: [hmmsearch] 51 of 169 task(s) completed
- INFO: [hmmsearch] 68 of 169 task(s) completed
- INFO: [hmmsearch] 85 of 169 task(s) completed
- INFO: [hmmsearch] 102 of 169 task(s) completed
- INFO: [hmmsearch] 119 of 169 task(s) completed
- INFO: [hmmsearch] 136 of 169 task(s) completed
- INFO: [hmmsearch] 153 of 169 task(s) completed
- INFO: [hmmsearch] 169 of 169 task(s) completed
+ INFO: [hmmsearch] 15 of 144 task(s) completed
+ INFO: [hmmsearch] 29 of 144 task(s) completed
+ INFO: [hmmsearch] 44 of 144 task(s) completed
+ INFO: [hmmsearch] 58 of 144 task(s) completed
+ INFO: [hmmsearch] 73 of 144 task(s) completed
+ INFO: [hmmsearch] 116 of 144 task(s) completed
+ INFO: [hmmsearch] 130 of 144 task(s) completed
+ INFO: [hmmsearch] 144 of 144 task(s) completed
INFO: Results: C:2.0%[S:2.0%,D:0.0%],F:0.1%,M:97.9%,n:2137
INFO:
@@ -262,7 +262,7 @@ INFO:
|2092 Missing BUSCOs (M) |
|2137 Total BUSCO groups searched |
--------------------------------------------------
- INFO: BUSCO analysis done with WARNING(s). Total running time: 227 seconds
+ INFO: BUSCO analysis done with WARNING(s). Total running time: 212 seconds
***** Summary of warnings: *****
WARNING:busco.ConfigManager Running Auto Lineage Selector as no lineage dataset was specified. This will take a little longer than normal. If you know what lineage dataset you want to use, please specify this in the config file or using the -l (--lineage-dataset) flag in the command line.
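Pairing each v4.0.3 log line with its v4.0.4 counterpart in the two log diffs, the number of hmmsearch jobs run on Augustus-predicted genes drops in every case, which is consistent with the "Fix inefficiency introduced in 4.0.3" changelog entry. The snippet below only restates numbers copied from the logs; each version appears to have been run once, so the total running times are single measurements, not benchmarks.

```python
# hmmsearch job counts on Augustus-predicted genes and total running times,
# copied from the log diffs above as (v4.0.3, v4.0.4) pairs.
PAIRS = {
    "log 1, hmmsearch jobs (n:255)": (4, 3),
    "log 2, hmmsearch jobs (n:255)": (76, 37),
    "log 2, hmmsearch jobs (n:2137)": (169, 144),
    "log 1, running time in s": (80, 81),
    "log 2, running time in s": (227, 212),
}


def percent_change(old, new):
    """Relative change from old to new in percent; negative means a reduction."""
    return (new - old) / old * 100.0


for label, (old, new) in PAIRS.items():
    print(f"{label}: {old} -> {new} ({percent_change(old, new):+.1f}%)")
```

On these numbers the job counts fall by about 25%, 51% and 15%, while the running times barely move in log 1 (80 s to 81 s) and drop by about 7% in log 2 (227 s to 212 s).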
......
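Several log hunks differ only in which hmmsearch progress lines were printed. The counts step in roughly tenths of the job total (for 37 jobs: 4, 8, 12, 15, 19, 23, 26, 30, 34, 37), but not always exactly: a 124-job run reports 63 where a strict tenth would be 62, and the 76-job v4.0.3 run goes from 38 straight to 54. The sketch below is one scheme consistent with those lines, assuming progress is checked by polling and at most one line is printed per poll; it is a hypothetical model, not code taken from BUSCO.

```python
def progress_thresholds(total, steps=10):
    """Counts at which a progress line is due: ceil(total * k / steps), k = 1..steps."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return [(total * k + steps - 1) // steps for k in range(1, steps + 1)]


def progress_lines(total, polled_counts, steps=10):
    """Return the completed-job counts that would be printed.

    Hypothetical model: the completed count is only observed at each poll,
    a line is printed when it has reached the next pending threshold, and
    every threshold already passed by that poll is then skipped.
    """
    thresholds = progress_thresholds(total, steps)
    printed = []
    i = 0  # index of the next pending threshold
    for done in polled_counts:
        if i < len(thresholds) and done >= thresholds[i]:
            printed.append(done)
            while i < len(thresholds) and done >= thresholds[i]:
                i += 1
    return printed


# Polling after every finished job prints exactly the thresholds.
print(progress_lines(37, range(1, 38)))  # [4, 8, 12, 15, 19, 23, 26, 30, 34, 37]
# Coarser polling can shift a line (62 -> 63) or merge two (46 and 54 -> 54).
print(progress_lines(124, [10, 13, 25, 38, 50, 63, 75, 87, 100, 112, 124]))
print(progress_lines(76, [8, 16, 23, 31, 38, 54, 61, 69, 76]))
```

Under this model the number of printed progress lines depends on timing, so the same job count can yield one line more or fewer from run to run.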