--Offline
I'm running on a cluster that is isolated from the internet, so I need to run BUSCO with --offline. I have made a busco_downloads directory with subdirectories for lineages, placement_files and information:
-busco_downloads
--lineages
--placement_files
--information
Inside the 'lineages' directory I have uncompressed all of the odb10 datasets (~165 directories), for example:
viridiplantae_odb10
-rw-r--r-- 1 pjm43 pjm43 8.6K Nov 20 2019 scores_cutoff
drwxr-sr-x 2 pjm43 pjm43 32K Nov 20 2019 prfl
-rw-r--r-- 1 pjm43 pjm43 13K Nov 20 2019 lengths_cutoff
drwxr-sr-x 2 pjm43 pjm43 4.0K Nov 20 2019 info
drwxr-sr-x 2 pjm43 pjm43 32K Nov 20 2019 hmms
-rw-r--r-- 1 pjm43 pjm43 2.2M Nov 20 2019 ancestral_variants
-rw-r--r-- 1 pjm43 pjm43 221K Nov 20 2019 ancestral
-rw-r--r-- 1 pjm43 pjm43 40K Nov 27 2019 links_to_ODB10.txt
-rw-r--r-- 1 pjm43 pjm43 1.1M Aug 5 03:46 refseq_db.faa.gz
-rw-r--r-- 1 pjm43 pjm43 165 Aug 5 03:46 dataset.cfg
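For anyone wanting to sanity-check the same setup, here is a minimal sketch that reports which entries are absent from each uncompressed lineage directory. The list of expected entries is simply copied from the viridiplantae_odb10 listing above; whether BUSCO strictly requires every one of them is an assumption, and the function names are made up for illustration.

```python
from pathlib import Path

# Entries seen in the viridiplantae_odb10 listing above. Treating all of
# them as required is an assumption, not something BUSCO documents here.
EXPECTED_ENTRIES = [
    "scores_cutoff", "lengths_cutoff", "ancestral", "ancestral_variants",
    "dataset.cfg", "links_to_ODB10.txt", "refseq_db.faa.gz",
    "hmms", "prfl", "info",
]


def missing_entries(lineage_dir):
    """Return the expected entries that are absent from one lineage directory."""
    lineage_dir = Path(lineage_dir)
    return [name for name in EXPECTED_ENTRIES if not (lineage_dir / name).exists()]


def check_lineages(lineages_dir):
    """Map each *_odb10 directory name to its list of missing entries."""
    report = {}
    for d in sorted(Path(lineages_dir).glob("*_odb10")):
        if d.is_dir():
            report[d.name] = missing_entries(d)
    return report
```

Calling check_lineages("busco_downloads/lineages") returns a dictionary keyed by dataset name, where an empty list means nothing from the listing is missing.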
In the information directory I've uncompressed: lineages_list.2019-11-27.txt
In the placement_files directory I've uncompressed the following:
-rw-r--r-- 1 pjm43 pjm43 1.3K Dec 16 2019 tree_metadata.eukaryota_odb10.2019-12-16.txt
-rw-r--r-- 1 pjm43 pjm43 1.6K Dec 16 2019 tree_metadata.bacteria_odb10.2019-12-16.txt
-rw-r--r-- 1 pjm43 pjm43 6.0K Dec 16 2019 tree_metadata.archaea_odb10.2019-12-16.txt
-rw-r--r-- 1 pjm43 pjm43 40K Dec 16 2019 tree.eukaryota_odb10.2019-12-16.nwk
-rw-r--r-- 1 pjm43 pjm43 163K Dec 16 2019 tree.bacteria_odb10.2019-12-16.nwk
-rw-r--r-- 1 pjm43 pjm43 14K Dec 16 2019 tree.archaea_odb10.2019-12-16.nwk
-rw-r--r-- 1 pjm43 pjm43 28M Dec 16 2019 supermatrix.aln.eukaryota_odb10.2019-12-16.faa
-rw-r--r-- 1 pjm43 pjm43 36M Dec 16 2019 supermatrix.aln.bacteria_odb10.2019-12-16.faa
-rw-r--r-- 1 pjm43 pjm43 2.2M Dec 16 2019 supermatrix.aln.archaea_odb10.2019-12-16.faa
-rw-r--r-- 1 pjm43 pjm43 1.2K Dec 16 2019 mapping_taxids-busco_dataset_name.eukaryota_odb10.2019-12-16.txt
-rw-r--r-- 1 pjm43 pjm43 1.8K Dec 16 2019 mapping_taxids-busco_dataset_name.bacteria_odb10.2019-12-16.txt
-rw-r--r-- 1 pjm43 pjm43 336 Dec 16 2019 mapping_taxids-busco_dataset_name.archaea_odb10.2019-12-16.txt
-rw-r--r-- 1 pjm43 pjm43 686K Dec 16 2019 mapping_taxid-lineage.eukaryota_odb10.2019-12-16.txt
-rw-r--r-- 1 pjm43 pjm43 1.5M Dec 16 2019 mapping_taxid-lineage.bacteria_odb10.2019-12-16.txt
-rw-r--r-- 1 pjm43 pjm43 128K Dec 16 2019 mapping_taxid-lineage.archaea_odb10.2019-12-16.txt
-rw-r--r-- 1 pjm43 pjm43 1.6K Dec 16 2019 list_of_reference_markers.eukaryota_odb10.2019-12-16.txt
-rw-r--r-- 1 pjm43 pjm43 530 Dec 16 2019 list_of_reference_markers.bacteria_odb10.2019-12-16.txt
-rw-r--r-- 1 pjm43 pjm43 365 Dec 16 2019 list_of_reference_markers.archaea_odb10.2019-12-16.txt
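The placement files above follow a regular naming pattern (kind, domain, date stamp, extension), so a short sketch can confirm that every kind of file is present for all three domains and show which date stamp each one carries. The pattern and the six kinds are taken only from the listing above; the helper names are hypothetical, and this is not how BUSCO itself validates the directory.

```python
import re
from collections import defaultdict

# Pattern inferred from the listing above:
#   <kind>.<domain>_odb10.<YYYY-MM-DD>.<ext>
PLACEMENT_RE = re.compile(
    r"^(?P<kind>.+)\.(?P<domain>archaea|bacteria|eukaryota)_odb10\."
    r"(?P<date>\d{4}-\d{2}-\d{2})\.(?P<ext>txt|nwk|faa)$"
)

# The six kinds of file that appear in the listing for each domain.
KINDS = {
    "tree_metadata", "tree", "supermatrix.aln",
    "mapping_taxids-busco_dataset_name", "mapping_taxid-lineage",
    "list_of_reference_markers",
}
DOMAINS = ("archaea", "bacteria", "eukaryota")


def summarize_placement_files(filenames):
    """Group file names by domain, recording kind -> date stamp."""
    found = defaultdict(dict)
    for name in filenames:
        m = PLACEMENT_RE.match(name)
        if m:  # names that do not fit the pattern are ignored
            found[m["domain"]][m["kind"]] = m["date"]
    return found


def missing_kinds(found):
    """For each domain, list the kinds of file that were not seen."""
    return {dom: sorted(KINDS - set(found.get(dom, {}))) for dom in DOMAINS}
```

Feeding it os.listdir("busco_downloads/placement_files") should give an empty list for each domain if the directory matches the listing.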
In the config.ini I've set the download_path to the busco_downloads directory.
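To double-check that the setting is really picked up from the file, a small sketch like the following reads the config.ini and reports where an uncommented download_path is defined. It searches every section rather than assuming a section name; the function name is made up for illustration.

```python
import configparser


def find_download_path(config_file):
    """Return (section, value) for the first download_path key found, else None."""
    # interpolation=None so that a '%' in a path is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    with open(config_file) as fh:
        parser.read_file(fh)
    for section in parser.sections():
        if parser.has_option(section, "download_path"):
            return section, parser.get(section, "download_path")
    return None  # key absent, or still commented out with ';' or '#'
```

Pointing it at the config.ini named in the log below would show whether the key is active; a result of None suggests the line is missing or still commented out.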
I then ran BUSCO 4.1.2 on a fairly complete plant genome using --auto-lineage --offline -m genome.
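For reference, the invocation can be sketched as below. Only --auto-lineage, --offline and -m genome are stated above; the input and output names are taken from the log further down, and the optional --download_path argument is shown as an alternative to the config.ini setting. The exact command line that was run is not reproduced here, so treat this as an approximation. The sketch only prints the command, it does not run BUSCO.

```python
import shlex


def busco_command(assembly, out_name, download_path=None):
    """Build an offline auto-lineage BUSCO command as an argument list."""
    cmd = [
        "busco",
        "-i", assembly,      # input assembly (FASTA)
        "-o", out_name,      # name of the output directory
        "-m", "genome",
        "--auto-lineage",
        "--offline",
    ]
    if download_path:
        # Alternative to setting download_path in config.ini.
        cmd += ["--download_path", download_path]
    return cmd


print(shlex.join(busco_command(
    "GMI0423_purged.fa",
    "GMI0423_purged.fa_busco4.1.2_newblast-autolineage",
)))
```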
The job finished, but indicated that "Not enough markers were placed on the tree (3). Root lineage eukaryota is kept".
I don't understand why BUSCO did not go on to select a more specific lineage (e.g., viridiplantae_odb10).
Any help would be greatly appreciated (thanks in advance),
Jeff
Below is the full Slurm log:
INFO: ***** Start a BUSCO v4.1.2 analysis, current time: 09/09/2020 06:20:20 *****
INFO: Configuring BUSCO with /fslgroup/fslg_pws_module/compute/.conda-pws/envs/busco-4.1.2/share/busco/config.ini
INFO: Mode is genome
INFO: Input file is GMI0423_purged.fa
INFO: No lineage specified. Running lineage auto selector.
INFO: ***** Starting Auto Select Lineage *****
This process runs BUSCO on the generic lineage datasets for the domains archaea, bacteria and eukaryota. Once the optimal domain is selected, BUSCO automatically attempts to find the most appropriate BUSCO dataset to use based on phylogenetic placement.
--auto-lineage-euk and --auto-lineage-prok are also available if you know your input assembly is, or is not, an eukaryote. See the user guide for more information.
A reminder: Busco evaluations are valid when an appropriate dataset is used, i.e., the dataset belongs to the lineage of the species to test. Because of overlapping markers/spurious matches among domains, busco matches in another domain do not necessarily mean that your genome/proteome contains sequences from this domain. However, a high busco score in multiple domains might help you identify possible contaminations.
INFO: Running BUSCO using lineage dataset archaea_odb10 (prokaryota, 2020-03-06)
INFO: ***** Run Prodigal on input to predict and extract genes *****
INFO: Running Prodigal with genetic code 11 in single mode
INFO: Running 1 job(s) on prodigal, starting at 09/09/2020 06:21:02
INFO: [prodigal] 1 of 1 task(s) completed
INFO: Running Prodigal with genetic code 11 in meta mode
INFO: Running 1 job(s) on prodigal, starting at 09/09/2020 06:23:52
INFO: [prodigal] 1 of 1 task(s) completed
WARNING: Prodigal did not recognize any genes matching the dataset archaea_odb10 in the input file. If this is unexpected, check your input file and your installation of Prodigal
INFO: Running BUSCO using lineage dataset bacteria_odb10 (prokaryota, 2020-03-06)
INFO: ***** Run Prodigal on input to predict and extract genes *****
INFO: Running Prodigal with genetic code 4 in single mode
INFO: Running 1 job(s) on prodigal, starting at 09/09/2020 06:25:05
INFO: [prodigal] 1 of 1 task(s) completed
INFO: Running Prodigal with genetic code 4 in meta mode
INFO: Running 1 job(s) on prodigal, starting at 09/09/2020 06:27:18
INFO: [prodigal] 1 of 1 task(s) completed
WARNING: Prodigal did not recognize any genes matching the dataset bacteria_odb10 in the input file. If this is unexpected, check your input file and your installation of Prodigal
INFO: Running BUSCO using lineage dataset eukaryota_odb10 (eukaryota, 2020-08-05)
INFO: Running 1 job(s) on makeblastdb, starting at 09/09/2020 06:28:17
INFO: Creating BLAST database with input file
INFO: [makeblastdb] 1 of 1 task(s) completed
INFO: Running a BLAST search for BUSCOs against created database
INFO: Running 1 job(s) on tblastn, starting at 09/09/2020 06:29:23
INFO: [tblastn] 1 of 1 task(s) completed
INFO: Running Augustus gene predictor on BLAST search results.
INFO: Running Augustus prediction using fly as species:
INFO: Running 735 job(s) on augustus, starting at 09/09/2020 08:32:51
INFO: [augustus] 74 of 735 task(s) completed
INFO: [augustus] 147 of 735 task(s) completed
INFO: [augustus] 221 of 735 task(s) completed
INFO: [augustus] 294 of 735 task(s) completed
INFO: [augustus] 368 of 735 task(s) completed
INFO: [augustus] 441 of 735 task(s) completed
INFO: [augustus] 515 of 735 task(s) completed
INFO: [augustus] 588 of 735 task(s) completed
INFO: [augustus] 662 of 735 task(s) completed
INFO: [augustus] 735 of 735 task(s) completed
INFO: Extracting predicted proteins...
INFO: ***** Run HMMER on gene sequences *****
INFO: Running 720 job(s) on hmmsearch, starting at 09/09/2020 09:30:17
INFO: [hmmsearch] 72 of 720 task(s) completed
INFO: [hmmsearch] 144 of 720 task(s) completed
INFO: [hmmsearch] 216 of 720 task(s) completed
INFO: [hmmsearch] 288 of 720 task(s) completed
INFO: [hmmsearch] 360 of 720 task(s) completed
INFO: [hmmsearch] 432 of 720 task(s) completed
INFO: [hmmsearch] 504 of 720 task(s) completed
INFO: [hmmsearch] 576 of 720 task(s) completed
INFO: [hmmsearch] 648 of 720 task(s) completed
INFO: [hmmsearch] 720 of 720 task(s) completed
INFO: Results: C:97.3%[S:6.3%,D:91.0%],F:0.4%,M:2.3%,n:255
INFO: Starting second step of analysis. The gene predictor Augustus is retrained using the results from the initial run to yield more accurate results.
INFO: Extracting missing and fragmented buscos from the file ancestral_variants...
INFO: Running a BLAST search for BUSCOs against created database
INFO: Running 1 job(s) on tblastn, starting at 09/09/2020 09:30:30
INFO: [tblastn] 1 of 1 task(s) completed
INFO: Converting predicted genes to short genbank files
INFO: Running 16 job(s) on gff2gbSmallDNA.pl, starting at 09/09/2020 09:44:45
INFO: [gff2gbSmallDNA.pl] 2 of 16 task(s) completed
INFO: [gff2gbSmallDNA.pl] 4 of 16 task(s) completed
INFO: [gff2gbSmallDNA.pl] 5 of 16 task(s) completed
INFO: [gff2gbSmallDNA.pl] 7 of 16 task(s) completed
INFO: [gff2gbSmallDNA.pl] 9 of 16 task(s) completed
INFO: [gff2gbSmallDNA.pl] 10 of 16 task(s) completed
INFO: [gff2gbSmallDNA.pl] 12 of 16 task(s) completed
INFO: [gff2gbSmallDNA.pl] 13 of 16 task(s) completed
INFO: [gff2gbSmallDNA.pl] 15 of 16 task(s) completed
INFO: [gff2gbSmallDNA.pl] 16 of 16 task(s) completed
INFO: All files converted to short genbank files, now training Augustus using Single-Copy Complete BUSCOs
INFO: Running 1 job(s) on new_species.pl, starting at 09/09/2020 09:47:20
INFO: [new_species.pl] 1 of 1 task(s) completed
INFO: Running 1 job(s) on etraining, starting at 09/09/2020 09:47:21
INFO: [etraining] 1 of 1 task(s) completed
INFO: Re-running Augustus with the new metaparameters, number of target BUSCOs: 7
INFO: Running Augustus gene predictor on BLAST search results.
INFO: Running Augustus prediction using BUSCO_GMI0423_purged.fa_busco4.1.2_newblast-autolineage as species:
INFO: Running 17 job(s) on augustus, starting at 09/09/2020 09:47:22
INFO: [augustus] 2 of 17 task(s) completed
INFO: [augustus] 4 of 17 task(s) completed
INFO: [augustus] 6 of 17 task(s) completed
INFO: [augustus] 7 of 17 task(s) completed
INFO: [augustus] 9 of 17 task(s) completed
INFO: [augustus] 11 of 17 task(s) completed
INFO: [augustus] 12 of 17 task(s) completed
INFO: [augustus] 14 of 17 task(s) completed
INFO: [augustus] 16 of 17 task(s) completed
INFO: [augustus] 17 of 17 task(s) completed
INFO: Extracting predicted proteins...
INFO: ***** Run HMMER on gene sequences *****
INFO: Running 17 job(s) on hmmsearch, starting at 09/09/2020 09:48:48
INFO: [hmmsearch] 2 of 17 task(s) completed
INFO: [hmmsearch] 4 of 17 task(s) completed
INFO: [hmmsearch] 6 of 17 task(s) completed
INFO: [hmmsearch] 7 of 17 task(s) completed
INFO: [hmmsearch] 9 of 17 task(s) completed
INFO: [hmmsearch] 11 of 17 task(s) completed
INFO: [hmmsearch] 12 of 17 task(s) completed
INFO: [hmmsearch] 14 of 17 task(s) completed
INFO: [hmmsearch] 16 of 17 task(s) completed
INFO: [hmmsearch] 17 of 17 task(s) completed
INFO: Results: C:99.2%[S:6.3%,D:92.9%],F:0.4%,M:0.4%,n:255
INFO: eukaryota_odb10 selected
INFO: ***** Searching tree for chosen lineage to find best taxonomic match *****
INFO: Extract markers...
INFO: Place the markers on the reference tree...
INFO: Running 1 job(s) on sepp, starting at 09/09/2020 09:49:08
INFO: [sepp] 1 of 1 task(s) completed
INFO: Not enough markers were placed on the tree (3). Root lineage eukaryota is kept
INFO:
--------------------------------------------------
|Results from dataset eukaryota_odb10 |
--------------------------------------------------
|C:99.2%[S:6.3%,D:92.9%],F:0.4%,M:0.4%,n:255 |
|253 Complete BUSCOs (C) |
|16 Complete and single-copy BUSCOs (S) |
|237 Complete and duplicated BUSCOs (D) |
|1 Fragmented BUSCOs (F) |
|1 Missing BUSCOs (M) |
|255 Total BUSCO groups searched |
--------------------------------------------------
INFO: BUSCO analysis done with WARNING(s). Total running time: 12633 seconds
***** Summary of warnings: *****
WARNING:busco.BuscoRunner Prodigal did not recognize any genes matching the dataset archaea_odb10 in the input file. If this is unexpected, check your input file and your installation of Prodigal
WARNING:busco.BuscoRunner Prodigal did not recognize any genes matching the dataset bacteria_odb10 in the input file. If this is unexpected, check your input file and your installation of Prodigal
INFO: Results written in /lustre/lifesci/fslg_lifesciences/pjm43/GMI423/busco/GMI0423_purged.fa_busco4.1.2_newblast-autolineage
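When comparing several runs, it can help to pull the key numbers out of a log like the one above instead of reading it by eye. The following is a minimal sketch that extracts every one-line result summary and the marker count from the "Not enough markers were placed on the tree" message. The regular expressions are written against the exact wording in this v4.1.2 log and may not match other BUSCO versions; the function name is hypothetical.

```python
import re

# One-line summary, e.g. C:99.2%[S:6.3%,D:92.9%],F:0.4%,M:0.4%,n:255
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)
# Wording copied from the log above.
PLACED_RE = re.compile(r"Not enough markers were placed on the tree \((\d+)\)")


def parse_busco_log(text):
    """Return (list of summary dicts, markers placed or None)."""
    summaries = []
    for m in SUMMARY_RE.finditer(text):
        entry = {k: float(v) for k, v in m.groupdict().items() if k != "n"}
        entry["n"] = int(m["n"])
        summaries.append(entry)
    placed = PLACED_RE.search(text)
    return summaries, (int(placed.group(1)) if placed else None)
```

The second return value is None when the message is absent, which is how a run that placed enough markers would presumably look.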