EASEL:BUILD_TRAINING:TRAINING_SET terminated with an error exit status (1)
Hi there!
Thanks for all the work you have put into EASEL. I have been testing it with an oomycete genome and ran into an issue almost at the end of the run:
[f9/19dc41] process > EASEL:BUILD_TRAINING:TRAINING_SET [100%] 1 of 1, failed: 1 ✘
[3c/37d05f] process > EASEL:SUMMARY_STATS:MIKADO (easel_unfiltered) [100%] 1 of 1 ✔
[04/493094] process > EASEL:SUMMARY_STATS:BUSCO (easel_unfiltered) [100%] 1 of 1 ✔
[cc/568783] process > EASEL:SUMMARY_STATS:AGAT (easel_unfiltered) [100%] 1 of 1 ✔
Plus 8 more processes waiting for tasks…
ERROR ~ Error executing process > 'EASEL:BUILD_TRAINING:TRAINING_SET'
Caused by:
Process EASEL:BUILD_TRAINING:TRAINING_SET terminated with an error exit status (1)
Command executed:
awk '{$1=$1; print}' OFS=' ' features.tracking > matrix
python /nfs5/BPP/Grunwald_Lab/home/paradarc/pcap_ny_genomes/scripts/annotation/easel/bin/map_transcript.py matrix start_site.txt Target feature.tracking
awk 'BEGIN { FS = OFS = " " } NR > 1 {
if ($26 == "") {
$26 = 0
} else {
$26 = int($26)
}
} 1' feature.tracking > target.tracking
python /nfs5/BPP/Grunwald_Lab/home/paradarc/pcap_ny_genomes/scripts/annotation/easel/bin/map_transcript.py matrix F1.txt F1 feature.tracking
awk 'BEGIN { FS=OFS=" " } NR > 1 { $26 = sprintf("%d", $26) }; 1' OFS=" " feature.tracking > F1.tracking
awk '{print $26}' target.tracking > target.txt
paste F1.tracking target.txt > easel.tracking
Command exit status:
1
Command output:
(empty)
Command error:
INFO: Converting SIF file to temporary sandbox...
sys:1: DtypeWarning: Columns (21) have mixed types.Specify dtype option on import or set low_memory=False.
Traceback (most recent call last):
File "/nfs5/BPP/Grunwald_Lab/home/paradarc/pcap_ny_genomes/scripts/annotation/easel/bin/map_transcript.py", line 28, in <module>
df2 = pd.read_csv(feature, delimiter='\t')
File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 605, in read_csv
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 457, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 814, in __init__
self._engine = self._make_engine(self.engine)
File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 1045, in _make_engine
return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 1893, in __init__
self._reader = parsers.TextReader(self.handles.handle, **kwds)
File "pandas/_libs/parsers.pyx", line 521, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
INFO: Cleaning up image...
Work dir:
/nfs5/BPP/Grunwald_Lab/home/paradarc/pcap_ny_genomes/scripts/annotation/easel/work/f9/19dc410a22011b65b346bf20d6855b
Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run
-- Check '.nextflow.log' file for details
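Since pandas raises `EmptyDataError` when the file handed to `read_csv` has no content, one way to narrow this down is to check which staged inputs in the failing work dir are missing or empty. The following is only a minimal sketch: the file names are taken from the command shown above, the helper name `check_inputs` is hypothetical, and nothing here is part of EASEL itself.

```python
from pathlib import Path

# File names copied from the failing command; adjust if the task stages others.
INPUTS = ["features.tracking", "matrix", "start_site.txt", "F1.txt", "feature.tracking"]


def check_inputs(work_dir, names=INPUTS):
    """Return {name: status} where status is 'missing', 'empty', or 'ok'.

    'missing' also covers broken symlinks, since Path.exists() follows links.
    'empty' means the file has no non-whitespace content.
    """
    report = {}
    for name in names:
        path = Path(work_dir) / name
        if not path.exists():
            report[name] = "missing"
            continue
        # Stream the file so large tracking tables are not loaded at once.
        with path.open(errors="replace") as handle:
            has_content = any(line.strip() for line in handle)
        report[name] = "ok" if has_content else "empty"
    return report
```

Calling `check_inputs(...)` on the work dir reported above (the one ending in `f9/19dc410a22011b65b346bf20d6855b`) and printing the result should show at a glance whether `start_site.txt`, `F1.txt`, or an intermediate file is the empty one.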
I checked the features file, and it looks like this:
[debary][/nfs5/BPP/Grunwald_Lab/home/paradarc/pcap_ny_genomes/scripts/annotation/easel]
> head -n 5 /nfs5/BPP/Grunwald_Lab/home/paradarc/pcap_ny_genomes/scripts/annotation/easel/work/99/75e57c740633eaeac87e7719635fea/features.tracking | cat -A
Gene Transcript S1-AUG_OrthoDB_PsiCLASS S2-AUG_OrthoDB_StringTie2 S3-AUG_TD_PsiCLASS S4-AUG_TD_StringTie2 S5-Transcriptomes Exons EggNOG_Bitscore EggNOG_SimilarityScore EggNOG_Evalue OrthoDB_Bitscore OrthoDB_SimilarityScore OrthoDB_Evalue Expression Molecular_Weight Transcript_Length CDS_Repeat_Content CDS_Length GC_Content Free_Energy GC_Ratio Kozak_Neg_3_Purine Kozak_Pos_4 ATG_Counts$
easel_gene-1 easel_mrna-7 0 0 0 1 0 2 531.0 51.5 3.41e-177 485.0 51.9 7.57e-163 6.18147 618103.16 1997 100.0 1848 58.74 -210.0 1.923728813559322 1.0 G 11$
easel_gene-1 easel_mrna-14 0 0 1 0 0 1 379.0 68.8 1.08e-116 0 0 1000 1.1268 654844.65 2112 100.0 2112 55.68 -224.1 1.92436974789916 1.0 A 10$
easel_gene-1 easel_mrna-19 0 0 0 1 0 1 379.0 68.8 2.9400000000000002e-117 0 0 1000 0.0 612947.00 1977 100.0 1977 55.18 -221.3 2.425742574257426 1.0 T 8$
easel_gene-1 easel_mrna-24 0 0 1 0 0 1 379.0 68.8 8.58e-118 0 0 1000 0.0 575018.81 1854 100.0 1854 55.45 -204.2 2.4123711340206184 0.0 G 7$
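The awk steps in the failing command operate on column 26, while `map_transcript.py` reads its input with `delimiter='\t'`, so it may be worth confirming how many fields each line has and which separator is actually present. Below is a minimal sketch using the header and first data row as pasted above; the helper name `field_counts` is hypothetical, and the idea that column 26 is the one appended by `map_transcript.py` is an assumption, not something confirmed by the log.

```python
from collections import Counter

# Header and first data row, copied from the `head` output above.
HEADER = (
    "Gene Transcript S1-AUG_OrthoDB_PsiCLASS S2-AUG_OrthoDB_StringTie2 "
    "S3-AUG_TD_PsiCLASS S4-AUG_TD_StringTie2 S5-Transcriptomes Exons "
    "EggNOG_Bitscore EggNOG_SimilarityScore EggNOG_Evalue OrthoDB_Bitscore "
    "OrthoDB_SimilarityScore OrthoDB_Evalue Expression Molecular_Weight "
    "Transcript_Length CDS_Repeat_Content CDS_Length GC_Content Free_Energy "
    "GC_Ratio Kozak_Neg_3_Purine Kozak_Pos_4 ATG_Counts"
)
ROW = (
    "easel_gene-1 easel_mrna-7 0 0 0 1 0 2 531.0 51.5 3.41e-177 485.0 51.9 "
    "7.57e-163 6.18147 618103.16 1997 100.0 1848 58.74 -210.0 "
    "1.923728813559322 1.0 G 11"
)


def field_counts(lines, sep=None):
    """Count how many non-blank lines have each number of fields.

    sep=None splits on any run of whitespace; pass "\t" to mimic a
    tab-delimited parse.
    """
    return Counter(
        len(line.rstrip("\n").split(sep)) for line in lines if line.strip()
    )


whitespace_counts = field_counts([HEADER, ROW])
tab_counts = field_counts([HEADER, ROW], sep="\t")
```

Running `field_counts` over the real `features.tracking` (for example with `open(path)` as the `lines` argument) would show whether every row has the same number of fields and whether a tab split yields more than one column.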
If you have any suggestions on how to get past this issue, please let me know. My command is rather long since I am working with a non-model organism:
nextflow run main.nf -profile singularity --genome /nfs5/BPP/Grunwald_Lab/home/paradarc/pcap_ny_genomes/results/repeats/p_capsici_014_EarlGrey/p_capsici_014_summaryFiles/p_capsici_014.softmasked.fasta --sra /nfs5/BPP/Grunwald_Lab/home/paradarc/pcap_ny_genomes/data/evidence/sra_rna_data/pcap_sra_list.txt --busco_lineage stramenopiles --user_protein /nfs5/BPP/Grunwald_Lab/home/paradarc/pcap_ny_genomes/data/evidence/cl_pero_and_orthodb/cl_pero_proteins_and_oomycota_orthodb.fasta --taxon oomycota --build_training_set true --singularity_cache_dir /nfs5/BPP/Grunwald_Lab/home/paradarc/singularity/nxf/singularity --max_cpus 64 --max_memory 200.GB --reference /nfs5/BPP/Grunwald_Lab/home/paradarc/pcap_ny_genomes/data/evidence/GCA_016618375.1/genomic.gff --reference_db /nfs5/BPP/Grunwald_Lab/home/paradarc/pcap_ny_genomes/data/databases/refseq_download/refseq_complete.dmnd -resume 6090750a-ff6f-4204-aa59-c130e0b85f85
Thanks,
Camilo