EASEL:BUILD_TRAINING:TRAINING_SET terminated with an error exit status (1)

Hi there!

Thanks for all the work you have put into EASEL. I have been testing it with an oomycete genome, and I ran into an issue close to the end of the run:

[f9/19dc41] process > EASEL:BUILD_TRAINING:TRAINING_SET                                     [100%] 1 of 1, failed: 1 ✘
[3c/37d05f] process > EASEL:SUMMARY_STATS:MIKADO (easel_unfiltered)                         [100%] 1 of 1 ✔
[04/493094] process > EASEL:SUMMARY_STATS:BUSCO (easel_unfiltered)                          [100%] 1 of 1 ✔
[cc/568783] process > EASEL:SUMMARY_STATS:AGAT (easel_unfiltered)                           [100%] 1 of 1 ✔
Plus 8 more processes waiting for tasks…
ERROR ~ Error executing process > 'EASEL:BUILD_TRAINING:TRAINING_SET'

Caused by:
  Process EASEL:BUILD_TRAINING:TRAINING_SET terminated with an error exit status (1)


Command executed:

  awk '{$1=$1; print}' OFS='    ' features.tracking > matrix
  python /nfs5/BPP/Grunwald_Lab/home/paradarc/pcap_ny_genomes/scripts/annotation/easel/bin/map_transcript.py matrix start_site.txt Target feature.tracking
  awk 'BEGIN { FS = OFS = "     " } NR > 1 {
      if ($26 == "") {
          $26 = 0
      } else {
          $26 = int($26)
      }
  } 1' feature.tracking > target.tracking
 
  python /nfs5/BPP/Grunwald_Lab/home/paradarc/pcap_ny_genomes/scripts/annotation/easel/bin/map_transcript.py matrix F1.txt F1 feature.tracking
  awk 'BEGIN { FS=OFS=" " } NR > 1 { $26 = sprintf("%d", $26) }; 1' OFS="       " feature.tracking > F1.tracking
 
  awk '{print $26}' target.tracking > target.txt
  paste F1.tracking target.txt > easel.tracking

Command exit status:
  1

Command output:
  (empty)

Command error:
  INFO:    Converting SIF file to temporary sandbox...
  sys:1: DtypeWarning: Columns (21) have mixed types.Specify dtype option on import or set low_memory=False.
  Traceback (most recent call last):
    File "/nfs5/BPP/Grunwald_Lab/home/paradarc/pcap_ny_genomes/scripts/annotation/easel/bin/map_transcript.py", line 28, in <module>
      df2 = pd.read_csv(feature, delimiter='\t')
    File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 605, in read_csv
      return _read(filepath_or_buffer, kwds)
    File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 457, in _read
      parser = TextFileReader(filepath_or_buffer, **kwds)
    File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 814, in __init__
      self._engine = self._make_engine(self.engine)
    File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 1045, in _make_engine
      return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
    File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 1893, in __init__
      self._reader = parsers.TextReader(self.handles.handle, **kwds)
    File "pandas/_libs/parsers.pyx", line 521, in pandas._libs.parsers.TextReader.__cinit__
  pandas.errors.EmptyDataError: No columns to parse from file
  INFO:    Cleaning up image...

Work dir:
  /nfs5/BPP/Grunwald_Lab/home/paradarc/pcap_ny_genomes/scripts/annotation/easel/work/f9/19dc410a22011b65b346bf20d6855b

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

 -- Check '.nextflow.log' file for details

I checked the features file and it looks like this:

[debary][/nfs5/BPP/Grunwald_Lab/home/paradarc/pcap_ny_genomes/scripts/annotation/easel]
> head -n 5 /nfs5/BPP/Grunwald_Lab/home/paradarc/pcap_ny_genomes/scripts/annotation/easel/work/99/75e57c740633eaeac87e7719635fea/features.tracking | cat -A
Gene               Transcript         S1-AUG_OrthoDB_PsiCLASS  S2-AUG_OrthoDB_StringTie2  S3-AUG_TD_PsiCLASS  S4-AUG_TD_StringTie2  S5-Transcriptomes  Exons  EggNOG_Bitscore  EggNOG_SimilarityScore  EggNOG_Evalue            OrthoDB_Bitscore  OrthoDB_SimilarityScore  OrthoDB_Evalue           Expression  Molecular_Weight  Transcript_Length  CDS_Repeat_Content    CDS_Length  GC_Content  Free_Energy  GC_Ratio            Kozak_Neg_3_Purine  Kozak_Pos_4  ATG_Counts$
easel_gene-1       easel_mrna-7       0                        0                          0                   1                     0                  2      531.0            51.5                    3.41e-177                485.0             51.9                     7.57e-163                6.18147     618103.16         1997               100.0                 1848        58.74       -210.0       1.923728813559322   1.0                 G            11$
easel_gene-1       easel_mrna-14      0                        0                          1                   0                     0                  1      379.0            68.8                    1.08e-116                0                 0                        1000                     1.1268      654844.65         2112               100.0                 2112        55.68       -224.1       1.92436974789916    1.0                 A            10$
easel_gene-1       easel_mrna-19      0                        0                          0                   1                     0                  1      379.0            68.8                    2.9400000000000002e-117  0                 0                        1000                     0.0         612947.00         1977               100.0                 1977        55.18       -221.3       2.425742574257426   1.0                 T            8$
easel_gene-1       easel_mrna-24      0                        0                          1                   0                     0                  1      379.0            68.8                    8.58e-118                0                 0                        1000                     0.0         575018.81         1854               100.0                 1854        55.45       -204.2       2.4123711340206184  0.0                 G            7$
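
Since pandas raises EmptyDataError when the file it is asked to read has nothing to parse, and features.tracking itself has content, one way to narrow this down might be to check the other inputs named in the failing command. Below is a minimal sketch (not part of EASEL; the function name find_empty_inputs is made up for illustration) that reports which of those files are missing, zero-byte, or whitespace-only in a given work directory. I am assuming, without having confirmed it, that the file read at line 28 of map_transcript.py is one of start_site.txt or F1.txt.

```python
import os
import sys


def find_empty_inputs(work_dir, names):
    """Return {name: status} for files that are missing, empty, or blank.

    Hypothetical helper for illustration only; it is not part of EASEL.
    """
    problems = {}
    for name in names:
        path = os.path.join(work_dir, name)
        if not os.path.exists(path):
            # Also covers broken symlinks to staged Nextflow inputs.
            problems[name] = "missing"
        elif os.path.getsize(path) == 0:
            problems[name] = "empty"
        else:
            with open(path, "r", errors="replace") as handle:
                # A file containing only whitespace has nothing to parse either.
                if not any(line.strip() for line in handle):
                    problems[name] = "blank"
    return problems


if __name__ == "__main__":
    # File names are taken from the "Command executed" section of the log.
    work_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    names = ["features.tracking", "matrix", "start_site.txt", "F1.txt"]
    for name, status in find_empty_inputs(work_dir, names).items():
        print(f"{name}: {status}")
```

Saved under any name and run with the failing work dir (the f9/19dc41... path from the log) as its only argument, it prints one line per problematic file and nothing if all four have content.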

If you have any suggestions on how to get past this issue, please let me know. My command is really crazy since I am working with a non-model organism:

nextflow run main.nf -profile singularity --genome /nfs5/BPP/Grunwald_Lab/home/paradarc/pcap_ny_genomes/results/repeats/p_capsici_014_EarlGrey/p_capsici_014_summaryFiles/p_capsici_014.softmasked.fasta --sra /nfs5/BPP/Grunwald_Lab/home/paradarc/pcap_ny_genomes/data/evidence/sra_rna_data/pcap_sra_list.txt --busco_lineage stramenopiles --user_protein /nfs5/BPP/Grunwald_Lab/home/paradarc/pcap_ny_genomes/data/evidence/cl_pero_and_orthodb/cl_pero_proteins_and_oomycota_orthodb.fasta --taxon oomycota --build_training_set true --singularity_cache_dir /nfs5/BPP/Grunwald_Lab/home/paradarc/singularity/nxf/singularity --max_cpus 64 --max_memory 200.GB --reference /nfs5/BPP/Grunwald_Lab/home/paradarc/pcap_ny_genomes/data/evidence/GCA_016618375.1/genomic.gff --reference_db /nfs5/BPP/Grunwald_Lab/home/paradarc/pcap_ny_genomes/data/databases/refseq_download/refseq_complete.dmnd -resume 6090750a-ff6f-4204-aa59-c130e0b85f85

Thanks,

Camilo