Error when parsing proteins containing the translation stop symbol "*"
Hello, and thank you for creating and maintaining ProteinOrtho.
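I tested it on a dataset whose proteins contain the translation stop symbol "*", and it failed on lines that start with this symbol.
I believe this issue is connected to this one.
Our sequences are also wrapped every 60 aa, which is probably why the error happens: after wrapping, a trailing "*" can end up at the start of a line, or alone on it, as in the "full line" reported in the log.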
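As a possible workaround on the input side, here is a minimal Python sketch that removes "*" from sequence lines before the files are passed to Proteinortho. The regular expression is copied from the error message in the log. The function name `strip_stop_symbols` and the example record are hypothetical, made up for illustration. The sketch assumes that dropping every "*" (including internal ones) is acceptable for the analysis, and I have not checked it against Proteinortho's own parser.

```python
import re

# Pattern copied from the error message: a line is reported as forbidden
# when its first character is not a letter, '#' or '>'.
FORBIDDEN_LINE = re.compile(r"^[^a-z#>]", re.IGNORECASE)


def strip_stop_symbols(lines):
    """Yield FASTA lines with '*' removed from sequence lines.

    Header lines (starting with '>') are passed through unchanged.
    Sequence lines that become empty after removing '*' are dropped.
    Assumption: removing internal stop symbols is acceptable as well.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line
            continue
        cleaned = line.replace("*", "")
        if cleaned:
            yield cleaned


# Hypothetical record: 60 residues fill the first line exactly,
# so the stop symbol is wrapped onto a line of its own.
record = [">protein_1", "M" + "A" * 59, "*"]

# The lone "*" line is what the forbidden-symbol pattern matches.
assert FORBIDDEN_LINE.match(record[2])

cleaned = list(strip_stop_symbols(record))
# None of the remaining lines match the forbidden-symbol pattern.
assert not any(FORBIDDEN_LINE.match(line) for line in cleaned)
```

To clean a file, the same generator can be fed an open file handle and its output written, one line each, to a new `.faa` file, leaving the original untouched.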
The log follows:
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LC_TIME = "ja_JP.UTF-8",
LC_MONETARY = "ja_JP.UTF-8",
LC_ADDRESS = "ja_JP.UTF-8",
LC_TELEPHONE = "ja_JP.UTF-8",
LC_NAME = "ja_JP.UTF-8",
LC_MEASUREMENT = "ja_JP.UTF-8",
LC_IDENTIFICATION = "ja_JP.UTF-8",
LC_NUMERIC = "ja_JP.UTF-8",
LC_PAPER = "ja_JP.UTF-8",
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LC_TIME = "ja_JP.UTF-8",
LC_MONETARY = "ja_JP.UTF-8",
LC_ADDRESS = "ja_JP.UTF-8",
LC_TELEPHONE = "ja_JP.UTF-8",
LC_NAME = "ja_JP.UTF-8",
LC_MEASUREMENT = "ja_JP.UTF-8",
LC_IDENTIFICATION = "ja_JP.UTF-8",
LC_NUMERIC = "ja_JP.UTF-8",
LC_PAPER = "ja_JP.UTF-8",
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
*****************************************************************
Proteinortho with PoFF version 6.3.2 - An orthology detection tool
*****************************************************************
Using 128 CPU threads (1 threads per processes each with 128 threads), Detected 'diamond' version 2.0.12
Checking input files.
Parameter-vector : (version=6.3.2,step=0,verbose=1,debug=1,synteny=0,duplication=2,cs=3,alpha=0.5,connectivity=0.1,cpus=128,evalue=1e-05,purity=-1,coverage=50,identity=25,blastmode=diamond,sim=0.95,report=3,keep=0,force=0,selfblast=0,twilight=0,core=0,coreMinSpecies=0,coreMaxProts=10,pseudo=1,omni=0,identical=0,range=-1,singles=0,clean=1,blastOptions=,makeBlastOptions=,nograph=0,xml=0,desc=0,tmp_path=./proteinortho_cache_po6-default-reference_2000_mags-128cpus/,blastversion=2.0.12,binpath=,makedb=diamond makedb -p 128 --in,blast=,jobs_todo=0,project=po6-default-reference_2000_mags-128cpus,inproject=po6-default-reference_2000_mags-128cpus,po_path=/home/salvocos/work_repos/proteinortho//src/BUILD/Linux_x86_64,run_id=,threads_per_process=1,um=0)
[Error]
ERROR found line with forbidden symbols in '/ssd_home/sp2-genome-biology-review-runs/input/reference_2000_mags/2156126005_1.faa' that is '*' (/^[^a-z#>]/i)
full line:
*
Please visit the proteinortho-wiki, where the most common errors are documented:
https://gitlab.com/paulklemm_PHD/proteinortho/wikis/Error%20Codes
If you cannot solve this error, please file a report (including the input files, the error code and the above 'Parameter-vector'):
incoming+paulklemm-phd-proteinortho-7278443-issue-@incoming.gitlab.com
Further more all mails to lechner@staff.uni-marburg.de are welcome.
Edited by Salvatore Cosentino