[STDERR] WARNING The input (new_all_13.tsv) contains 32942 queries, but I extracted 32935 entries out of the fasta(s).
Error message : [STDERR] Done reading the query new_all_13.tsv file. Now I know 2534 groups with 32942 genes/proteins in total.
!!! WARNING : This call will produce 2534 files (one for each orthology group) ! In the *.html file you can individually extract single groups by clicking on the front part of a row. Press 'strg+c' to prevent me from proceeding or wait 20 seconds to continue... !!!
Well then, proceeding...
[STDERR] (1/13) : Start reading the fasta file Aspalb.names.fas
[STDERR] (2/13) : Done reading Aspalb.names.fas. Start reading the fasta file Aspfla.names.fas
[STDERR] (3/13) : Done reading Aspfla.names.fas. Start reading the fasta file Bysspe.names.fas
[STDERR] (4/13) : Done reading Bysspe.names.fas. Start reading the fasta file Elagra.names.fas
[STDERR] (5/13) : Done reading Elagra.names.fas. Start reading the fasta file Monpur.names.fas
[STDERR] (6/13) : Done reading Monpur.names.fas. Start reading the fasta file Paeniv.names.fas
[STDERR] (7/13) : Done reading Paeniv.names.fas. Start reading the fasta file Penchr.names.fas
[STDERR] (8/13) : Done reading Penchr.names.fas. Start reading the fasta file Penoxa.names.fas
[STDERR] (9/13) : Done reading Penoxa.names.fas. Start reading the fasta file Talbor.names.fas
[STDERR] (10/13) : Done reading Talbor.names.fas. Start reading the fasta file Talsti.names.fas
[STDERR] (11/13) : Done reading Talsti.names.fas. Start reading the fasta file Thecru.names.fas
[STDERR] (12/13) : Done reading Thecru.names.fas. Start reading the fasta file Thelan.names.fas
[STDERR] (13/13) : Done reading Thelan.names.fas. Start reading the fasta file Tripara.names.fas
[STDERR] WARNING The input (new_all_13.tsv) contains 32942 queries, but I extracted 32935 entries out of the fasta(s). -> This should not have happen, maybe some fasta files are missing as input? (If you cannot solve this error, please send a report to incoming+paulklemm-phd-proteinortho-7278443-issue-@incoming.gitlab.com or visit https://gitlab.com/paulklemm_PHD/proteinortho/wikis/Error%20Codes for more help. Further more all mails to lechner@staff.uni-marburg.de are welcome)
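To narrow down the 7 queries that were not extracted (32942 vs. 32935), the IDs in the table could be compared against the IDs in the fasta headers. This is only a minimal sketch under assumptions: that the gene IDs sit in columns 4 and up of the .tsv, separated by commas with `*` marking an empty cell, and that a fasta ID is the header text up to the first whitespace. `list_missing_ids` is a hypothetical helper name, not part of proteinortho.

```sh
# Hypothetical helper: print IDs that occur in the orthology table
# but in none of the given fasta files.
# Assumptions: IDs are in columns 4+ of the tsv, comma-separated, '*' = empty;
# a fasta ID is the header text between '>' and the first whitespace.
list_missing_ids() {
    tsv=$1
    shift
    tmp=$(mktemp -d) || return 1

    # Collect every ID mentioned in the table (skip '#' header lines).
    awk -F'\t' '!/^#/ {
        for (i = 4; i <= NF; i++) {
            n = split($i, ids, ",")
            for (j = 1; j <= n; j++)
                if (ids[j] != "*" && ids[j] != "") print ids[j]
        }
    }' "$tsv" | LC_ALL=C sort -u > "$tmp/tsv_ids"

    # Collect every ID found in the fasta headers.
    awk '/^>/ { sub(/^>/, ""); print $1 }' "$@" | LC_ALL=C sort -u > "$tmp/fasta_ids"

    # Lines present only in the first list are the unmatched queries.
    LC_ALL=C comm -23 "$tmp/tsv_ids" "$tmp/fasta_ids"
    rm -rf "$tmp"
}
```

It would be called as `list_missing_ids new_all_13.tsv *.names.fas`. Any ID it prints would be worth inspecting for the characters that differ from the corresponding fasta header.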
Parameter-vector :
Hello! Thanks for all your hard work making this program, updating it, and addressing issues. I have run into the above issue using a combination of JGI, funannotate, and GenBank annotated files. I can't figure out which special characters are choking the program. My guess is that it is the older GenBank files (their accession numbers start with XP). Which characters do you think are causing this issue? I have combed through the files and can't figure it out. These are the commands I use to prepare the fasta files for input:
- Rename all proteins within files to make your life easier when you need to concatenate your files.
# For JGI files:
cat Amyenc.fas | sed 's/jgi|//g' | sed 's/|//g' | sed 's/Amyenc1/Amyenc|/g' > Amyenc.names.fas
# For Augustus proteins:
cat Micruf.fas | sed 's/>g/>Micruf|g/g' > Micruf.names.fas
# For Funannotate/Maker proteins:
cat Nagfri_prots.fas | sed 's/>NAGFRI_/>NAGFRI|g/g' > Nagfri.names.fas
# For NCBI-GenBank (THE MOST DIFFICULT), where POR is the 3-letter accession prefix. You may want to remove all commas, forward slashes, parentheses, and brackets too by adding sed 's/,//g' | sed 's/\///g' | sed 's/(//g' | sed 's/)//g':
cat Tolpar.fas | sed 's/ \[Tolypocladium paradoxum\]//g' | sed 's/>POR/>Tolpar|POR/g' | sed 's/ /_/g' | sed 's/,//g' | sed 's/\///g' | sed 's/(//g' | sed 's/)//g' | sed 's/[][]//g' > Tolpar.names.fas
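The GenBank clean-up could also be written as a single sed call that only touches header lines, which makes it easier to see exactly which characters get removed. This is a rough sketch, not something I have run on the real files: `clean_genbank_headers` is a hypothetical helper name, it assumes the accession prefix contains only letters and digits, and it strips any trailing ` [organism]` tag rather than one specific organism name.

```sh
# Hypothetical helper: clean GenBank-style fasta headers in one sed call.
# Usage: clean_genbank_headers SPECIES_PREFIX ACCESSION_PREFIX [file...]
# Assumption: ACCESSION_PREFIX contains only letters and digits.
clean_genbank_headers() {
    prefix=$1
    acc=$2
    shift 2
    # 1. leave sequence lines (not starting with '>') untouched
    # 2. drop any " [organism name]" tag
    # 3. prepend the species prefix to the accession
    # 4. turn spaces into underscores
    # 5. delete commas, slashes, parentheses and remaining brackets
    sed -e '/^>/!b' \
        -e 's/ \[[^]]*\]//g' \
        -e "s/^>$acc/>$prefix|$acc/" \
        -e 's/ /_/g' \
        -e 's#[],/()[]##g' "$@"
}
```

For the file above this would be `clean_genbank_headers Tolpar POR Tolpar.fas > Tolpar.names.fas`.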
Please advise! Thank you!