Update Pipeline description authored by Marie Lahaye
...@@ -118,7 +118,7 @@ fastp --thread 4 -i <in_1.fastq.gz> -I <in_2.fastq.gz> \ ...@@ -118,7 +118,7 @@ fastp --thread 4 -i <in_1.fastq.gz> -I <in_2.fastq.gz> \
**Arguments**: **Arguments**:
- _--thread_ \<numberOfThreads\>_: number of threads to use for the processing - _--thread <numberOfThreads>_: number of threads to use for the processing
With this simple command line, fastp applies its default quality filtering: bases with a Phred quality below 15 count as unqualified, and reads with more than 40% unqualified bases are discarded. It also filters on a minimum length requirement and filters reads whose complexity is below 30%. Finally, fastp trims the adapters it detects. With this simple command line, fastp applies its default quality filtering: bases with a Phred quality below 15 count as unqualified, and reads with more than 40% unqualified bases are discarded. It also filters on a minimum length requirement and filters reads whose complexity is below 30%. Finally, fastp trims the adapters it detects.
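The thresholds described above can be written out explicitly. The sketch below is a dry run that only prints a command. It assumes the long option names of recent fastp releases and a minimum length of 15; the input file names are hypothetical, and the output options are left out because the original command is truncated here.

```shell
# Print (without running) a fastp command with the thresholds spelled out.
# $1 and $2 are the paired-end input files (hypothetical names below).
fastp_cmd() {
    # Assumption: in fastp itself the low-complexity filter only runs when
    # --low_complexity_filter is passed; 30 is its default threshold.
    echo "fastp --thread 4 -i $1 -I $2" \
         "--qualified_quality_phred 15" \
         "--unqualified_percent_limit 40" \
         "--length_required 15" \
         "--low_complexity_filter --complexity_threshold 30"
}

fastp_cmd sample_1.fastq.gz sample_2.fastq.gz
```

Spelling the values out keeps the filtering reproducible if the defaults change between fastp versions.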
...@@ -142,8 +142,8 @@ gmap_build --nthreads 8 --genomedb <genomeName> --dir path/to/somewhere/ <in.fas ...@@ -142,8 +142,8 @@ gmap_build --nthreads 8 --genomedb <genomeName> --dir path/to/somewhere/ <in.fas
**Arguments**: **Arguments**:
- _--nthreads \<numberOfThreads\>_: number of threads to use for the processing (default value: 8) - _--nthreads <numberOfThreads>_: number of threads to use for the processing (default value: 8)
- _--genomedb \<genomeName\>_: name of the genome (index files prefix) - _--genomedb <genomeName>_: name of the genome (index files prefix)
- _--dir path/to/somewhere/_: directory where index files will be written - _--dir path/to/somewhere/_: directory where index files will be written
Then, paired-end RNA-Seq data are aligned against the database created as described above. Input files must be in FASTQ or FASTA format. The FASTQ input may include quality scores, which will then be included in SAM output: Then, paired-end RNA-Seq data are aligned against the database created as described above. Input files must be in FASTQ or FASTA format. The FASTQ input may include quality scores, which will then be included in SAM output:
...@@ -158,13 +158,13 @@ gsnap --gunzip --nthreads 8 --dir path/to/somewhere/ --db <dbName> --batch 5 --n ...@@ -158,13 +158,13 @@ gsnap --gunzip --nthreads 8 --dir path/to/somewhere/ --db <dbName> --batch 5 --n
**Arguments**: **Arguments**:
- _--gunzip_: accept gzip-compressed input files - _--gunzip_: accept gzip-compressed input files
- _--nthreads \<numberOfThreads\>_: number of threads to use for the processing - _--nthreads <numberOfThreads>_: number of threads to use for the processing
- _--dir path/to/somewhere/_: directory to the genome database - _--dir path/to/somewhere/_: directory to the genome database
- _--db \<dbName\>_: name of the database created above (prefix) - _--db <dbName>_: name of the database created above (prefix)
- _--batch \<batchMode\>_: batch mode (default value: 2) - see help for more information - _--batch <batchMode>_: batch mode (default value: 2) - see help for more information
- _--novelsplicing \<0/1\>_: 0=no (default) and 1=yes - _--novelsplicing <0/1>_: 0=no (default) and 1=yes
- _--format \<outputFormat\>_: format of the output file (sam or m8 are implemented) - _--format <outputFormat>_: format of the output file (sam or m8 are implemented)
- _--output-file \<out.file\>_: output file name and path - _--output-file <out.file>_: output file name and path
- _--nofails_: exclude failed alignments - _--nofails_: exclude failed alignments
### Samtools ### Samtools
...@@ -189,9 +189,9 @@ samtools view -F 0x100 -b -o <out.bam> <in.sam> ...@@ -189,9 +189,9 @@ samtools view -F 0x100 -b -o <out.bam> <in.sam>
**Arguments**: **Arguments**:
- _-F \<flag\>_: only include reads with none of the flags present - _-F <flag>_: only include reads with none of the flags present
- _-b_: output in the format BAM - _-b_: output in BAM format
- _-o \<out.bam\>_: output file name - _-o <out.bam>_: output file name
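To illustrate what `-F 0x100` selects: in the SAM format, bit 0x100 of the FLAG field marks a secondary alignment, so this filter keeps only records without that bit. The sketch below tests the bit with plain shell arithmetic; the FLAG values are hypothetical examples.

```shell
# Return success (0) when the SAM FLAG value has the 0x100 bit set,
# i.e. the record is a secondary alignment that `samtools view -F 0x100` drops.
is_secondary() {
    flag=$1
    [ $(( flag & 256 )) -ne 0 ]   # 256 == 0x100
}

# Hypothetical FLAG values for illustration.
for flag in 99 355; do
    if is_secondary "$flag"; then
        echo "$flag: secondary, removed by -F 0x100"
    else
        echo "$flag: kept"
    fi
done
```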
#### Samtools faidx #### Samtools faidx
...@@ -225,8 +225,8 @@ PsiCLASS is an accurate and efficient transcript assembler which simultaneously ...@@ -225,8 +225,8 @@ PsiCLASS is an accurate and efficient transcript assembler which simultaneously
**Arguments**: **Arguments**:
- _-b \<listOfBamFiles\>_: list of BAM files separated by comma - _-b <listOfBamFiles>_: list of BAM files separated by comma
- _-o \<output.gtf\>_: path to output file - _-o <output.gtf>_: path to output file
*Note*: If a file path contains a "-", PsiCLASS will return a segmentation fault, so avoid hyphens when naming directories or files. *Note*: If a file path contains a "-", PsiCLASS will return a segmentation fault, so avoid hyphens when naming directories or files.
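A small pre-flight check can catch this before PsiCLASS is launched. The sketch below uses two hypothetical helper functions: one rejects any path containing "-", the other builds the comma-separated list expected by `-b`. The file names are illustrative only.

```shell
# Fail if any path contains "-", which the note above says crashes PsiCLASS.
check_no_hyphen() {
    for path in "$@"; do
        case "$path" in
            *-*) echo "unsafe path (contains '-'): $path" >&2; return 1 ;;
        esac
    done
}

# Join all arguments with commas to build the value passed to -b.
join_by_comma() {
    ( IFS=,; printf '%s\n' "$*" )
}

# Hypothetical file names for illustration.
if check_no_hyphen sample_1.bam sample_2.bam; then
    echo "-b $(join_by_comma sample_1.bam sample_2.bam)"
fi
```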
...@@ -250,9 +250,9 @@ pblat -threads=8 -t=dnax -q=prot -noHead <database.fasta> <query.fasta> <output. ...@@ -250,9 +250,9 @@ pblat -threads=8 -t=dnax -q=prot -noHead <database.fasta> <query.fasta> <output.
**Arguments**: **Arguments**:
- _-threads=\<numberOfThreads\>_: number of threads to run - _-threads=<numberOfThreads>_: number of threads to run
- _-t=\<dna/prot/dnax\>_: database (genome) type - _-t=<dna/prot/dnax>_: database (genome) type
- _-q=\<dna/rna/prot/dnax/rnax\>_: query (sequences which will be aligned) type - _-q=<dna/rna/prot/dnax/rnax>_: query (sequences which will be aligned) type
- _-noHead_: no header in PSL output file - _-noHead_: no header in PSL output file
### Exonerate ### Exonerate
...@@ -306,13 +306,13 @@ exonerate --model p2g --query <query.fasta> --target localhost:12886 \ ...@@ -306,13 +306,13 @@ exonerate --model p2g --query <query.fasta> --target localhost:12886 \
**Arguments**: **Arguments**:
- _--model \<modelOfAlignment\>_: model used to align the sequences (protein2genome, coding2coding, etc.) - _--model <modelOfAlignment>_: model used to align the sequences (protein2genome, coding2coding, etc.)
- _--query \<query.fasta\>_: sequences you want to map - _--query <query.fasta>_: sequences you want to map
- _--target \<target.fasta\>_: hostname:port where the database is hosted - _--target <target.fasta>_: hostname:port where the database is hosted
- _--showtargetgff yes_: return GFF output on the target sequences - _--showtargetgff yes_: return GFF output on the target sequences
- _--verbose_: show information about what is going on during the analysis (0 disables these messages) - _--verbose_: show information about what is going on during the analysis (0 disables these messages)
- _--showalignment \<yes/no\>_: show the alignments in a human-readable form - _--showalignment <yes/no>_: show the alignments in a human-readable form
- _--showvulgar \<yes/no\>_: show the alignments in "vulgar" format - _--showvulgar <yes/no>_: show the alignments in "vulgar" format
- _--ryo_: prints additional information about each alignment (for example, the average percent identity and similarity, or the score) - _--ryo_: prints additional information about each alignment (for example, the average percent identity and similarity, or the score)
- _--querychunkid 1-to-n_ and _--querychunktotal n_: split the FASTA file into n chunks (one command per chunk) to parallelize exonerate over n threads - _--querychunkid 1-to-n_ and _--querychunktotal n_: split the FASTA file into n chunks (one command per chunk) to parallelize exonerate over n threads
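The chunking options can be driven by a simple loop. The sketch below is a dry run that prints one command per chunk rather than executing anything. The function name and query file are hypothetical, and only the options visible in the command above are included, since the remainder of the original command is truncated.

```shell
# Print one exonerate command per chunk (dry run: nothing is executed).
# $1 = query FASTA file, $2 = total number of chunks.
exonerate_chunk_cmds() {
    query=$1
    total=$2
    i=1
    while [ "$i" -le "$total" ]; do
        echo "exonerate --model p2g --query $query --target localhost:12886 --querychunkid $i --querychunktotal $total"
        i=$(( i + 1 ))
    done
}

# Hypothetical query file split into 4 chunks.
exonerate_chunk_cmds proteins.fasta 4
```

Each printed line could then be handed to a job scheduler or run in the background, one per thread.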
...@@ -387,8 +387,8 @@ maker -c 24 -base prefix_name ...@@ -387,8 +387,8 @@ maker -c 24 -base prefix_name
**Arguments**: **Arguments**:
- _-c_: number of threads - _-c <numberOfThreads>_: number of threads to use
- _-base_: output files prefix - _-base prefix_: output files prefix
MAKER generates multiple GFF files; to merge them, you can use the following command: MAKER generates multiple GFF files; to merge them, you can use the following command:
...@@ -450,7 +450,7 @@ braker.pl --genome=assembly.fasta --bam=file1.bam,file2.bam --prot_seq=prot1.fas ...@@ -450,7 +450,7 @@ braker.pl --genome=assembly.fasta --bam=file1.bam,file2.bam --prot_seq=prot1.fas
- _--bam=<file1.bam>,<file2.bam>_: List of BAM files (primary mapping alignments) separated by commas - _--bam=<file1.bam>,<file2.bam>_: List of BAM files (primary mapping alignments) separated by commas
- _--prot_seq=<prot1.fasta>,<prot2.fasta>_: List of multi-FASTA protein files separated by commas - _--prot_seq=<prot1.fasta>,<prot2.fasta>_: List of multi-FASTA protein files separated by commas
- _--prg=<ph/gth/exonerate/spaln>_: Alignment tool for generating hints from protein data (ProtHint: ph) - _--prg=<ph/gth/exonerate/spaln>_: Alignment tool for generating hints from protein data (ProtHint: ph)
- _--cores 8_: Maximum number of cores that can be used during computation (max: 8) - _--cores <numberOfThreads>_: Maximum number of cores that can be used during computation (max: 8)
- _--workingdir=path/to/workdir_: Path to working directory where temporary and output files will be written - _--workingdir=path/to/workdir_: Path to working directory where temporary and output files will be written
- _--etpmode_: Run GeneMark-ETP with hints provided from proteins and RNA-Seq data - _--etpmode_: Run GeneMark-ETP with hints provided from proteins and RNA-Seq data
- _--gff3_: Output in GFF3 format - _--gff3_: Output in GFF3 format