... | ... | @@ -118,7 +118,7 @@ fastp --thread 4 -i <in_1.fastq.gz> -I <in_2.fastq.gz> \ |
|
|
|
|
|
**Arguments**:
|
|
|
|
|
|
- _--thread_ \<numberOfThreads\>_: number of threads to use for the processing
|
|
|
- _--thread <numberOfThreads>_: number of threads to use for the processing
|
|
|
|
|
|
With this simple command line, fastp applies its default filters: a base counts as qualified when its Phred quality is at least 15, and a read is discarded when more than 40% of its bases are unqualified. It also filters on a minimum length requirement and removes reads whose complexity is below 30%. Finally, fastp trims the adapters it detects.
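To make these filters visible, the sketch below spells them out as explicit options and only prints the resulting command (a dry run). The file names are placeholders, the minimum length of 15 is taken from fastp's documented defaults rather than from this page, and the complexity filter is assumed to need `--low_complexity_filter` to be active.

```shell
# Dry run: print a fastp command with the filters written explicitly.
# File names are placeholders; thresholds are assumed from fastp's defaults.
min_base_quality=15      # a base needs Phred >= 15 to count as qualified
max_unqualified_pct=40   # discard reads with more than 40% unqualified bases
min_length=15            # minimum read length (assumed fastp default)
min_complexity=30        # discard reads below 30% complexity

fastp_cmd="fastp --thread 4 -i in_1.fastq.gz -I in_2.fastq.gz"
fastp_cmd="$fastp_cmd -o out_1.fastq.gz -O out_2.fastq.gz"
fastp_cmd="$fastp_cmd --qualified_quality_phred $min_base_quality"
fastp_cmd="$fastp_cmd --unqualified_percent_limit $max_unqualified_pct"
fastp_cmd="$fastp_cmd --length_required $min_length"
fastp_cmd="$fastp_cmd --low_complexity_filter --complexity_threshold $min_complexity"

echo "$fastp_cmd"
```

Writing the thresholds out this way keeps the filtering reproducible even if the defaults change between fastp versions.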
|
|
|
|
... | ... | @@ -142,8 +142,8 @@ gmap_build --nthreads 8 --genomedb <genomeName> --dir path/to/somewhere/ <in.fas |
|
|
|
|
|
**Arguments**:
|
|
|
|
|
|
- _--nthreads \<numberOfThreads\>_: number of threads to use for the processing (default value: 8)
|
|
|
- _--genomedb \<genomeName\>_: name of the genome (index files prefix)
|
|
|
- _--nthreads <numberOfThreads>_: number of threads to use for the processing (default value: 8)
|
|
|
- _--genomedb <genomeName>_: name of the genome (index files prefix)
|
|
|
- _--dir path/to/somewhere/_: directory where index files will be written
|
|
|
|
|
|
Then, paired-end RNA-Seq data are aligned against the database created as described above. Input files must be in FASTQ or FASTA format. The FASTQ input may include quality scores, which will then be included in SAM output:
|
... | ... | @@ -158,13 +158,13 @@ gsnap --gunzip --nthreads 8 --dir path/to/somewhere/ --db <dbName> --batch 5 --n |
|
|
**Arguments**:
|
|
|
|
|
|
- _--gunzip_: accept gzip-compressed input files
|
|
|
- _--nthreads \<numberOfThreads\>_: number of threads to use for the processing
|
|
|
- _--nthreads <numberOfThreads>_: number of threads to use for the processing
|
|
|
- _--dir path/to/somewhere/_: directory to the genome database
|
|
|
- _--db \<dbName\>_: name of the database created above (prefix)
|
|
|
- _--batch \<batchMode\>_: batch mode (default value: 2) - see the help for more information
|
|
|
- _--novelsplicing \<0/1\>_: 0=no (default) and 1=yes
|
|
|
- _--format \<outputFormat\>_: format of the output file (sam or m8 are implemented)
|
|
|
- _--output-file \<out.file\>_: output file name and path
|
|
|
- _--db <dbName>_: name of the database created above (prefix)
|
|
|
- _--batch <batchMode>_: batch mode (default value: 2) - see the help for more information
|
|
|
- _--novelsplicing <0/1>_: 0=no (default) and 1=yes
|
|
|
- _--format <outputFormat>_: format of the output file (sam or m8 are implemented)
|
|
|
- _--output-file <out.file>_: output file name and path
|
|
|
- _--nofails_: exclude failed alignments
|
|
|
|
|
|
### Samtools
|
... | ... | @@ -189,9 +189,9 @@ samtools view -F 0x100 -b -o <out.bam> <in.sam> |
|
|
|
|
|
**Arguments**:
|
|
|
|
|
|
- _-F \<flag\>_: only include reads with none of the flags present
|
|
|
- _-b_: output in the format BAM
|
|
|
- _-o \<out.bam\>_: output file name
|
|
|
- _-F <flag>_: only include reads with none of the flags present
|
|
|
- _-b_: output in BAM format
|
|
|
- _-o <out.bam>_: output file name
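
The `0x100` value passed to `-F` is the SAM flag bit for secondary alignments (256 in decimal), so the command above keeps only records where that bit is unset. A minimal sketch of the bit test, using a few example flag values chosen for illustration:

```shell
# Succeeds when the SAM flag has the "secondary alignment" bit (0x100 = 256) set.
is_secondary() {
    [ $(( $1 & 0x100 )) -ne 0 ]
}

# Example flag values (illustrative, not taken from a real file).
for flag in 99 147 355 403; do
    if is_secondary "$flag"; then
        echo "$flag: secondary alignment, removed by -F 0x100"
    else
        echo "$flag: kept"
    fi
done
```

Flags 99 and 147 do not contain 256 and are kept; 355 (256 + 99) and 403 (256 + 147) are filtered out.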
|
|
|
|
|
|
#### Samtools faidx
|
|
|
|
... | ... | @@ -225,8 +225,8 @@ PsiCLASS is an accurate and efficient transcript assembler which simultaneously |
|
|
|
|
|
**Arguments**:
|
|
|
|
|
|
- _-b \<listOfBamFiles\>_: list of BAM files separated by comma
|
|
|
- _-o \<output.gtf\>_: path to output file
|
|
|
- _-b <listOfBamFiles>_: list of BAM files separated by comma
|
|
|
- _-o <output.gtf>_: path to output file
|
|
|
|
|
|
*Note*: If a file path contains a "-", PsiCLASS returns a segmentation fault! It is therefore important to avoid hyphens when naming directories or files.
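
Both constraints (comma-separated list, no "-" in paths) can be checked before launching the assembly. The helper below is a sketch: `join_bams` and the file names are hypothetical and not part of PsiCLASS.

```shell
# Join BAM paths with commas for the -b option, refusing any path that
# contains "-". Prints the list on stdout; returns 1 on the first bad path.
join_bams() {
    list=""
    for bam in "$@"; do
        case "$bam" in
            *-*) echo "error: '-' found in path: $bam" >&2; return 1 ;;
        esac
        # Add a comma only when the list already holds an entry.
        list="${list:+$list,}$bam"
    done
    printf '%s\n' "$list"
}

# Hypothetical file names, for illustration only.
join_bams sample1.bam sample2.bam sample3.bam
```

The printed string can be passed directly as the value of `-b`; a non-zero return status signals that a file or directory should be renamed first.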
|
|
|
|
... | ... | @@ -250,9 +250,9 @@ pblat -threads=8 -t=dnax -q=prot -noHead <database.fasta> <query.fasta> <output. |
|
|
|
|
|
**Arguments**:
|
|
|
|
|
|
- _-threads=\<numberOfThreads\>_: number of threads to run
|
|
|
- _-t=\<dna/prot/dnax\>_: database (genome) type
|
|
|
- _-q=\<dna/rna/prot/dnax/rnax\>_: query (sequences which will be aligned) type
|
|
|
- _-threads=<numberOfThreads>_: number of threads to run
|
|
|
- _-t=<dna/prot/dnax>_: database (genome) type
|
|
|
- _-q=<dna/rna/prot/dnax/rnax>_: query (sequences which will be aligned) type
|
|
|
- _-noHead_: no header in PSL output file
|
|
|
|
|
|
### Exonerate
|
... | ... | @@ -306,13 +306,13 @@ exonerate --model p2g --query <query.fasta> --target localhost:12886 \ |
|
|
|
|
|
**Arguments**:
|
|
|
|
|
|
- _--model \<modelOfAlignment\>_: model to align the sequences (protein2genome, coding2coding, etc...)
|
|
|
- _--query \<query.fasta\>_: sequences you want to map
|
|
|
- _--target \<target.fasta\>_: hostname:port where the database is hosted
|
|
|
- _--model <modelOfAlignment>_: model to align the sequences (protein2genome, coding2coding, etc...)
|
|
|
- _--query <query.fasta>_: sequences you want to map
|
|
|
- _--target <target.fasta>_: hostname:port where the database is hosted
|
|
|
- _--showtargetgff yes_: return GFF output on the target sequences
|
|
|
- _--verbose_: show information about what is going on during the analysis (0 to hide this information)
|
|
|
- _--showalignment \<yes/no\>_: show the alignments in a human-readable form
|
|
|
- _--showvulgar \<yes/no\>_: show the alignments in "vulgar" format
|
|
|
- _--showalignment <yes/no>_: show the alignments in a human-readable form
|
|
|
- _--showvulgar <yes/no>_: show the alignments in "vulgar" format
|
|
|
- _--ryo_: prints additional information about each alignment (for example the average percent identity and similarity, or the score)
|
|
|
- _--querychunkid 1-to-n_ and _--querychunktotal n_: split the FASTA file into n chunks (one command per chunk) to parallelize exonerate over n threads
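
The chunking options translate into one command per chunk. The sketch below only prints the commands (a dry run); `chunk_commands`, the query file name, and the per-chunk output file names are assumptions for illustration, while the model and target reuse the values from the command above.

```shell
# Dry run: print one exonerate command per chunk; nothing is executed here.
# $1 = total number of chunks.
chunk_commands() {
    total=$1
    id=1
    while [ "$id" -le "$total" ]; do
        echo "exonerate --model p2g --query query.fasta --target localhost:12886" \
             "--querychunkid $id --querychunktotal $total > chunk_$id.out"
        id=$((id + 1))
    done
}

chunk_commands 4
```

Each printed line can then be submitted as a separate job, and the per-chunk outputs concatenated once all jobs have finished.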
|
|
|
|
... | ... | @@ -387,8 +387,8 @@ maker -c 24 -base prefix_name |
|
|
|
|
|
**Arguments**:
|
|
|
|
|
|
- _-c_: number of threads
|
|
|
- _-base_: output files prefix
|
|
|
- _-c <numberOfThreads>_: number of threads to use
|
|
|
- _-base prefix_: output files prefix
|
|
|
|
|
|
MAKER generates multiple GFF files. To merge them, you can use the following command:
|
|
|
|
... | ... | @@ -450,7 +450,7 @@ braker.pl --genome=assembly.fasta --bam=file1.bam,file2.bam --prot_seq=prot1.fas |
|
|
- _--bam=<file1.bam>,<file2.bam>_: List of BAM files (primary mapping alignments) separated by commas
|
|
|
- _--prot_seq=<prot1.fasta>,<prot2.fasta>_: List of multi-FASTA protein files separated by commas
|
|
|
- _--prg=<ph/gth/exonerate/spaln>_: Alignment tool for generating hints from protein data (ProtHint: ph)
|
|
|
- _--cores 8_: Maximum number of cores that can be used during computation (max: 8)
|
|
|
- _--cores <numberOfThreads>_: Maximum number of cores that can be used during computation (max: 8)
|
|
|
- _--workingdir=path/to/workdir_: Path to working directory where temporary and output files will be written
|
|
|
- _--etpmode_: Run GeneMark-ETP with hints provided from proteins and RNA-Seq data
|
|
|
- _--gff3_: Output in GFF3 format
|
... | ... | |
... | ... | |