Update Pipeline description, authored by Marie Lahaye
…The gene predictor [GlimmerHMM](#glimmerhmm) is trained on the different genome …
![glimmerhmm](uploads/0ffd8be0eea2253682e5510ba364f447/glimmerhmm.png)
#### Prediction with SNAP using MAKER
To be trained, the gene predictor [SNAP](#snap) needs a GFF file generated by [MAKER](#maker). This file is generated from the assembly to annotate, the stranded transcriptome assembly, and protein sequences (Viridiplantae and eudicotyledon protein sequences), which are all given to MAKER. MAKER first performs an initial alignment of this evidence with [BLAST](#blast) and [Exonerate](#exonerate), then runs an _ab initio_ gene prediction, which produces a GFF file. This GFF file is used to train SNAP, which yields an HMM file. A second MAKER run is then performed with SNAP gene prediction enabled and the SNAP HMM file given as input.
The HMM file generation and the SNAP prediction with MAKER using the new HMM file are then repeated.
The HMM file generated with the first assembly is then used to run SNAP gene prediction on the other assemblies (PN40024.v4 REF and ALT).

![maker_snap](uploads/17eb83d00de6036407bf0400f7b91ca5/maker_snap.png)
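One training round can be sketched as follows. This sketch assumes that the SNAP utilities `maker2zff`, `fathom`, `forge` and `hmm-assembler.pl` are on the `PATH`, and it follows the training steps described in the SNAP documentation. The function name `snap_train_round`, the `DRY_RUN` switch and the file names are hypothetical. When `DRY_RUN` is set, the commands are printed instead of executed.

```shell
# Hypothetical sketch of one SNAP training round on the output of a MAKER run.
# Assumes the SNAP tools maker2zff, fathom, forge and hmm-assembler.pl are on PATH.

# Run a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1: MAKER master datastore index log, $2: HMM file to write.
# All intermediate files are written to the current directory.
snap_train_round() {
  local index_log=$1 hmm_out=$2
  # Convert the MAKER gene models to ZFF (writes genome.ann and genome.dna).
  run maker2zff -d "$index_log" || return 1
  # Sort genes into categories; uni.ann/uni.dna hold the unique ones,
  # each with up to 1000 bp of flanking sequence.
  run fathom genome.ann genome.dna -categorize 1000 || return 1
  # Export those genes on the plus strand (export.ann, export.dna).
  run fathom uni.ann uni.dna -export 1000 -plus || return 1
  # Estimate the model parameters into the current directory.
  run forge export.ann export.dna || return 1
  # Assemble the parameters into a single HMM file.
  if [ -n "${DRY_RUN:-}" ]; then
    echo "hmm-assembler.pl genome . > $hmm_out"
  else
    hmm-assembler.pl genome . > "$hmm_out"
  fi
}
```

For example, `snap_train_round round1_master_datastore_index.log round1.hmm` produces `round1.hmm`. That file is then given to the next MAKER run through the `snaphmm=` option of _maker_opts.ctl_. Each repetition of the training starts again from the datastore index of the latest MAKER run.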
#### Prediction with BRAKER2
…

`glimmerhmm -n 1 -g -o <out.gff> <file.fasta> path/to/training/dir/`

…
MAKER is an annotation pipeline, not a gene predictor. MAKER does not predict genes itself. Rather, it leverages existing software tools (some of which are gene predictors) and integrates their output to produce what it finds to be the best possible gene model for a given location, based on evidence alignments.
**Arguments**: MAKER uses 3 different control files that can be generated with the command `maker -CTL`: _maker_opts.ctl_, _maker_exe.ctl_ and _maker_evm.ctl_. The main configuration file is _maker_opts.ctl_, where the locations of the genome, transcript (EST) and protein input files are set. Other options defined in this file include:
- _max\_dna\_len=300000_: length for dividing up contigs into chunks (increases/decreases memory usage)
- _split\_hit=20000_: length for the splitting of hits (expected max intron size for evidence alignments)
…
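Each option in _maker_opts.ctl_ is a line of the form `key=value #comment`. The following is a minimal sketch of how to set such options from a script, assuming that line format. The helper name `set_ctl_option` is hypothetical and is not part of MAKER. Values containing `|`, `&` or `\` would need escaping before being passed to `sed`.

```shell
# Hypothetical helper: set KEY=VALUE in a MAKER control file while keeping
# the trailing "#comment" that maker -CTL writes on each option line.
# Assumes each option sits on its own line as "key=value #comment".
set_ctl_option() {
  local file=$1 key=$2 value=$3
  # Refuse unknown keys instead of silently appending them.
  grep -q "^${key}=" "$file" || return 1
  local tmp
  tmp=$(mktemp) || return 1
  # Replace everything between "key=" and the first "#" with the new value.
  sed "s|^\(${key}=\)[^#]*|\1${value} |" "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For example, `set_ctl_option maker_opts.ctl max_dna_len 300000` followed by `set_ctl_option maker_opts.ctl split_hit 20000` applies the two values listed above. Because the helper only edits keys that already exist, a misspelled option name is reported as a failure rather than being added to the file unnoticed.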