Changes

Adrià Auladell · de3a8fa7
--- a/dada2-guidelines.md
+++ b/dada2-guidelines.md
+**DADA2** is an R package for inferring amplicon sequence variants (ASVs) from FASTQ files.
+
+To use it you only need to be familiar with R and Bash.
+
+The main difference from OTU generation is that the process is clustering-free: there is no fixed similarity threshold, and the amplicon sequence variants
+are resolved at the finest level possible, down to differences of a single nucleotide.
+
+The online [tutorial](https://benjjneb.github.io/dada2/tutorial.html) covers nearly everything you need to use it. Some things that didn't work for me and
+took too long to figure out:
+
+- Cut your primers. `cutadapt` makes this really easy!
+
+- If around 25-30% of the reads are lost during ASV generation, some of the parameters probably have to be changed.
+
+ * Are you sure the primers have been removed from the FASTQs?
+
+ * What `maxEE` did you specify? If it causes many reads to be lost, you can set a higher `maxEE`, and even different values for the forward and reverse reads (for example `c(2,4)`).
+The algorithm takes the errors into account in the modelling phase, so this will not make your ASVs erroneous.
+
+ * Do the paired reads overlap? By how many bases? The overlap should be >= 20 nt.
+
+> If you follow the tutorial, a **track analysis** is generated at the end of the procedure, specifying how many reads are lost at each step. It is the best way to find out where things failed.
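+
+As a rough illustration of how to read such a table, the sketch below computes how many reads survive each step. The `track` matrix is invented for this example (sample names, column names and numbers are not real results); in the tutorial the equivalent table is built from the pipeline's own objects.
+
+```r
+# Toy tracking table: rows are samples, columns are pipeline steps.
+# All numbers are made up for illustration only.
+track <- matrix(
+  c(10000,  9000,  8800, 8000, 7600,
+    12000, 11000, 10800, 4000, 3900),
+  nrow = 2, byrow = TRUE,
+  dimnames = list(c("sampleA", "sampleB"),
+                  c("input", "filtered", "denoised", "merged", "nonchim"))
+)
+
+# Fraction of the input reads still present after each step.
+retained <- round(track / track[, "input"], 2)
+print(retained)
+
+# Step-to-step retention: a sharp drop points to the step that needs tuning.
+step_ratio <- round(track[, -1] / track[, -ncol(track)], 2)
+print(step_ratio)
+```
+
+In this toy table `sampleB` drops sharply at the merging step, which is the pattern you would expect when the reads do not overlap enough.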
+
+- In the trimming procedure, `truncLen` cuts all reads to a specific length and *removes* all reads that are shorter. It is therefore important to know the average read length: if `truncLen` is set above the length of many of your reads, you will lose too many of them.
+
+ * The DADA2 pipeline includes a quality profile plot; take it into account when deciding where to cut.
+
+ * The trimming point differs for each run, so if you are working with multiple runs, each one has to be processed separately and the results then joined with `mergeSequenceTables`.
+
+ * You should run a basic analysis of the FASTQs: the average length, the average quality of each sample, and so on. Many of the problems with recovering most of the reads
+stem from a low-quality sample or from reads that were not properly amplified. `seqkit` is a good tool for this kind of information.
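+
+The per-run processing described above could look roughly like the sketch below. It is only an outline under assumptions: the folder layout, the `_R1`/`_R2` file naming, the helper names `filter_run` and `merge_two_runs`, and the truncation lengths in the example call are all placeholders, not values from a real run. Pick `truncLen` for each run after looking at `plotQualityProfile`.
+
+```r
+# Filter and truncate the reads of one sequencing run.
+# `run_dir` is a hypothetical folder of primer-free FASTQs named
+# *_R1.fastq.gz / *_R2.fastq.gz.
+filter_run <- function(run_dir, trunc_len, max_ee = c(2, 4)) {
+  fn_f <- sort(list.files(run_dir, pattern = "_R1\\.fastq\\.gz$", full.names = TRUE))
+  fn_r <- sort(list.files(run_dir, pattern = "_R2\\.fastq\\.gz$", full.names = TRUE))
+  filt_f <- file.path(run_dir, "filtered", basename(fn_f))
+  filt_r <- file.path(run_dir, "filtered", basename(fn_r))
+  # Reads shorter than truncLen are discarded, so the returned matrix
+  # (reads.in / reads.out) is worth checking for every run.
+  dada2::filterAndTrim(fn_f, filt_f, fn_r, filt_r,
+                       truncLen = trunc_len, maxEE = max_ee,
+                       compress = TRUE, multithread = TRUE)
+}
+
+# Join the sequence tables of two runs that were processed separately.
+# Each .rds file is assumed to hold the makeSequenceTable() output of one run.
+merge_two_runs <- function(rds_run1, rds_run2) {
+  dada2::mergeSequenceTables(readRDS(rds_run1), readRDS(rds_run2))
+}
+
+# Example calls (placeholder paths and lengths):
+# out1 <- filter_run("run1", trunc_len = c(240, 200))
+# out2 <- filter_run("run2", trunc_len = c(230, 180))
+# seqtab_all <- merge_two_runs("run1/seqtab.rds", "run2/seqtab.rds")
+```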
+
+- Taxonomy is assigned at the species level only for exact, 100% matches. As a result, some bacteria/eukarya may be identified differently
+at that level compared with OTU results. This is explained in more detail [here](https://benjjneb.github.io/dada2/assign.html#species-assignment).
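+
+A minimal sketch of that two-step assignment, assuming you already have a chimera-free sequence table and have downloaded the reference files. The helper name `assign_taxa` and both file arguments are placeholders:
+
+```r
+# Assign taxonomy with the classifier first, then add species only where an
+# ASV matches a reference sequence exactly.
+# `train_fasta` and `species_fasta` are placeholder paths to the reference
+# files you downloaded for your marker gene.
+assign_taxa <- function(seqtab, train_fasta, species_fasta) {
+  taxa <- dada2::assignTaxonomy(seqtab, train_fasta, multithread = TRUE)
+  dada2::addSpecies(taxa, species_fasta)
+}
+
+# Example call (placeholder file names):
+# taxa <- assign_taxa(seqtab_nochim, "train_set.fa.gz", "species_assignment.fa.gz")
+```
+
+ASVs without an exact match simply keep `NA` in the species column, which is why species-level results can differ from an OTU-based analysis.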
+
+----
+
+#### How to use it in our biocluster
+
+You just need to use the `qsub` system as always, with the command `Rscript`. As the name says, it simply executes the script.
+
+First, the following modules should be loaded:
+
+`module load Rstats/R-3.4.1`
+
+`module load gcc/4.9.0`
+
+My approach is:
+
+- Cut the primers.
+- Plot the quality profile and decide at which length I will trim.
+- Perform the DADA2 procedure.
+- Check the results; if too few reads are retained, modify the parameters and run it again.
+- Boom, you have some ASVs. 
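+
+Since the script is run non-interactively through `Rscript`, it helps if the parameters you keep changing (input folder, truncation lengths) come from the command line. A minimal sketch, assuming a script called `run_dada2.R` (the file name, the argument order and the `parse_args` helper are made up for this example):
+
+```r
+# Hypothetical script name: run_dada2.R
+# Intended call: Rscript run_dada2.R <fastq_dir> <truncLen_fwd> <truncLen_rev>
+
+parse_args <- function(args) {
+  if (length(args) != 3) {
+    stop("usage: Rscript run_dada2.R <fastq_dir> <truncLen_fwd> <truncLen_rev>")
+  }
+  trunc_len <- suppressWarnings(as.integer(args[2:3]))
+  if (anyNA(trunc_len)) {
+    stop("truncation lengths must be integers")
+  }
+  list(fastq_dir = args[1], trunc_len = trunc_len)
+}
+
+# Only act when arguments were actually passed, so the file can also be sourced.
+args <- commandArgs(trailingOnly = TRUE)
+if (length(args) > 0) {
+  opts <- parse_args(args)
+  message("Processing ", opts$fastq_dir,
+          " with truncLen = ", paste(opts$trunc_len, collapse = ","))
+  # The DADA2 steps (filtering, error learning, denoising, merging) would follow here.
+}
+```
+
+This way a rerun with different trimming only needs a new job submission, not an edit of the script.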
+
+
+
+Hope it helps. Cheers!
\ No newline at end of file