Changes

Alex Trouern-Trend · fbe833ab
--- a/Scripts-Descriptions.md
+++ b/Scripts-Descriptions.md
@@ -10,6 +10,7 @@ Table of Contents
 * [Softmasking Genome](#softmasking-genome)
 * [Genome Statistical Assessment](#genome-statistical-assessment)
 * [Preparing TSA Files](#preparing-tsa-files)
+* [Evidence Alignment](#evidence-alignment)


 #### Gathering Data
@@ -43,8 +44,9 @@ splitfasta.sh | mask | Reduce the running time for repeatmasker by splitting the
 Name | Step | Purpose | Input | Expected Output | Notes
 ---- | ---- | ------- | ----- | --------------- | -----
 wholegenomebusco.sh | assembly stats | Use [BUSCO](https://busco.ezlab.org/) to assess genome completeness | Length-filtered/softmasked genome assembly & appropriate single-copy ortholog dataset | Genome completeness benchmark results including predicted genes, alignment results & statistics | 
+quast.sh | assembly stats | Uses QUAST to assess quality of genome assemblies | Genome before filtering/softmasking & Genome after filtering/softmasking | Assembly statistics for both inputs |  

-#### Preparing TSA Files
+#### Preparing TSA
 Processing the TSA files from NCBI to be used as evidence for genome annotation tool was accomplished by frame-selecting using [TransDecoder](https://github.com/TransDecoder/TransDecoder/wiki) and clustering with [USearch (v9.0)](https://www.drive5.com/usearch/manual9/).    

 Name | Step | Purpose | Input | Expected Output | Notes
@@ -52,6 +54,17 @@ Name | Step | Purpose | Input | Expected Output | Notes
 frameSelect.sh | TSA Prep | Uses [TransDecoder](https://github.com/TransDecoder/TransDecoder/wiki) to identify coding regions in the transcript assemblies and translate into peptide sequences | TSA fasta file | BED, GFF3, CDS (nt coding sequence) & peptide files representing recovered coding regions |  
 usearch.sh | TSA Prep | Uses [USearch v9.0](https://www.drive5.com/usearch/manual9/) to cluster multiple frame-selected TSAs (**T**ranscriptome **S**hotgun **A**ssembly) by sequence homology into a consensus transcriptome | A single fasta made of concatenated frame-selected TSAs | Clustered reference transcriptome |   

+#### Evidence Alignment
+Short-read and TSA evidence were aligned to genome assemblies using [HISAT2](https://ccb.jhu.edu/software/hisat2/manual.shtml) and [GMAP](http://research-pub.gene.com/gmap/src/README), respectively. Before alignment, short-read evidence was trimmed with Sickle and quality-checked with [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
+Name | Step | Purpose | Input | Expected Output | Notes
+---- | ---- | ------- | ----- | --------------- | -----
+fastqc.sh | short-read QC | Uses FastQC to assess read quality | fastq files from short-read libraries | statistics on read quality in HTML |  
+sickle.sh | short-read QC | Uses Sickle to trim barcode & adapter sequences and remove low-quality reads | Raw fastq files for short-read libraries | Trimmed fastq files |  
+hisatBuild.sh | short-read align | Builds indices to be used by HISAT2 | Length-filtered and softmasked genome in fasta format | Set of index files |  
+hisat.sh | short-read align | Runs the HISAT2 short-read aligner | Path to the directory containing the index built using hisatBuild.sh & path to the trimmed read data | Read alignments in SAM format |  
+convert.sh | short-read align | Uses [samtools](http://samtools.sourceforge.net/) to convert SAM files to the binary BAM format | SAM output from running hisat.sh | BAM files of short-read alignments |  
+sort.sh | short-read align | Uses samtools to sort BAM files, a prerequisite for merging | Unsorted BAM files | Sorted BAM files |  
+merge.sh | short-read align | Merges sorted alignments from each short-read library into a single BAM file | Sorted BAM files from each alignment | A single, merged BAM file |
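
The short-read alignment rows above describe a chain of steps: build index, align, convert, sort, merge. The sketch below shows one way that chain might be wired together. It is not the content of the repository's scripts, which this diff does not show. The function names, the `run` helper, the `DRY_RUN` variable, and all file names are hypothetical. It assumes paired-end libraries and samtools 1.x option syntax (`view -b -o`, `sort -o`).

```shell
#!/usr/bin/env bash
# Hedged sketch of the short-read alignment chain; not the repository's actual scripts.
# All function names and file names here are illustrative assumptions.

# Print the command instead of executing it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# hisatBuild.sh step: build HISAT2 indices from the filtered, softmasked genome.
# usage: build_index genome.fa index_prefix
build_index() {
  run hisat2-build "$1" "$2"
}

# hisat.sh step: align one paired-end library and write SAM.
# usage: align_library index_prefix reads_1.fq reads_2.fq out.sam
align_library() {
  run hisat2 -x "$1" -1 "$2" -2 "$3" -S "$4"
}

# convert.sh and sort.sh steps: SAM to BAM, then coordinate sort.
# usage: sam_to_sorted_bam in.sam out.sorted.bam
sam_to_sorted_bam() {
  _unsorted_bam="${1%.sam}.bam"   # e.g. lib1.sam -> lib1.bam
  run samtools view -b -o "$_unsorted_bam" "$1"
  run samtools sort -o "$2" "$_unsorted_bam"
}

# merge.sh step: merge the sorted BAMs from every library into one file.
# usage: merge_libraries merged.bam lib1.sorted.bam lib2.sorted.bam ...
merge_libraries() {
  run samtools merge "$@"
}
```

Setting `DRY_RUN=1` before calling, for example, `align_library idx lib1_1.fq lib1_2.fq lib1.sam` prints the command that would run, which is a cheap way to check paths before submitting a long job.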