Changes

Alex Trouern-Trend · 81a9cd97
--- a/Scripts-Descriptions.md
+++ b/Scripts-Descriptions.md
@@ -14,6 +14,7 @@ Table of Contents
 #### Gathering Data
 The genomic and transcriptomic data used for the experiment were sourced from NCBI for most species. The remaining data were unpublished and provided by collaborators.   
 Name | Step | Purpose | Input | Expected Output | Notes
 ---- | ---- | ------- | ----- | --------------- | -----
 fetchSRA.sh | gather data | Gather data from multiple NCBI SRRs (**S**equence **R**ead Archive **R**un accessions) | Text file listing the SRR accessions to download | Raw reads in .fastq format | Remove the `--split-files` option on line 22 of the script if the reads are not paired-end
@@ -21,12 +22,14 @@ validateSRA.sh | gather data |  Validates that the correct files have been downl
 #### Assembly Filtering  
 Scaffolds shorter than 500 bp were removed from the assemblies. 
 Name | Step | Purpose | Input | Expected Output | Notes
 ---- | ---- | ------- | ----- | --------------- | -----
 filtersubmit.sh | filter | Run filterLen.py with the correct module and input settings to remove small scaffolds from the genome assembly | Genome assembly of interest | Genome assembly excluding scaffolds < 500 bp | 
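
The contents of filterLen.py are not shown here, so the following is only a minimal sketch of the length-filtering step it is described as performing. The function names `read_fasta` and `filter_by_length` and the `min_len` parameter are hypothetical, and the sketch assumes plain (uncompressed) FASTA input.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            # Sequence lines may be wrapped, so collect them per record.
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len=500):
    """Keep records whose sequence is at least min_len bases long.

    min_len=500 mirrors the 500 bp cutoff stated above (an assumption
    about how the real script applies it).
    """
    return [(h, s) for h, s in records if len(s) >= min_len]


if __name__ == "__main__":
    example = [">short\n", "ACGT\n", ">long\n", "A" * 300 + "\n", "C" * 200 + "\n"]
    for name, seq in filter_by_length(read_fasta(example)):
        print(f">{name}\n{seq}")
```

Reading records lazily and joining wrapped sequence lines before measuring keeps the cutoff applied to the full scaffold length rather than to individual lines.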
 #### Softmasking Genome
 The repetitive regions of the genome were identified and softmasked using [RepeatModeler](http://www.repeatmasker.org/RepeatModeler/) and [RepeatMasker](http://www.repeatmasker.org/webrepeatmaskerhelp.html), respectively.
 Name | Step | Purpose | Input | Expected Output | Notes
 ---- | ---- | ------- | ----- | --------------- | -----
 repeatModeler.sh | mask | Use [RepeatModeler](http://www.repeatmasker.org/RepeatModeler/) to generate a de novo repeat library for a genome | Filtered genome assembly in fasta format | Repeat library suffixed consensi.fa.classified |   
@@ -36,12 +39,14 @@ splitfasta.sh | mask | Reduce the running time for repeatmasker by splitting the
 #### Genome Statistical Assessment
 [QUAST](http://quast.bioinf.spbau.ru/manual.html) and [BUSCO](https://busco.ezlab.org/) were used to assess the assembly quality and completeness.
 Name | Step | Purpose | Input | Expected Output | Notes
 ---- | ---- | ------- | ----- | --------------- | -----
 wholegenomebusco.sh | assembly stats | Use [BUSCO](https://busco.ezlab.org/) to assess genome completeness | Length-filtered/softmasked genome assembly & appropriate single-copy ortholog dataset | Genome completeness benchmark results including predicted genes, alignment results & statistics | 
 #### Preparing TSA Files
 The TSA (Transcriptome Shotgun Assembly) files from NCBI were prepared as evidence for the genome annotation tool by frame-selecting with [TransDecoder](https://github.com/TransDecoder/TransDecoder/wiki) and clustering with [USearch (v9.0)](https://www.drive5.com/usearch/manual9/).    
 Name | Step | Purpose | Input | Expected Output | Notes
 ---- | ---- | ------- | ----- | --------------- | -----
 frameSelect.sh | TSA Prep | Use [TransDecoder](https://github.com/TransDecoder/TransDecoder/wiki) to identify coding regions in the transcript assemblies and translate them into peptide sequences | TSA fasta file | BED, GFF3, CDS (nt coding sequence) & peptide files representing recovered coding regions |