Update Scripts Descriptions authored by Alex Trouern-Trend
...@@ -6,11 +6,10 @@ The scripts used in the processing of the supporting genomic and transcriptomic ...@@ -6,11 +6,10 @@ The scripts used in the processing of the supporting genomic and transcriptomic
Table of Contents Table of Contents
* [Gathering Data](#gathering-data) * [Gathering Data](#gathering-data)
* [Assembly Filtering](#assembly-filtering) * [Assembly Filtering](#assembly-filtering)
* [TSA Prep](#preparing-tsa-files)
* [Softmasking Genome](#softmasking-genome) * [Softmasking Genome](#softmasking-genome)
* [Genome Statistical Assessment](#genome-statistical-assessment) * [Genome Statistical Assessment](#genome-statistical-assessment)
* [Preparing TSA Files](#preparing-tsa-files) * [Preparing TSA Files](#preparing-tsa-files)
* [Short Read Alignment](#short-read-alignment) * [Evidence Alignment](#evidence-alignment)
#### Gathering Data #### Gathering Data
...@@ -46,7 +45,7 @@ Name | Step | Purpose | Input | Expected Output | Notes ...@@ -46,7 +45,7 @@ Name | Step | Purpose | Input | Expected Output | Notes
wholegenomebusco.sh | assembly stats | Use [BUSCO](https://busco.ezlab.org/) to assess genome completeness | Length-filtered/softmasked genome assembly & appropriate single-copy ortholog dataset | Genome completeness benchmark results including predicted genes, alignment results & statistics | wholegenomebusco.sh | assembly stats | Use [BUSCO](https://busco.ezlab.org/) to assess genome completeness | Length-filtered/softmasked genome assembly & appropriate single-copy ortholog dataset | Genome completeness benchmark results including predicted genes, alignment results & statistics |
quast.sh | assembly stats | Uses QUAST to assess quality of genome assemblies | Genome before filtering/softmasking & Genome after filtering/softmasking | Assembly statistics for both inputs | quast.sh | assembly stats | Uses QUAST to assess quality of genome assemblies | Genome before filtering/softmasking & Genome after filtering/softmasking | Assembly statistics for both inputs |
#### Preparing TSA #### Preparing TSA Files
The TSA files from NCBI were processed for use as evidence for the genome annotation tool by frame-selecting with [TransDecoder](https://github.com/TransDecoder/TransDecoder/wiki) and clustering with [USearch (v9.0)](https://www.drive5.com/usearch/manual9/). The TSA files from NCBI were processed for use as evidence for the genome annotation tool by frame-selecting with [TransDecoder](https://github.com/TransDecoder/TransDecoder/wiki) and clustering with [USearch (v9.0)](https://www.drive5.com/usearch/manual9/).
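
The two TSA preparation steps described above might look roughly like the sketch below. This is not the content of the repository's scripts: the function name `prep_tsa`, the `run` dry-run helper, the file names, the 0.95 identity threshold, and the choice to cluster the `.transdecoder.cds` output are all assumptions made for illustration. By default the commands are only printed; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the command by default, run it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the two TSA preparation steps.
prep_tsa() {
  local tsa="$1"       # TSA FASTA file in the working directory (assumed name)
  local identity="$2"  # clustering identity threshold (assumed value)

  # Frame selection: extract long ORFs, then predict likely coding regions.
  run TransDecoder.LongOrfs -t "$tsa"
  run TransDecoder.Predict -t "$tsa"

  # Clustering: TransDecoder writes <input>.transdecoder.cds next to the input;
  # clustering the CDS file (rather than the peptides) is an assumption here.
  run usearch -cluster_fast "${tsa}.transdecoder.cds" -id "$identity" \
    -centroids "${tsa}.centroids.fasta" -uc "${tsa}.clusters.uc"
}

prep_tsa "tsa.fasta" 0.95
```

The dry-run default makes it easy to check the generated command lines before committing cluster time to them.
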
Name | Step | Purpose | Input | Expected Output | Notes Name | Step | Purpose | Input | Expected Output | Notes
...@@ -56,6 +55,7 @@ usearch.sh | TSA Prep | Uses [USearch v9.0](https://www.drive5.com/usearch/manua ...@@ -56,6 +55,7 @@ usearch.sh | TSA Prep | Uses [USearch v9.0](https://www.drive5.com/usearch/manua
#### Evidence Alignment #### Evidence Alignment
Short-read and TSA evidence were aligned to genome assemblies using [HISAT2](https://ccb.jhu.edu/software/hisat2/manual.shtml) and [GMAP](http://research-pub.gene.com/gmap/src/README), respectively. Before alignment, short-read evidence was trimmed and QC'd using sickle and [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Short-read and TSA evidence were aligned to genome assemblies using [HISAT2](https://ccb.jhu.edu/software/hisat2/manual.shtml) and [GMAP](http://research-pub.gene.com/gmap/src/README), respectively. Before alignment, short-read evidence was trimmed and QC'd using sickle and [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
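
As a rough illustration of the workflow described above, the sketch below chains trimming, QC, HISAT2 alignment, and GMAP alignment. It is not taken from the repository's scripts: the function name `align_evidence`, the `run` and `run_to` helpers, all file and index names, the paired-end layout, the `sanger` quality type, and the `gff3_gene` output format are assumptions. Commands are printed rather than executed unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the command by default, run it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Same as run, but for tools that write their results to stdout.
run_to() {
  local out="$1"
  shift
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$* > $out"
  else
    "$@" > "$out"
  fi
}

# Hypothetical wrapper: genome FASTA, two paired read files, TSA FASTA.
align_evidence() {
  local genome="$1" r1="$2" r2="$3" tsa="$4"

  # Trim paired-end reads (paired layout and quality type are assumptions).
  run sickle pe -t sanger -f "$r1" -r "$r2" \
    -o trimmed_1.fastq -p trimmed_2.fastq -s trimmed_singles.fastq

  # Assess read quality after trimming.
  run fastqc trimmed_1.fastq trimmed_2.fastq

  # Short-read evidence: build a HISAT2 index, then align.
  run hisat2-build "$genome" genome_index
  run hisat2 -x genome_index -1 trimmed_1.fastq -2 trimmed_2.fastq -S short_reads.sam

  # TSA evidence: build a GMAP database, then align the transcripts.
  run gmap_build -D gmap_db -d genome "$genome"
  run_to tsa_alignments.gff3 gmap -D gmap_db -d genome -f gff3_gene "$tsa"
}

align_evidence genome.softmasked.fa reads_1.fastq reads_2.fastq tsa.centroids.fasta
```
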
Name | Step | Purpose | Input | Expected Output | Notes Name | Step | Purpose | Input | Expected Output | Notes
---- | ---- | ------- | ----- | --------------- | ----- ---- | ---- | ------- | ----- | --------------- | -----
fastqc.sh | short-read QC | Uses FastQC to assess read quality | fastq files from short-read libraries | statistics on read quality in HTML | fastqc.sh | short-read QC | Uses FastQC to assess read quality | fastq files from short-read libraries | statistics on read quality in HTML |
... ...