
QCumber

Introduction

QCumber is a tool for quality control and exploration of NGS data. All steps can be skipped if required. The workflow is as follows:

  • Extract information from Sequence Analysis Viewer (SAV)
  • Quality control with FastQC
  • Trim reads with Trimmomatic
  • Run FastQC and retrim if necessary
  • Quality control of trimmed reads with FastQC
  • Map reads against reference using bowtie2
  • Classify reads with Kraken
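Because every step can be skipped, the workflow can be pictured as an ordered list of stages, each with an on/off switch. The sketch below only illustrates that idea. `Step`, `run_pipeline` and `placeholder` are hypothetical names, not QCumber's actual classes, and the stage names are made up for illustration.

```python
from collections import namedtuple

# Hypothetical representation of one pipeline stage; QCumber's real classes
# (see classes.py) may be organized differently.
Step = namedtuple("Step", ["name", "run", "enabled"])


def run_pipeline(steps):
    """Run the enabled steps in order and return the names of those executed."""
    executed = []
    for step in steps:
        if not step.enabled:
            continue  # skipped, e.g. via -notrimming, -nomapping or -nokraken
        step.run()
        executed.append(step.name)
    return executed


def placeholder():
    """Stand-in for a real tool call (FastQC, Trimmomatic, bowtie2, Kraken)."""


steps = [
    Step("sav", placeholder, False),        # no -sav folder given
    Step("fastqc_raw", placeholder, True),
    Step("trimming", placeholder, True),
    Step("fastqc_trimmed", placeholder, True),
    Step("mapping", placeholder, False),    # -nomapping
    Step("kraken", placeholder, True),
]
executed = run_pipeline(steps)
```

Here `executed` lists only the enabled stages, in their original order.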

Dependencies

This tool was implemented in Python and needs Python 3.4 or higher. For plotting and SAV extraction, R (3.0.2) is required. Furthermore, FastQC (>v0.10.1), bowtie2 (> 2.2.3) and Kraken (0.10.5) are required.

Further packages via apt-get install:

  • python3-pip
  • libfreetype6-dev
  • r-cran-quantreg
  • r-bioc-savr

Packages via pip3 install:

  • Jinja2
  • matplotlib

R packages:

  • ggplot2
  • savR
  • jsonlite
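The dependencies can be sanity-checked before a run. This is a minimal sketch that assumes the tools are installed under the executable names `fastqc`, `bowtie2`, `kraken` and `Rscript`. QCumber itself takes tool paths from config.txt, and Trimmomatic (a Java .jar) is not checked here. `check_python` and `missing_tools` are hypothetical helpers.

```python
import shutil
import sys

# Assumed executable names; QCumber reads the actual tool paths from config.txt.
REQUIRED_TOOLS = ["fastqc", "bowtie2", "kraken", "Rscript"]


def check_python(minimum=(3, 4)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that shutil.which() cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    if not check_python():
        sys.exit("QCumber needs Python 3.4 or higher")
    for tool in missing_tools():
        print("not found on PATH:", tool)
```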

To change tool or adapter paths, edit config.txt.


Usage

python3 QCumber.py -i <input> -technology <Illumina/IonTorrent> <option(s)>

Input parameter:

-i, -input      Sample folder or file. For an Illumina folder, file names have to match the pattern <Sample name>_<lane>_<R1/R2>_<number>,
                e.g. Sample_12_345_R1_001.fastq. Otherwise, use -1 and -2.
-1, -2          Alternative to -i: input files for read 1 and read 2. These need not follow the Illumina naming pattern.
-adapter        Adapter sequence (TruSeq2-PE, TruSeq2-SE, TruSeq3-PE, TruSeq3-SE, TruSeq3-PE-2, NexteraPE-PE). Required for Illumina.
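The Illumina naming pattern above can be expressed as a regular expression. This sketch assumes the four fields are separated by underscores and that the lane contains no underscore. The pattern QCumber really uses is configured in parameter.txt, and `ILLUMINA_NAME`, `parse_illumina_name` and `pair_reads` are hypothetical names.

```python
import re
from collections import defaultdict

# Assumed regex for <Sample name>_<lane>_<R1/R2>_<number>; the greedy sample
# group lets the sample name itself contain underscores.
ILLUMINA_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<lane>[^_]+)_(?P<read>R[12])_(?P<number>\d+)"
    r"\.(?:fastq|fq)(?:\.gz)?$"
)


def parse_illumina_name(filename):
    """Return the name fields as a dict, or None if the file does not match."""
    match = ILLUMINA_NAME.match(filename)
    return match.groupdict() if match else None


def pair_reads(filenames):
    """Group matching files by (sample, lane, number) into R1/R2 pairs."""
    pairs = defaultdict(dict)
    for name in filenames:
        fields = parse_illumina_name(name)
        if fields is None:
            continue  # such files would have to be passed with -1/-2 instead
        key = (fields["sample"], fields["lane"], fields["number"])
        pairs[key][fields["read"]] = name
    return dict(pairs)
```

For the example file, `parse_illumina_name("Sample_12_345_R1_001.fastq")` yields sample `Sample_12`, lane `345`, read `R1` and number `001`.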

Options:

-technology             Sequencing technology (Illumina/IonTorrent). Use Illumina if the files are FASTQ.
-output                 Output folder. Default: input folder
-reference              Reference file
-threads                Number of threads

-sav                    Sequence Analysis Viewer (SAV) folder. Requires the InterOp folder, RunInfo.xml and RunParameter.xml.
-rename                 Rename samples in the report. TSV file with two columns: <old sample name> <new sample name>
-parameters             Use a custom parameter file instead of the default parameters.
-trimOption             Override the default trimming option, e.g. MAXINFO:<target length>:<strictness> | SLIDINGWINDOW:<window size>:<required quality>.
                        Default: SLIDINGWINDOW:4:15
-trimBetter             Optimize trimming parameters using the 'Per base sequence content' module of FastQC.
-trimBetter_threshold   Threshold for the fluctuation in 'Per base sequence content'. Default: 0.15
-forAssembly            Optimize trimming parameters for assembly (trims more aggressively).
-forMapping             Optimize trimming parameters for mapping (allows more errors).
-minlen                 MINLEN parameter for Trimmomatic. Default: 50
-palindrome             Palindrome clip threshold used by Trimmomatic (use 30 or 1000 for further analysis). Default: 30
-gz                     Output trimmed files as .gz

-db                     Kraken database
-nokraken               Skip Kraken
-index                  Bowtie2 index, if available
-save_mapping           Save SAM files
-nomapping              Skip mapping
-notrimming             Skip trimming

-version                Get version
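To see how -adapter, -trimOption, -minlen, -palindrome, -threads and -gz could map onto Trimmomatic, here is a sketch that assembles a paired-end Trimmomatic command line. The jar path, adapter folder and output file names are hypothetical. The seed-mismatch (2) and simple-clip (10) values of ILLUMINACLIP are assumptions taken from Trimmomatic's documented example, not from QCumber. `trimmomatic_pe_args` is a hypothetical helper.

```python
import os


def trimmomatic_pe_args(read1, read2, out_dir, adapter="NexteraPE-PE",
                        adapter_dir="adapters", trim_option="SLIDINGWINDOW:4:15",
                        minlen=50, palindrome=30, threads=4, gz=False):
    """Build an argument list for Trimmomatic in paired-end (PE) mode.

    Defaults mirror the documented QCumber defaults (SLIDINGWINDOW:4:15,
    MINLEN 50, palindrome 30). Seed mismatches (2) and the simple clip
    threshold (10) are assumed values; paths are hypothetical.
    """
    suffix = ".fastq.gz" if gz else ".fastq"  # -gz: compressed output

    def out(read, kind):
        # kind is "P" (paired) or "U" (unpaired, mate was dropped)
        stem = os.path.basename(read).split(".")[0]
        return os.path.join(out_dir, stem + "_" + kind + suffix)

    # ILLUMINACLIP:<adapter fasta>:<seed mismatches>:<palindrome>:<simple clip>
    clip = "ILLUMINACLIP:{}:2:{}:10".format(
        os.path.join(adapter_dir, adapter + ".fa"), palindrome)
    return ["java", "-jar", "trimmomatic.jar", "PE", "-threads", str(threads),
            read1, read2,
            out(read1, "P"), out(read1, "U"), out(read2, "P"), out(read2, "U"),
            clip, trim_option, "MINLEN:" + str(minlen)]
```

The returned list could be passed to `subprocess.run`. Keeping it as a list avoids shell quoting problems with file names.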

Output:

  • QCResult
    • Report
      • PDF report per sample
      • HTML report for entire project
      • src
        • img
          • Summary images
    • FastQC
    • Trimmed
      • FastQC
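The output tree above can also be written down as paths. This is a minimal sketch using pathlib. The directory names come from the list above, but `qcresult_layout` and `create_layout` are hypothetical helpers, and QCumber may create these folders differently (for example, only when the corresponding step runs).

```python
from pathlib import Path


def qcresult_layout(output_dir):
    """Return the directories of the documented QCResult tree."""
    root = Path(output_dir) / "QCResult"
    return {
        "report": root / "Report",                  # PDF and HTML reports
        "images": root / "Report" / "src" / "img",  # summary images
        "fastqc": root / "FastQC",                  # FastQC of raw reads
        "trimmed": root / "Trimmed",                # trimmed reads
        "trimmed_fastqc": root / "Trimmed" / "FastQC",
    }


def create_layout(output_dir):
    """Create all directories of the tree (requires Python 3.5+ for exist_ok)."""
    for directory in qcresult_layout(output_dir).values():
        directory.mkdir(parents=True, exist_ok=True)
```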

Program Description

This project consists of the following files:

  • QCumber.py main script for running complete pipeline
  • classes.py script containing classes
  • helper.py small helper functions
  • report.tex Template for sample reports
  • batch_report.html Template for batch report
  • config.txt path to tools and adapter file
  • boxplot.R boxplots of fastqc output for batch report
  • barplot.R barplots of read statistics
  • parameter.txt default parameters for trimming, the pattern for Illumina file names, etc.

Example

  1. Simple usage for Illumina:

    python3 QCumber.py -1 sample_R1.fastq -2 sample_R2.fastq -technology Illumina -adapter NexteraPE-PE -r myReference.fasta
  2. Running on a whole project folder:

    python3 QCumber.py -input myProjectFolder/ -technology IonTorrent -r myReference.fasta

License

Copyright (C) 2017 Vivi Hue-Trang Lieu

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License, version 3 as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this program. If not, see http://www.gnu.org/licenses/.