QCumber


Introduction

QCumber is a tool for quality control and exploration of NGS data. Each step can be skipped if it is not needed. The workflow is as follows:

  • Extract information from the Sequence Analysis Viewer (SAV)
  • Quality control with FastQC
  • Trim reads with Trimmomatic
  • Run FastQC again and retrim if necessary
  • Quality control of trimmed reads with FastQC
  • Map reads against a reference using bowtie2
  • Classify reads with Kraken
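
To make the step order and the skip options concrete, here is a minimal Python sketch. It assumes that -sav enables the SAV step, that -trimBetter triggers the FastQC/retrim step, and that -notrimming, -nomapping and -nokraken skip the corresponding steps. The step labels and the function planned_steps are hypothetical and not part of QCumber.

```python
# Hypothetical sketch of the QCumber step order. It only models the options
# -sav, -trimBetter, -notrimming, -nomapping and -nokraken; the step labels
# are illustrative, not QCumber's own names.


def planned_steps(sav_folder=None, trim_better=False,
                  notrimming=False, nomapping=False, nokraken=False):
    """Return the workflow steps, in order, that would run for the given options."""
    steps = []
    if sav_folder is not None:
        steps.append("SAV extraction")
    steps.append("FastQC (raw reads)")
    if not notrimming:
        steps.append("Trimmomatic")
        if trim_better:
            # Re-run FastQC on the trimmed reads and retrim if needed.
            steps.append("FastQC check and retrim")
        steps.append("FastQC (trimmed reads)")
    if not nomapping:
        steps.append("bowtie2 mapping")
    if not nokraken:
        steps.append("Kraken classification")
    return steps


if __name__ == "__main__":
    # Skip Kraken, keep everything else at its default.
    print(planned_steps(nokraken=True))
```

With nokraken=True the sketch returns ['FastQC (raw reads)', 'Trimmomatic', 'FastQC (trimmed reads)', 'bowtie2 mapping'].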

Dependencies

This tool is implemented in Python and requires Python 3.4 or higher. R (3.0.2) is required for plotting and SAV extraction. Furthermore, FastQC (>v0.10.1), bowtie2 (> 2.2.3) and Kraken (0.10.5) are required.
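
Before a first run, it can help to confirm that the interpreter and the external tools are available. The following Python sketch only checks whether the tools are present, not their version numbers. The executable names (fastqc, bowtie2, kraken, Rscript) are assumptions, since QCumber itself reads tool paths from config.txt, and python_ok and missing_tools are hypothetical helpers.

```python
import shutil
import sys

# Assumed executable names; adjust them to match the paths in config.txt.
REQUIRED_TOOLS = ("fastqc", "bowtie2", "kraken", "Rscript")


def python_ok(minimum=(3, 4)):
    """Return True if the running interpreter is at least the given version."""
    return tuple(sys.version_info[:2]) >= minimum


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH (versions are not checked)."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python >= 3.4:", python_ok())
    print("Missing tools:", missing_tools() or "none")
```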

Further packages via apt-get install:

  • python3-pip
  • libfreetype6-dev
  • r-cran-quantreg
  • r-bioc-savr

Packages via pip3 install:

  • Jinja2
  • matplotlib

R packages:

  • ggplot2
  • savR
  • jsonlite

To change the tool or adapter paths, edit config.txt.


Usage

python3 QCumber.py -i <input> -technology <Illumina/IonTorrent> <option(s)>

Input parameter:

-i, -input      sample folder/file. For an Illumina folder, file names have to match the pattern <Sample name>_<lane>_<R1/R2>_<number>,
                e.g. Sample_12_345_R1_001.fastq. Otherwise use -1, -2.
-1, -2          alternative to -i: read file names. These do not have to follow the Illumina naming pattern.
-adapter        adapter sequence (TruSeq2-PE, TruSeq2-SE, TruSeq3-PE, TruSeq3-SE, TruSeq3-PE-2, NexteraPE-PE). Required for Illumina.
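
The Illumina naming pattern can be illustrated with a regular expression. This is a minimal Python sketch that assumes names end in .fastq or .fastq.gz; QCumber's own pattern is configured in parameter.txt and may differ, and parse_illumina_name is a hypothetical helper.

```python
import re

# Illustrative regex for <Sample name>_<lane>_<R1/R2>_<number>.fastq[.gz].
# The greedy sample group lets sample names contain underscores themselves.
ILLUMINA_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<lane>[^_]+)_(?P<read>R[12])_(?P<number>\d+)\.fastq(?:\.gz)?$"
)


def parse_illumina_name(filename):
    """Split an Illumina-style FASTQ file name into its parts, or return None."""
    match = ILLUMINA_NAME.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_illumina_name("Sample_12_345_R1_001.fastq"))
    print(parse_illumina_name("sample_R1.fastq"))
```

Sample_12_345_R1_001.fastq splits into sample Sample_12, lane 345, read R1 and number 001, while sample_R1.fastq does not match, which is the case where -1 and -2 are used instead.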

Options:

-technology             sequencing technology (Illumina/IonTorrent). Use Illumina if files are fastq.
-output                 output folder, default: input folder
-reference              reference file
-threads                number of threads

-sav                    Sequence Analysis Viewer folder. Requires Interop folder, RunInfo.xml and RunParameter.xml
-rename                 Rename sample names in report. TSV file with two columns: <old sample name> <new sample name>
-parameters             Use your own default parameters.
-trimOption             Override standard trimming option. E.g. MAXINFO:<target length>:<strictness> | SLIDINGWINDOW:<window size>:<required quality>.
                        default: SLIDINGWINDOW:4:15
-trimBetter             Optimize trimming parameters using 'Per base sequence content' from FastQC
-trimBetter_threshold   Threshold for 'Per base sequence content' fluctuation. Default: 0.15
-forAssembly            Trimming parameters are optimized for assembly (more aggressive trimming).
-forMapping             Trimming parameters are optimized for mapping (more errors allowed).
-minlen                 MINLEN parameter for Trimmomatic (reads shorter than this are dropped). Default: 50
-palindrome             palindrome parameter used in Trimmomatic (use 30 or 1000 for further analysis). Default: 30
-gz                     Output trimmed files as .gz
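
The trimming options map onto Trimmomatic steps. The sketch below assembles them in Python, assuming that -adapter selects a Trimmomatic adapter FASTA of the same name and that -palindrome is the palindrome clip threshold of ILLUMINACLIP. The seed-mismatch (2) and simple-clip (10) values and the "adapters" directory are placeholders, not QCumber's actual defaults, and trimmomatic_steps is a hypothetical helper.

```python
import os

# Hypothetical helper; seed_mismatches=2, simple_clip=10 and the "adapters"
# directory are placeholders, not QCumber's configured defaults.


def trimmomatic_steps(adapter=None, adapter_dir="adapters",
                      trim_option="SLIDINGWINDOW:4:15", minlen=50,
                      palindrome=30, seed_mismatches=2, simple_clip=10):
    """Return Trimmomatic step arguments in the order they should be applied."""
    steps = []
    if adapter is not None:
        # -adapter name -> adapter FASTA, e.g. NexteraPE-PE -> adapters/NexteraPE-PE.fa
        adapter_file = os.path.join(adapter_dir, adapter + ".fa")
        steps.append("ILLUMINACLIP:{}:{}:{}:{}".format(
            adapter_file, seed_mismatches, palindrome, simple_clip))
    steps.append(trim_option)                  # -trimOption, default SLIDINGWINDOW:4:15
    steps.append("MINLEN:{}".format(minlen))   # -minlen, default 50
    return steps


if __name__ == "__main__":
    print(" ".join(trimmomatic_steps(adapter="NexteraPE-PE")))
```

On a POSIX system this prints ILLUMINACLIP:adapters/NexteraPE-PE.fa:2:30:10 SLIDINGWINDOW:4:15 MINLEN:50. Trimmomatic applies steps in the order given, so adapter clipping comes before quality trimming and MINLEN comes last.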

-db                     Kraken database
-nokraken               Skip Kraken
-index                  Bowtie2 index, if available
-save_mapping           Save SAM files
-nomapping              Skip mapping
-notrimming             Skip trimming

-version                Print version
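
The -rename table can be read with Python's csv module. In this sketch, read_rename_table and rename are hypothetical helpers, and the assumption that samples missing from the table keep their original name is not taken from QCumber.

```python
import csv
import io


def read_rename_table(handle):
    """Map <old sample name> to <new sample name> from a two-column TSV; blank lines are skipped."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        if len(row) != 2:
            raise ValueError("expected two tab-separated columns, got {!r}".format(row))
        old, new = (field.strip() for field in row)
        mapping[old] = new
    return mapping


def rename(sample, mapping):
    """Return the new report name, or the old name if the sample is not listed (assumption)."""
    return mapping.get(sample, sample)


if __name__ == "__main__":
    table = read_rename_table(io.StringIO("Sample_12\tPatient_A\n\nSample_13\tPatient_B\n"))
    print(rename("Sample_12", table), rename("Sample_99", table))
```

For a real file, open it with open(path, newline="") and pass the handle to read_rename_table.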

Output:

  • QCResult
    • Report
      • PDF report per sample
      • HTML report for entire project
      • src
        • img
          • Summary images
    • FastQC
    • Trimmed
      • FastQC
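
For scripts that post-process the results, the Output tree above can be expressed as paths. This Python sketch assumes the tree is created inside the output folder (by default the input folder); qcresult_dirs and sample_reports are hypothetical helpers, and the *.pdf glob assumes the per-sample reports carry that extension.

```python
from pathlib import Path


def qcresult_dirs(output_folder):
    """Return the folders of the QCResult tree (names taken from the Output list)."""
    root = Path(output_folder) / "QCResult"
    report = root / "Report"
    trimmed = root / "Trimmed"
    return {
        "report": report,
        "summary_images": report / "src" / "img",
        "fastqc": root / "FastQC",
        "trimmed": trimmed,
        "trimmed_fastqc": trimmed / "FastQC",
    }


def sample_reports(output_folder):
    """List the per-sample PDF reports, assuming they have a .pdf extension."""
    return sorted(qcresult_dirs(output_folder)["report"].glob("*.pdf"))


if __name__ == "__main__":
    for name, path in sorted(qcresult_dirs("myProjectFolder").items()):
        print(name, path)
```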

Program Description

This project consists of 9 files:

  • QCumber.py: main script for running the complete pipeline
  • classes.py: script containing classes
  • helper.py: small helper functions
  • report.tex: template for sample reports
  • batch_report.html: template for the batch report
  • config.txt: paths to tools and adapter files
  • boxplot.R: boxplots of FastQC output for the batch report
  • barplot.R: barplots of read statistics
  • parameter.txt: default trimming parameters, pattern for Illumina file names, etc.

Example

  1. Simple usage for Illumina:

    python3 QCumber.py -1 sample_R1.fastq -2 sample_R2.fastq -technology Illumina -adapter NexteraPE-PE -r myReference.fasta
    
  2. Processing a project folder:

    python3 QCumber.py -input myProjectFolder/ -technology IonTorrent -r myReference.fasta
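
For a folder of paired files that do not follow the Illumina naming pattern, example 1 can be repeated per pair. This Python sketch assumes pairs are named <sample>_R1.fastq and <sample>_R2.fastq and reuses the flags from example 1; paired_commands is a hypothetical helper, and the commands are only built, not run.

```python
from pathlib import Path


def paired_commands(folder, adapter, reference):
    """Build one QCumber command (as an argument list) per <sample>_R1/_R2 pair in folder."""
    commands = []
    for r1 in sorted(Path(folder).glob("*_R1.fastq")):
        sample = r1.name[:-len("_R1.fastq")]
        r2 = r1.with_name(sample + "_R2.fastq")
        if not r2.exists():
            continue  # no mate file: leave this sample out
        commands.append([
            "python3", "QCumber.py",
            "-1", str(r1), "-2", str(r2),
            "-technology", "Illumina",
            "-adapter", adapter,
            "-r", reference,
        ])
    return commands
```

Each argument list can then be passed to subprocess.check_call, which raises CalledProcessError if QCumber exits with a non-zero status.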
    

License

Copyright (C) 2017 Vivi Hue-Trang Lieu

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License, version 3 as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this program. If not, see http://www.gnu.org/licenses/.