Commit 25d2edd8 authored by Vedran Miletić

Assimilate the content from

parent 95010ce3
.. _getting-started-guide:
Getting started guide
......@@ -6,6 +8,93 @@ Getting started guide
:depth: 2
.. _quick-and-dirty-installation:
Quick and dirty installation
This section gives short instructions for a typical installation of rDock.
For the full documentation of the rDock software package and its methods, please go to the :ref:`Full Documentation <full-documentation>` page.
You can also check the following information:
* :ref:`Getting Started <getting-started>`: installation and validation instructions for first-time users.
* :ref:`Validation Sets <validation-sets>`: instructions and examples for re-running the validation sets we have carried out.
* :ref:`Calculating ROC Curves <calculating-roc-curves>`: tutorial for generating ROC Curves and other statistics after running rDock docking jobs.
Installation in 3 steps
We have been able to compile rDock on the following Linux systems:
* CentOS 5.5 64 bits
* openSUSE 11.3 32 and 64 bits
* openSUSE 12.3 32 and 64 bits
* openSUSE 13.1 32 and 64 bits
* Ubuntu 12.04 32 and 64 bits
Step 1
First of all, you will need to install several packages before compiling and running rDock:
* gcc and g++ compilers version > 3.3
* make
* cppunit and cppunit-devel
* popt and popt-devel
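As a quick sanity check before compiling, the command-line tools in the list above can be looked up on the ``PATH``. This is only a minimal sketch: the function name ``missing_build_tools`` is hypothetical, and it cannot detect the cppunit or popt development packages, which are libraries rather than executables.

```python
import shutil


def missing_build_tools(tools=("gcc", "g++", "make"), which=shutil.which):
    """Return the tools from `tools` that cannot be found on the PATH.

    `which` is injectable so the check can be exercised without
    touching the real system (hypothetical helper, not part of rDock).
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_build_tools()
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("gcc, g++ and make were found on the PATH")
```

The compiler version requirement and the library dependencies still have to be checked with your distribution's package manager.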
.. note::
**For Ubuntu users:**
If you are trying to use rDock on Ubuntu, please note that the csh shell is not included in a default installation. If some error arises even with all the above-stated dependencies installed, we recommend installing csh (``sudo apt-get install csh``).
Afterwards, download the compressed source code file or get it via SVN from the :ref:`Downloads <download>` section.
Step 2
Then, run the following commands:
.. code-block:: bash
tar -xvzf rDock_2013.1_src.tar.gz
cd rDock_2013.1_src/build/
and, for 32-bit computers:
.. code-block:: bash
make linux-g++
for 64-bit computers:
.. code-block:: bash
make linux-g++-64
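If you script the build, the choice between the two make targets above can be derived from the machine type. The following is a rough sketch under a simple assumption (a machine string containing ``64`` means a 64-bit system); the function name ``rdock_make_target`` is hypothetical.

```python
import platform


def rdock_make_target(machine=None):
    """Pick the make target named in this guide for the given machine type.

    Assumption: any machine string containing "64" (x86_64, amd64, ...)
    is treated as 64-bit; everything else falls back to the 32-bit target.
    """
    machine = (machine or platform.machine()).lower()
    return "linux-g++-64" if "64" in machine else "linux-g++"


if __name__ == "__main__":
    print("make", rdock_make_target())
```

This heuristic only selects between the two targets shown in this guide; it does not verify that the build succeeds on a given architecture.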
Step 3
After compiling successfully, run the following command to test that your compiled version works correctly and that the results are correct.
.. code-block:: bash
make test
If the test has succeeded, you are done. Enjoy using rDock!
Otherwise, please check your dependencies and all the previous commands, or go to the :ref:`Support Section <support>` to ask for help.
As a concluding remark, don’t forget to set the necessary environment variables for running rDock from the command line (for example, in the bash shell):
.. code-block:: bash
export RBT_ROOT=/path/to/rDock/installation/
export PATH=$PATH:$RBT_ROOT/bin
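To confirm that the two variables above are set as intended, a small check like the following can help. It is a sketch only: ``check_rdock_env`` is a hypothetical helper, and it merely inspects the environment mapping without verifying that rDock is actually installed at that location.

```python
import os


def check_rdock_env(env):
    """Return a list of problems found in an environment mapping.

    Checks only what the export lines above set: RBT_ROOT must be
    defined and RBT_ROOT/bin must be one of the PATH entries.
    """
    problems = []
    root = env.get("RBT_ROOT")
    if not root:
        problems.append("RBT_ROOT is not set")
        return problems
    bin_dir = os.path.normpath(os.path.join(root, "bin"))
    entries = [os.path.normpath(p) for p in env.get("PATH", "").split(os.pathsep) if p]
    if bin_dir not in entries:
        problems.append("RBT_ROOT/bin is not on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_rdock_env(os.environ):
        print("warning:", problem)
```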
......@@ -17,8 +106,6 @@ The major components of the platform now include fast intermolecular scoring fun
This introductory guide is aimed at new users of rDock. It describes the minimal set of steps required to build rDock from the source code distribution, and to run one of the automated validation experiments provided in the test suite distribution. The instructions assume that you are comfortable with simple Linux command line administration tasks, and with building Linux applications from makefiles. Once you are familiar with these steps you should proceed to the User and Reference Guide for more detailed documentation on the usage of rDock.
.. [RiboDock2004] Validation of an empirical RNA-ligand scoring function for fast flexible docking using RiboDock, SD Morley and M Afshar, J. Comput.-Aided Mol. Des., 18 (2004) 189-208.
.. rDock documentation master file, created by
sphinx-quickstart on Fri Apr 26 12:07:51 2019.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
.. _rdock-documentation:
Welcome to rDock's documentation!
rDock: a Fast, Versatile and Open Source Program for Docking Ligands to Proteins and Nucleic Acids
rDock is a fast and versatile **open-source docking program** that can be used to dock **small molecules** against **proteins** and **nucleic acids**. It is designed for High Throughput Virtual Screening (HTVS) campaigns and binding mode prediction studies.
rDock is mainly written in C++; accessory scripts and programs are written in C++, Perl or Python.
The full rDock software package requires **less than 50 MB** of hard disk space and can be compiled (at this moment, **only**) on **all Linux computers**.
Thanks to its design and implementation [rDock2014]_, it can be installed on a computation cluster and deployed on an **unlimited number of CPUs**, allowing HTVS campaigns to be carried out in a **matter of days**.
Besides its main Docking program, the rDock software package also provides a set of tools and scripts to facilitate **preparation** of the input files and **post-processing** and **analysis** of results.
.. _about:
.. figure:: _images/dock1.jpg
The above image illustrates the first binding mode solution for ASTEX system 1hwi, with an RMSD of 0.88 Å.
Docking preparation
Define cavities using **known binders** or with user-supplied **3D coordinates**. Allow -OH and -NH2 receptor side chains to rotate. Add explicit solvent molecules and structural waters. Supply pharmacophoric restraints as a bias to **guide docking**.
Pre-Processing of input files
Define common ligand structure for performing **tethered docking** (requires Open Babel Python bindings). Sort, filter or split ligand files for facilitating **parallelization**. Find **HTVS protocol** for optimizing calculation time. Pre-calculate grids to decrease subsequent calculation times.
Post-Processing and analysis of results
Summarize results in a tabular format. Sort, filter, merge or split results files. Calculate **RMSD** with a reference structure taking into account internal symmetries (requires Open Babel Python bindings).
Binding mode prediction
Predict how a ligand will bind to a given molecule. The ASTEX non-redundant test set for proteins and DOCK and rDock test sets for RNA have been used for validating and comparing rDock with other programs.
Run for millions of compounds in a short time by exploiting the capabilities of computer calculation farms. Ease of **parallelization** across a relatively unlimited number of CPUs to optimize HTVS running times. The DUD set has been used for validating rDock and comparing its performance to other reference docking programs.
.. figure:: _images/dock2.jpg
In red mesh, definition of the cavity obtained by execution of ``rbcavity`` program.
The rDock program was developed from 1998 to 2006 by the software team at RiboTargets (subsequently Vernalis (R&D) Ltd) [RiboDock2004]_. In 2006, the software was licensed to the University of York for maintenance and distribution.
In 2012, Vernalis and the University of York agreed to release the program as open-source software [rDock2014]_. This version is developed with support from the University of Barcelona – ` <>`__.
rDock is licensed under GNU LGPL version 3.0.
.. toctree::
:maxdepth: 2
If you are using rDock in your research, please cite:
.. [rDock2014] Ruiz-Carmona, S., Alvarez-Garcia, D., Foloppe, N., Garmendia-Doval, A. B., Juhos S., et al. (2014) rDock: A Fast, Versatile and Open Source Program for Docking Ligands to Proteins and Nucleic Acids. PLoS Comput Biol 10(4): e1003571. `doi:10.1371/journal.pcbi.1003571 <>`__
Former software reference provided for completeness:
.. [RiboDock2004] Morley, S. D. and Afshar, M. (2004) Validation of an empirical RNA-ligand scoring function for fast flexible docking using RiboDock®. J Comput Aided Mol Des, 18: 189–208. `doi:10.1023/B:JCAM.0000035199.48747.1e <>`__
.. _download:
Please visit the rDock SourceForge page for the most up-to-date releases.
* `Download Files <>`__
* `Get Using SVN <>`__
.. _getting-started:
Getting started
This section contains the documentation with installation and validation instructions for first-time users.
You can read the documentation in **HTML format** on this page, or you can download it in **PDF format** by following :ref:`this link <getting-started-guide>`.
To continue with a short validation experiment (also found in the :ref:`Getting Started PDF file <getting-started-guide>`), please visit the following page: :ref:`Validation Sets <validation-sets>`.
.. _full-documentation:
.. _full-documenation:
In this section you can find the documentation containing a full explanation of the rDock software package and all its features in HTML and PDF format.
For the moment, only the :ref:`PDF format documentation <reference-guide>` is available.
For installation details and instructions for first-time users, please visit the :ref:`Installation <quick-and-dirty-installation>` and :ref:`Getting Started <getting-started>` sections.
.. toctree::
:maxdepth: 2
:caption: Contents:
.. _support:
If you are having some trouble regarding usage, compilation, development or anything else, you can use different options to ask for support.
Mailing lists
If you are having any kind of trouble, or have any questions related to general usage of the program, please search and use our `mailing lists <>`__.
Issue tracker
Mostly for developers and code-related problems. If you find a bug, for example, please go to the `tickets section <>`__ on the rDock SourceForge website.
Indices and tables
Version 3, 29 June 2007
Copyright (C) 2007 Free Software Foundation, Inc.
Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.
This version of the GNU Lesser General Public License incorporates the
terms and conditions of version 3 of the GNU General Public License,
supplemented by the additional permissions listed below.
0. Additional Definitions.
As used herein, "this License" refers to version 3 of the GNU Lesser
General Public License, and the "GNU GPL" refers to version 3 of the
GNU General Public License.
"The Library" refers to a covered work governed by this License, other
than an Application or a Combined Work as defined below.
An "Application" is any work that makes use of an interface provided
by the Library, but which is not otherwise based on the Library.
Defining a subclass of a class defined by the Library is deemed a mode
of using an interface provided by the Library.
A "Combined Work" is a work produced by combining or linking an
Application with the Library. The particular version of the Library
with which the Combined Work was made is also called the "Linked
Version".
The "Minimal Corresponding Source" for a Combined Work means the
Corresponding Source for the Combined Work, excluding any source code
for portions of the Combined Work that, considered in isolation, are
based on the Application, and not on the Linked Version.
The "Corresponding Application Code" for a Combined Work means the
object code and/or source code for the Application, including any data
and utility programs needed for reproducing the Combined Work from the
Application, but excluding the System Libraries of the Combined Work.
1. Exception to Section 3 of the GNU GPL.
You may convey a covered work under sections 3 and 4 of this License
without being bound by section 3 of the GNU GPL.
2. Conveying Modified Versions.
If you modify a copy of the Library, and, in your modifications, a
facility refers to a function or data to be supplied by an Application
that uses the facility (other than as an argument passed when the
facility is invoked), then you may convey a copy of the modified
version:
a) under this License, provided that you make a good faith effort
to ensure that, in the event an Application does not supply the
function or data, the facility still operates, and performs
whatever part of its purpose remains meaningful, or
b) under the GNU GPL, with none of the additional permissions of
this License applicable to that copy.
3. Object Code Incorporating Material from Library Header Files.
The object code form of an Application may incorporate material from a
header file that is part of the Library. You may convey such object
code under terms of your choice, provided that, if the incorporated
material is not limited to numerical parameters, data structure
layouts and accessors, or small macros, inline functions and templates
(ten or fewer lines in length), you do both of the following:
a) Give prominent notice with each copy of the object code that
the Library is used in it and that the Library and its use are
covered by this License.
b) Accompany the object code with a copy of the GNU GPL and this
license document.
4. Combined Works.
You may convey a Combined Work under terms of your choice that, taken
together, effectively do not restrict modification of the portions of
the Library contained in the Combined Work and reverse engineering for
debugging such modifications, if you also do each of the following:
a) Give prominent notice with each copy of the Combined Work that
the Library is used in it and that the Library and its use are
covered by this License.
b) Accompany the Combined Work with a copy of the GNU GPL and this
license document.
c) For a Combined Work that displays copyright notices during
execution, include the copyright notice for the Library among
these notices, as well as a reference directing the user to the
copies of the GNU GPL and this license document.
d) Do one of the following:
0) Convey the Minimal Corresponding Source under the terms of
this License, and the Corresponding Application Code in a form
suitable for, and under terms that permit, the user to
recombine or relink the Application with a modified version of
the Linked Version to produce a modified Combined Work, in the
manner specified by section 6 of the GNU GPL for conveying
Corresponding Source.
1) Use a suitable shared library mechanism for linking with
the Library. A suitable mechanism is one that (a) uses at run
time a copy of the Library already present on the user's
computer system, and (b) will operate properly with a modified
version of the Library that is interface-compatible with the
Linked Version.
e) Provide Installation Information, but only if you would
otherwise be required to provide such information under section 6
of the GNU GPL, and only to the extent that such information is
necessary to install and execute a modified version of the
Combined Work produced by recombining or relinking the Application
with a modified version of the Linked Version. (If you use option
4d0, the Installation Information must accompany the Minimal
Corresponding Source and Corresponding Application Code. If you
use option 4d1, you must provide the Installation Information in
the manner specified by section 6 of the GNU GPL for conveying
Corresponding Source.)
5. Combined Libraries.
You may place library facilities that are a work based on the Library
side by side in a single library together with other library
facilities that are not Applications and are not covered by this
License, and convey such a combined library under terms of your
choice, if you do both of the following:
a) Accompany the combined library with a copy of the same work
based on the Library, uncombined with any other library
facilities, conveyed under the terms of this License.
b) Give prominent notice with the combined library that part of it
is a work based on the Library, and explaining where to find the
accompanying uncombined form of the same work.
6. Revised Versions of the GNU Lesser General Public License.
The Free Software Foundation may publish revised and/or new versions
of the GNU Lesser General Public License from time to time. Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns.
Each version is given a distinguishing version number. If the Library
as you received it specifies that a certain numbered version of the
GNU Lesser General Public License "or any later version" applies to
it, you have the option of following the terms and conditions either
of that published version or of any later version published by the
Free Software Foundation. If the Library as you received it does not
specify a version number of the GNU Lesser General Public License, you
may choose any version of the GNU Lesser General Public License ever
published by the Free Software Foundation.
If the Library as you received it specifies that a proxy can decide
whether future versions of the GNU Lesser General Public License shall
apply, that proxy's public statement of acceptance of any version is
permanent authorization for you to choose that version for the
Library.
.. _calculating-roc-curves:
Calculating ROC curves
(Original entry published in `CBDD Research Group Blog <>`__.)
Here you will find a short tutorial on how to generate receiver operating characteristic (ROC) curves and other statistics after running rDock molecular docking (for other programs such as Vina or Glide, a small modification to the way the ``dataforR_uq.txt`` file is interpreted will make it work, see below).
I assume all of you are familiar with what ROC curves are, what they are for and how they are made.
Just in case, a very brief summary would be:
* `ROC curves <>`__ are graphical representations of the relationship between the sensitivity and the specificity of a test. They are generated by plotting the fraction of true positives out of the total actual positives versus the fraction of false positives out of the total actual negatives.
* In our case, we will use it for checking whether a docking program is able to select active ligands with respect to inactive ligands (decoys) and whether it is able to select these active ligands in the top % of a ranked database.
* R Library `ROCR <>`__ is mandatory (try with command ``install.packages("ROCR")`` in R before downloading from source).
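To make the summary above concrete, here is a minimal Python sketch of how the points of a ROC curve can be computed from docking scores. It is an illustration only and not the ROCR implementation used in this tutorial: the function name ``roc_points`` is hypothetical, lower scores are assumed to be better, and tied scores are not given any special treatment.

```python
def roc_points(scores, is_active):
    """Return (fpr, tpr) lists for ligands ranked by increasing score.

    scores:    docking scores, lower (more negative) is better
    is_active: booleans, True for active ligands and False for decoys
    """
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    n_actives = sum(1 for _, active in ranked if active)
    n_decoys = len(ranked) - n_actives
    if n_actives == 0 or n_decoys == 0:
        raise ValueError("need at least one active and one decoy")

    fpr, tpr = [0.0], [0.0]
    true_pos = false_pos = 0
    for _, active in ranked:
        if active:
            true_pos += 1
        else:
            false_pos += 1
        fpr.append(false_pos / n_decoys)   # fraction of decoys selected so far
        tpr.append(true_pos / n_actives)   # fraction of actives selected so far
    return fpr, tpr


if __name__ == "__main__":
    fpr, tpr = roc_points([-30.0, -25.0, -20.0, -10.0], [True, False, True, False])
    print(list(zip(fpr, tpr)))
```

Each step down the ranked list moves the curve up (an active was found) or right (a decoy was found), which is why a good docking program produces a curve that rises steeply near the origin.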
The example selected for this tutorial is a system from the DUD benchmark set, "hivpr" or "hiv protease".
These are the files you will need (all can be downloaded in this `Dropbox shared folder <>`__):
* List of active ligands (``ligands.txt``)
* List of inactive ligands (``decoys.txt``)
* Output file with the docked poses of each ligand with the corresponding docking scores (````)
* R script with all the R commands in this tutorial (``ROC_curves.R``)
Before getting into R, the docked poses have to be filtered so that only the best pose for each ligand is kept (the one with the smallest score, i.e. the most negative value). To do so, run:
.. code-block:: bash
sdsort -n -s -fSCORE | sdfilter -f'$_COUNT == 1' >
#sdsort with -n and -s flags will sort internally each ligand by increasing score and sdfilter will get only the first entry of each ligand.
sdreport -t | awk '{print $2,$3,$4,$5,$6,$7}' > dataforR_uq.txt
#sdreport will print all the scores of the output in a tabular format and, with command awk, we will format the results
.. note::
``sdsort`` and ``sdreport`` are really useful tools for managing sd formatted compound collections. They are very user-friendly and free to download. They are provided along with rDock software in :ref:`rDock website <rdock-documentation>`. Go to :ref:`Download <download>` section for downloading rDock.
This ``dataforR_uq.txt`` file (also in the Dropbox folder) must contain one entry per ligand with its docked scores, which R will use to rank the ligands and plot the ROC curves.
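The "one entry per ligand, best score only" rule can be sketched in a few lines of Python. This is not a replacement for ``sdsort`` and ``sdfilter``: it assumes the poses have already been reduced to ``(ligand_name, score)`` pairs, and ``best_score_per_ligand`` is a hypothetical helper name.

```python
def best_score_per_ligand(poses):
    """Keep the lowest (best) score for each ligand.

    poses: iterable of (ligand_name, score) pairs, several per ligand.
    Returns a dict mapping each ligand name to its best score.
    """
    best = {}
    for name, score in poses:
        if name not in best or score < best[name]:
            best[name] = score
    return best


if __name__ == "__main__":
    # Illustrative values only, not taken from a real docking run.
    poses = [("lig1", -20.5), ("lig1", -25.1), ("lig2", -10.0)]
    print(best_score_per_ligand(poses))
```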
R Commands for generating ROC Curves
Then, run the following commands in R for plotting the ROC curves:
.. code-block:: r
#load ROCR
#load ligands and decoys
Which will give us the following plot:
.. image:: _images/hivpr_Rinter_ROC.jpg
Afterwards, other useful statistics such as AUC or Enrichment factors can also be calculated:
.. code-block:: r
#AUC (area under the curve)
.. code-block:: r
.. code-block:: r
#Enrichment Factors
EF_rdock 0.01)[1]]
EF_rdock_20 0.2)[1]]
cat("Enrichment Factor top1%:\n")
.. code-block:: r
Enrichment Factor top1%:
.. code-block:: r
cat("Enrichment Factor top20%:\n")
.. code-block:: r
Enrichment Factor top20%:
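For readers who want to see what these statistics measure, the following Python sketch computes an AUC by the trapezoidal rule and an enrichment factor for a top fraction of the ranked database. It is an illustration under stated assumptions, not the R code of this tutorial: the function names are hypothetical, and the enrichment factor uses one common definition (hit rate in the top fraction divided by the hit rate in the whole database), which may differ from the formula used in ``ROC_curves.R``.

```python
def auc_trapezoid(fpr, tpr):
    """Area under a ROC curve given matching lists of FPR and TPR points."""
    area = 0.0
    for i in range(1, len(fpr)):
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0
    return area


def enrichment_factor(ranked_is_active, fraction):
    """Enrichment factor in the top `fraction` of a ranked database.

    ranked_is_active: booleans ordered from best to worst score.
    Assumed definition: (actives in top / size of top) / (actives / total).
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty ranking with at least one active")
    n_top = max(1, int(round(n_total * fraction)))
    hits_top = sum(ranked_is_active[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)


if __name__ == "__main__":
    print(auc_trapezoid([0.0, 0.0, 0.5, 0.5, 1.0], [0.0, 0.5, 0.5, 1.0, 1.0]))
    print(enrichment_factor([True, False, True] + [False] * 7, 0.2))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation; an enrichment factor above 1 means the top of the list is richer in actives than the database as a whole.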
Moreover, a useful way to analyse these curves is to re-plot them on a semilogarithmic scale (x axis in logarithmic scale). This way, one can focus on the early enrichment of the database and have a more detailed view of the actives selected in the top % of all the ligands.
.. code-block:: r
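# perf_rdock is an assumed name for the ROCR performance object; the original name was lost in this copy
rdockforsemilog=perf_rdock@x.values[[1]]
rdockforsemilog[rdockforsemilog < 0.0005]=0.0005
plot(rdockforsemilog,perf_rdock@y.values[[1]],type="l",xlab="False Positive Rate", ylab="True Positive Rate",xaxt="n", log="x", col="blue",main="hivpr - Semilog ROC Curves")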
axis(1, c(0,0.001,0.01,0.1,1))
Obtaining the following semi-logarithmic ROC curves:
.. image:: _images/hivpr_semilog_ROC.jpg
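The only non-obvious step in the semilogarithmic plot is the clamping of small false positive rates: a logarithmic axis cannot show 0, so every value below a floor is raised to that floor. A minimal Python sketch of that step follows; ``clamp_for_log_axis`` is a hypothetical name and the default floor of 0.0005 is the value used in the R commands above.

```python
def clamp_for_log_axis(fpr, floor=0.0005):
    """Raise every false positive rate below `floor` up to `floor`.

    log(0) is undefined, so points at FPR = 0 would otherwise be
    dropped from a plot with a logarithmic x axis.
    """
    return [max(value, floor) for value in fpr]


if __name__ == "__main__":
    print(clamp_for_log_axis([0.0, 0.0001, 0.01, 1.0]))
```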
Docking in 3 steps
On this page you will find a short tutorial for running rDock.
It is divided into 3 steps:
1. System definition
2. Cavity generation
3. Docking
Step 1: System definition
First of all, we need to define the system.
Below is an example of a typical ``.prm`` file for a DUD system (see :ref:`Documentation <full-documenation>` for more information):
.. code-block:: python
RECEPTOR_FILE gart_rdock.mol2
SITE_MAPPER RbtLigandSiteMapper
To go on to the next stage, you will need this generated ``.prm`` file, a receptor structure mol2 file (``gart_rdock.mol2``) and a file with a ligand in the cavity (````).
.. note::
The receptor ``.mol2`` file must be prepared (protonated, charged, etc.) prior to this stage. The program chosen to do so is up to the user. As a suggestion, we usually work with MOE and/or Maestro.
Step 2: Cavity generation
Once the files are ready, a simple command will generate the cavity:
.. code-block:: bash
rbcavity -was -d -r <PRMFILE>
With the ``-d`` flag a grid ``.grd`` file is generated. This file can be visualized in a molecular viewer to check the generated cavity.
For example, in PyMOL (after loading by: ``pymol <RECEPTOR>.mol2 <LIGAND>.sd <GRID>.grd``), write the following command in the console:
.. code-block:: python
isomesh cavity, <GRID>.grd, 0.99
Step 3: Docking
Once the cavity is defined and generated, a 50 runs-per-ligand rDock job can be run straightforwardly with the following command:
.. note::
The ``.prm`` file, receptor, reference ligand and ``.as`` cavity file must be in the working directory or in a location pointed to by the environment variable ``RBT_HOME``.
.. code-block:: bash
rbdock -i <INPUT>.sd -o <OUTPUT> -r <PRMFILE> -p dock.prm -n 50
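If you want to drive steps 2 and 3 from a script, the two commands can be assembled as argument lists. This is a minimal sketch: it only uses the flags shown on this page, the helper names ``rbcavity_command`` and ``rbdock_command`` are hypothetical, and the file names in the example are placeholders.

```python
def rbcavity_command(prm_file):
    """Argument list for the cavity generation command shown in step 2."""
    return ["rbcavity", "-was", "-d", "-r", prm_file]


def rbdock_command(input_sd, output_root, prm_file, n_runs=50, protocol="dock.prm"):
    """Argument list for the docking command shown in step 3."""
    return [
        "rbdock",
        "-i", input_sd,
        "-o", output_root,
        "-r", prm_file,
        "-p", protocol,
        "-n", str(n_runs),
    ]


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    print(" ".join(rbcavity_command("system.prm")))
    print(" ".join(rbdock_command("ligands.sd", "docked", "system.prm")))
```

The lists can be passed to ``subprocess.run(cmd, check=True)`` once rDock is installed and on the ``PATH``; passing a list rather than a single string avoids shell quoting problems with unusual file names.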
.. _tutorials:
The following tutorials describe common use cases of rDock.
.. toctree::
:maxdepth: 2
Multi-step protocol for HTVS
For High Throughput Virtual Screening (HTVS) applications, where computing performance is important, the recommended rDock protocol is to limit the search space (i.e. rigid receptor), apply the grid-based scoring function and/or to use a multi-step protocol to stop sampling of poor scorers as soon as possible.
Using a Multi-Step protocol for the DUD system COMT, the computational time can be reduced by 7.5-fold without affecting performance by:
1. running 5 docking runs for all ligands;
2. running 10 further runs for ligands achieving a score of -22 or lower;
3. continuing up to 50 runs for those ligands achieving a score of -25 or lower.
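The three stages above can be read as a simple early-stopping rule. The sketch below illustrates that logic only; it is not how rDock implements its filters. The names ``total_runs`` and ``best_score_after`` are hypothetical, and the checkpoints ``(5, -22)`` and ``(15, -25)`` are the COMT values quoted in the list.

```python
def total_runs(best_score_after, checkpoints=((5, -22.0), (15, -25.0)), max_runs=50):
    """Number of docking runs a ligand receives under a multi-step protocol.

    best_score_after: callable returning the best (lowest) score seen
                      after a given number of runs (hypothetical interface).
    checkpoints:      (runs_so_far, threshold) pairs; a ligand whose best
                      score is above the threshold is dropped at that point.
    """
    for n_runs, threshold in checkpoints:
        if best_score_after(n_runs) > threshold:
            return n_runs  # poor scorer: stop sampling early
    return max_runs        # passed every checkpoint: run the full protocol


if __name__ == "__main__":
    print(total_runs(lambda n: -10.0))  # dropped after the first stage
    print(total_runs(lambda n: -23.0))  # dropped after the second stage
    print(total_runs(lambda n: -30.0))  # docked for the full 50 runs
```

The time saving comes from the fact that most ligands in a screening library are poor scorers and therefore stop after the first few runs.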
The optimal protocol is specific for each particular system and parameter-set, but can be identified with a purpose-built script (see the Manual in :ref:`Full Documentation <full-documentation>`, section ``rbhtfinder``).
Here you will find a tutorial showing how to create and run a Multi-Step Protocol for an HTVS campaign.
Step 1: Create the multi-step protocol
These are the instructions for running rbhtfinder:
.. code-block:: bash
1st) exhaustive docking of a small representative part of the
whole library.
2nd) Store the result of sdreport -t over that exhaustive dock.
in file that will be the input of this
3rd) rbhtfinder <sdreport_file> <output_file> <thr1max> <thr1min> <ns1> <ns2>
<ns1> and <ns2> are the number of steps in stage 1 and in
stage 2. If not present, the default values are 5 and 15
<thrmax> and <thrmin> setup the range of thresholds that will
be simulated in stage 1. The threshold of stage 2 depends
on the value of the threshold of stage 1.
An input of -22 -24 will try protocols:
5 -22 15 -27
5 -22 15 -28
5 -22 15 -29
5 -23 15 -28
5 -23 15 -29
5 -23 15 -30
5 -24 15 -29
5 -24 15 -30
5 -24 15 -31
Output of the program is a 7 column values. First column
represents the time. This is a percentage of the time it
would take to do the docking in exhaustive mode, i.e.
docking each ligand 100 times. Anything
above 12 is too long.
Second column is the first percentage. Percentage of
ligands that pass the first stage.
Third column is the second percentage. Percentage of
ligands that pass the second stage.
The four last columns represent the protocol.
All the protocols tried are written at the end.
The ones for which time is less than 12%, perc1 is
less than 30% and perc2 is less than 5% but bigger than 1%
will have a series of *** after, to indicate they are good choices
WARNING! This is a simulation based in a small set.
The numbers are an indication, not factual values.
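The list of protocols in the usage text above follows a regular pattern: for each stage 1 threshold in the given range, the stage 2 threshold is 5, 6 and 7 units lower. The sketch below reproduces that enumeration; it is inferred from the example listing only, not taken from the ``rbhtfinder`` source, and the name ``candidate_protocols`` and the integer-threshold assumption are mine.

```python
def candidate_protocols(thr1_max, thr1_min, ns1=5, ns2=15, offsets=(5, 6, 7)):
    """Enumerate (ns1, thr1, ns2, thr2) tuples as in the usage example.

    Assumes integer thresholds with thr1_max >= thr1_min (e.g. -22 and -24)
    and stage 2 thresholds 5, 6 and 7 units below the stage 1 threshold.
    """
    protocols = []
    for thr1 in range(thr1_max, thr1_min - 1, -1):
        for offset in offsets:
            protocols.append((ns1, thr1, ns2, thr1 - offset))
    return protocols


if __name__ == "__main__":
    for protocol in candidate_protocols(-22, -24):
        print(*protocol)
```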
Step 1, substep 1: Exhaustive docking
Hence, as stated, the first step is to run an **exhaustive docking** of a representative part of the whole library that you want to dock.
For rDock, exhaustive docking means doing **100 runs** for each ligand, whereas standard docking means 50 runs for each ligand:
.. code-block:: bash
rbdock -i -o OUTPUT -r PRMFILE.prm -p dock.prm -n 100
Step 1, substep 2: ``sdreport`` summary
Once the exhaustive docking has finished, the results have to be saved in a **single file** and the output of the script ``sdreport -t`` will be used as **input for** ``rbhtfinder``:
.. code-block:: bash
sdreport -t > sdreport_results.txt
Step 1, substep 3: ``rbhtfinder`` script
The **last step** is to run the ``rbhtfinder`` script (download :download:`sdreport_results.txt <_downloads/>` for testing):
.. code-block:: bash
rbhtfinder sdreport_results.txt htvs_protocol.txt -10 -20 7 25
This produces a file called ``htvs_protocol.txt``.
The parameters are explained in the script instructions. They depend on the system, so you will probably have to experiment a little with different values in order to **obtain good parameter sets** (marked with ``***`` in the output).
A parameter set is marked as good when **time** is less than 12%, **perc1** (percentage of ligands that pass the first filter) is less than 30% and **perc2** (percentage of ligands that pass the second filter) is less than 5% but greater than 1%.
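The three criteria can be written as a single predicate, which is handy when you scan many candidate parameter sets. This is a sketch of the rule as stated in the text; ``is_good_protocol`` is a hypothetical name and the numbers in the example are illustrative.

```python
def is_good_protocol(time_pct, perc1, perc2):
    """True when a parameter set meets the three criteria given in the text.

    time_pct: time as a percentage of the exhaustive docking time
    perc1:    percentage of ligands passing the first filter
    perc2:    percentage of ligands passing the second filter
    """
    return time_pct < 12.0 and perc1 < 30.0 and 1.0 < perc2 < 5.0


if __name__ == "__main__":
    print(is_good_protocol(11.0, 25.0, 3.0))  # meets all three criteria
    print(is_good_protocol(13.0, 25.0, 3.0))  # too slow
    print(is_good_protocol(11.0, 25.0, 0.5))  # too few ligands pass stage 2
```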
Step 2: Run rDock with the Multi-Step Protocol
The script finished with two good parameter sets:
.. code-block:: python
11.928, 27.461, 3.207, 7, -12, 25, -17 ***
10.508, 18.773, 1.511, 7, -13, 25, -18 ***
These parameters have to be adapted to a **file** with the HTVS **protocol format** that rDock understands.
A **template file** looks as follows (``THR1``, ``THR2``, ``N1`` and ``N2`` are the parameters found above):
.. code-block:: python
if - <THR1> SCORE.INTER 1.0 if - SCORE.NRUNS <N1-1> 0.0 -1.0,
if - <THR2> SCORE.INTER 1.0 if - SCORE.NRUNS <N2-1> 0.0 -1.0,
if - SCORE.NRUNS 49 0.0 -1.0,
It is divided into 2 sections, **Running Filters** and **Writing Filters** (defined by the lines with one number).
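Filling in the template by hand is error-prone because of the ``N-1`` offsets, so the three filter lines shown in the template can be generated from a parameter set. The sketch below is limited to those three lines: it does not write the single-number lines that separate the sections or any writing filters, it assumes the dash in the template is a plain ASCII hyphen, and ``running_filter_lines`` is a hypothetical helper name.

```python
def running_filter_lines(n1, thr1, n2, thr2, max_runs=50):
    """Build the three filter lines of the template from one parameter set.

    n1, n2:     number of runs at the end of stage 1 and stage 2
    thr1, thr2: SCORE.INTER thresholds for stage 1 and stage 2
    max_runs:   total number of runs for ligands that pass both stages
    """
    return [
        f"if - {thr1} SCORE.INTER 1.0 if - SCORE.NRUNS {n1 - 1} 0.0 -1.0,",
        f"if - {thr2} SCORE.INTER 1.0 if - SCORE.NRUNS {n2 - 1} 0.0 -1.0,",
        f"if - SCORE.NRUNS {max_runs - 1} 0.0 -1.0,",
    ]


if __name__ == "__main__":
    # First good parameter set reported above: 7, -12, 25, -17
    for line in running_filter_lines(7, -12, 25, -17):
        print(line)
```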