Commit 37abede7 authored by bue

man : smoke evolution.

parent 3f27b67d
+234 KiB (image diff not shown)
+50 −26
# Discussion

## Why Annot?
<!--
For a discussion why annot was developed please read our [publication]().
-->

Our overarching goal was to create a database to support the collection and
access of controlled, structured experimental metadata to meet the needs of both
computational and experimental scientists.

The common solution to this in biological research labs is to employ spreadsheets. 
While these benefit from being flexible and easily edited,
they are subject to errors that result from manual entry, inadvertent auto-formatting, and version drift. 
Annot offers a robust solution to annotate - using controlled vocabulary - samples, reagents, and experimental details
for established assays where multiple staff are involved.
While Annot was written with an informatics agnostic end-user in mind,
full system administration requires basic skills in Linux, Python3, and Django,
as well as basic knowledge of relational databases.
Because of the cost required to populate Annot with detailed sample and reagent annotation, 
it is most appropriate for large-scale, high-throughput experiments.

However, a major benefit to our approach is that data generated in different experimental settings
can be integrated through a detailed description of each experimental condition
along the dimensions of sample, perturbation, and endpoint.
Moreover, the high cost of large-scale screening efforts warrants the time and effort required to adequately annotate them.
Ultimately, approaches such as this will allow data to be better leveraged to make discoveries and gain biological insights.


## About Controlled Vocabulary
@@ -14,12 +37,11 @@ into annot id [INS_P01317](http://www.uniprot.org/uniprot/P01317). This approach
helps limit variability in the nomenclature.

Annot ids are further restricted to use only alphanumeric characters and
the underscore. The official ontology identifier is alway to be found behind
the last underscore.
the underscore. The official ontology identifier is found behind the last underscore.

If needed, the term part of the annot identifier can be adjusted
to a term everyone in the lab is familiar with. The ontology identifier, on the
other hand, should stay untouched. For example: we changed the official term
other hand, should not be modified. For example: we changed the official term
hyaluronic_acid_chebi16336 into HA_chebi16336. These types of changes should
always happen before the term is used for sample or reagent annotation.
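
The naming rule above can be sketched in a few lines of python3. This is only an illustration, not annot's own code: the function name `split_annot_id` and the strict character check are assumptions made for this example, and the sketch does not cover protein isoform ids, which additionally use the pipe symbol.

```python
import re

# Assumption for this sketch: an annot id is "<term>_<ontology identifier>",
# built only from alphanumeric characters and the underscore.
ANNOT_ID_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def split_annot_id(annot_id):
    """Return (term, ontology_identifier) for an annot id.

    The ontology identifier is the part behind the last underscore.
    """
    if not ANNOT_ID_PATTERN.fullmatch(annot_id):
        raise ValueError(f"invalid characters in annot id: {annot_id!r}")
    term, separator, ontology_identifier = annot_id.rpartition("_")
    if not separator or not term or not ontology_identifier:
        raise ValueError(f"no term/identifier split found in: {annot_id!r}")
    return term, ontology_identifier


# The term part may be shortened, the ontology identifier stays untouched.
print(split_annot_id("hyaluronic_acid_chebi16336"))
print(split_annot_id("HA_chebi16336"))
```

Because only the part behind the last underscore is treated as the identifier, a term such as hyaluronic_acid can itself contain underscores.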

@@ -27,20 +49,20 @@ In the case needed terms are missing from a particular ontology, terms can be
borrowed from another ontology and added by clicking the `Add` button
in the particular ontology (orange colored in the GUI).

Whenever possible, we took controlled vocabulary form existing, well established ontologies.
However, there was vocabulary we could not find in an appropriate ontology.
We advocate the use of controlled vocabulary from existing, well-established ontologies whenever possible.
However, some terms do not exist in established ontologies.
This is the case, for example, for all vocabulary from apponprovider_own.
All of these terms will have "Own" as term id, so there are
All of these terms will have "Own" as term id, so they are
easily detectable. For example: Boots_Own for the boots pharmacy.

Should it ever happen that an id from an ontology get deprecated then the
Should it ever happen that an id from an ontology becomes deprecated then the
particular term will not be deleted. Instead, in `appsabbrick` (bright orange colored in the GUI),
in the `Uploaded_endpoint_reagent_bricks` or `Uploaded_perturbation_reagent_bricks`
or `Uploaded_sample_bricks` table, the `ok_brick` field will be set to False
(which appears as a white x in a red dot in the GUI), and the `ontology_term_status`
field in the corresponding ontology (maroon colored in the GUI) will be set to False.

In annot, the controlled vocabulary origin version contains just the latest
In annot, the controlled vocabulary origin version contains the latest
information pulled from the original source. Only the backup version will store
adapted annot ids, our own added ontology terms, and deprecated ontology identifiers.
Original version and backup files can be found inside annot at `/usr/src/media/vocabulary/`.
@@ -54,20 +76,21 @@ Further reading:

## About Proteins, Protein Isoforms, and Protein Complexes

Protein ids are a bit of a special case, because some proteins have known isoforms.
For such case we introduced a additional hierarchical separation character, the pipe
Protein ids are a bit of a special case because some proteins have multiple known isoforms.
For such cases we introduced an additional hierarchical separation character, the pipe
symbol (|). For example, the canonical human insulin isoform: INS|1_P01308|1.
Please note that in UniProt, the isoforms identifiers is officially separated
Please note that in [UniProt](https://www.uniprot.org/), the isoform identifier is officially separated
by a dash from the protein identifier (e.g. P01308-1). Annot already uses the dash
to separate the primary key parts in the annot brick identifiers.
to separate the primary key parts in the annot brick identifiers,
which is why we adopted the pipe here.
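
As a rough python3 illustration of the two notations, the sketch below converts between the UniProt dash form and the annot pipe form. The function names are hypothetical, and the assumption that the isoform number appears in both the term part and the identifier part is taken from the single INS|1_P01308|1 example.

```python
def uniprot_to_annot_isoform(gene_symbol, uniprot_isoform):
    """Build an annot style id from a UniProt isoform id such as 'P01308-1'."""
    accession, separator, isoform = uniprot_isoform.partition("-")
    if not separator or not isoform:
        raise ValueError(f"not an isoform identifier: {uniprot_isoform!r}")
    # Assumption: the isoform number follows the pipe in both id parts.
    return f"{gene_symbol}|{isoform}_{accession}|{isoform}"


def annot_to_uniprot_isoform(annot_id):
    """Recover the UniProt isoform id from an annot style protein id."""
    # The ontology identifier sits behind the last underscore.
    _, _, identifier = annot_id.rpartition("_")
    accession, separator, isoform = identifier.partition("|")
    if not separator or not isoform:
        raise ValueError(f"no isoform found in: {annot_id!r}")
    return f"{accession}-{isoform}"


print(uniprot_to_annot_isoform("INS", "P01308-1"))
print(annot_to_uniprot_isoform("INS|1_P01308|1"))
```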

If we could not identify an exact isoform, we always chose the canonical isoform as
defined by UniProt. The boolean field isoform_explicit in the protein brick identifies
which proteins have known isoforms and which simply use the canonical form.

Since UniProt doesn't cover protein complexes (i.e. COL1, ITGA2B1 or Laminin3B32),
we used Gene Ontology cellular component identifiers, which resulted in annot ids
like COL1_go0005584, ITGA2B1_go0034666, Laminin3B32_go0061801.
Because UniProt doesn't cover protein complexes (e.g. COL1, ITGA2B1 or Laminin3B32),
we used [Gene Ontology cellular component identifiers](https://www.ebi.ac.uk/complexportal/home), 
which resulted in annot ids like COL1_go0005584, ITGA2B1_go0034666, Laminin3B32_go0061801.

The unique annot id naming conventions make it very easy to spot key details about a
protein. Not all details are in the name, for example, the species the protein comes
@@ -80,26 +103,27 @@ Further reading:

## About not_yet_specified and not_available

Annot sets every empty field to not_yet_specified, regardless of if the
Annot sets every empty field to not_yet_specified, regardless of whether
information was not specified or the information was simply not available.
This avoids the common problem of empty fields often found in
simple spreadsheet metadata.
This avoids the common problem of empty fields and confusion about how to handle missing data.

A sample or reagent brick, which has a not_yet_specified field in the primary key
block, will in general not be uploadable. If, however, primary key fields are
marked as not_available, then we can upload the reagent brick.
For example, if we do not have information about provider, catalog number,
or lot number of the reagent DMSO, then we would have the following descriptor:
DMSO_chebi28262-notavailable_notavailable_notavailable.

A sample or reagent, which has a not_yet_specified field in the primary key
block, will in general not be brickable. If however, primary key fields are
marked as not_available, for example for reagents we do not care about the
exact provider or catalog number or lot number like DMSO_chebi28262-notavailable_notavailable_notavailable.
The reagent will be brickable.
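
The difference between the two markers can be sketched in python3 as follows. This is an assumption-laden illustration rather than annot's implementation: the function names are hypothetical, and the descriptor layout (annot id, dash, then provider, catalog number and lot number joined by underscores, with not_available written as notavailable) is inferred from the DMSO example only.

```python
NOT_YET_SPECIFIED = "not_yet_specified"
NOT_AVAILABLE = "not_available"


def is_uploadable(reagent, provider, catalog_id, lot_id):
    """A not_yet_specified primary key field blocks the upload."""
    return NOT_YET_SPECIFIED not in (reagent, provider, catalog_id, lot_id)


def brick_descriptor(reagent, provider, catalog_id, lot_id):
    """Join the primary key fields into a descriptor string."""
    if not is_uploadable(reagent, provider, catalog_id, lot_id):
        raise ValueError("primary key contains not_yet_specified")
    # Assumption: not_available loses its underscore, because the
    # underscore already separates the fields inside a key part.
    details = [
        "notavailable" if value == NOT_AVAILABLE else value
        for value in (provider, catalog_id, lot_id)
    ]
    return f"{reagent}-{'_'.join(details)}"


print(brick_descriptor("DMSO_chebi28262", NOT_AVAILABLE, NOT_AVAILABLE, NOT_AVAILABLE))
print(is_uploadable("DMSO_chebi28262", NOT_YET_SPECIFIED, NOT_AVAILABLE, NOT_AVAILABLE))
```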

## Programmer Contribution

+ Elmar Bucher: main programmer.
+ Cheryl Claunch: co-programmer to bring version 4 alive.
+ Cheryl Claunch: co-programmer on version 4.
+ Derrick He: cron job backup routine implementation.
+ Dave Kilburn: manual proofreading.
+ Dave Kilburn and Laura Heiser: manual proofreading.


## Contact Information

Contact bue at https://gitlab.com/biotransistor/annot
Contact Elmar Bucher at https://gitlab.com/biotransistor/annot
or send an email to buchere at ohsu dot edu.
+30 −42
@@ -506,16 +506,15 @@ change from a green tick to a red cross, the next time this brick type is upload

Just as the [IPO](https://en.wikipedia.org/wiki/IPO_model)
(input-process-output) paradigm describes the structure of an information processing program,
a biological experment have to specify sample, perturbation and endpoint to be well described.
The sample can therby be regared as input, perturbations as processing and the endpoints as output.
In annot sample, perturbation and endpoint are regarded as "axis".
Below is desctribe who such axis has to be specified.

a biological experiment can be specified by sample, perturbation and endpoint description.
The samples can thereby be regarded as input, perturbations as processing and endpoints as output.
In annot's assay coordinate model, sample, perturbation and endpoint are represented as "axes".
Below is a short description of how such axes are specified.
Check out the tutorial for an applied example.

#### About **axis sets**!

1. So, first one has to gather the samples, the perturbation reagents, and the endpoint reagents used in the experiment.
1. To define an axis set, one first has to gather the samples, the perturbation reagents, and the endpoint reagents used in the experiment.

    1. scroll to the cyan colored `Appacaxis` box.
    1. click the cyan `Set_of_Endpoints` and `Add` link to group together the endpoint brick used in an experiment.
@@ -530,29 +529,26 @@ because the layout files will be grouped into folders according to their major s
and the unstacked dataframe will group the columns according to the major sets.
If no dash is given, then the major and the minor set name are the same.
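
The major and minor set naming can be illustrated with a short python3 sketch. The function name and the example set names are hypothetical, and the sketch assumes that the part before the first dash is the major set name and the part after it the minor set name.

```python
def split_set_name(set_name):
    """Return (major, minor) for a set name.

    Assumption: a single dash separates major and minor set name.
    Without a dash, major and minor set name are the same.
    """
    major, separator, minor = set_name.partition("-")
    if not separator:
        return set_name, set_name
    return major, minor


# Hypothetical set names, used only for illustration.
print(split_set_name("drugplate-run1"))
print(split_set_name("drugplate"))
```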

1. Second, the gathered samples and reagents have to be layouted.
   For this you need at least some basic python3 skills.
   You have to have python3 and the acpipe_acjson library installed on your computer.
   How you install python3 depends very much on your operating system.
   After you have installed python3 you can install the acpipe_acjson library
   with pip like this:
1. Second, the gathered samples and reagents have to be laid out.
   Python3 and the acpipe_acjson library must be installed on your computer.
   You can install the acpipe_acjson library with pip like this:
   1. `pip3 install acpipe_acjson` should do the trick.

   What follows is the description of the layouting process on a perturbation set.
   But layouting for sample and endpoint sets is done exactly the same way.
   What follows is the description of the layout process on a perturbation set.
   But layout for sample and endpoint sets is done exactly the same way.

    1. click the cyan colored `Set_of_Perturbation` link.
    1. choose the set you like to layout.
    1. choose the set you would like to lay out.
    1. in the `Action` drop down list choose `Download selected set's python3 acpipe template script`
      and click `Go` to download the template file.
    1. open the template file in a [text editor](https://en.wikipedia.org/wiki/Text_editor).
      You will find python3 template code, generated based on set_name and the
      chosen bricks. Please note, the computer is not able to figure out the
      overall layout and each reagent's concentration and time.
      So, please read the template code and replace all the question-marks with meaningful values.
      chosen bricks.
      Read the template code and replace all the question marks,
      which are placeholders for the wellplate layout and each reagent's concentration and reaction time,
      with meaningful values.
    1. then run `python3 acpipeTemplateCode_*set-name*.py`. This will result
      in a `acpipe_acjson-*set-name*_ac.json` file. You can have a look at the
      acjson file in any json editor, if you are interested in the structure.
      in a `acpipe_acjson-*set-name*_ac.json` file.

1. Third, upload the generated acjson file and check for consistency.
    1. in the GUI, click the name of the set for which you downloaded the template.
@@ -565,12 +561,10 @@ If no dash is given, then the major and the minor set name are the same.


#### About **supersets**!
Superset - stored in the blue colored `App4Superset` box - are optional
and a kind of advanced topic.
Supersets - stored in the blue colored `App4Superset` box - are optional.

Imagine, for example, you have a [pipette robot](http://opentrons.com/) which
helps you to produce "random" distributed well plates out of a hand full of
reagents provided in eppendorf tubes.
helps you to produce randomized wellplates from reagents provided in eppendorf tubes.

You could store:
1. the eppendorf layout that you feed to the pipette robot as an ordinary `Set_of_Perturbation`.
@@ -584,7 +578,7 @@ For any system in the lab you can imagine, you can write a python3 acpipe librar


#### About **run sets**!
One runset represent one assay.
One runset represents one assay.
An assay combines all 3 acjson axes: Sample, Perturbation, and Endpoint.
The information can come from sampleset acjson files, perturbation set acjson files, endpoint acjson files, and superset acjson files.

@@ -599,12 +593,12 @@ The information can come from sampleset acjson files, perturbation set acjson fi
   or a warning when the acjson content differs.

#### About **date tracking**!
The tracking layer enables assay and superset related
date, protocol, and staff member metadata to be documented.
The tracking site links are located in the purple colored `App2Track` box.
How the site works should be quite self-explanatory.
Currently there are two example date tracking sites implemented.
The tracking app can be customized for different experimental protocols.

At the moment, there are two date tracking sites implemented which simply figure as example.
One for Mema microenvironment micro array assay and one for Microarray spotter supersets.
For your own need, the tracking app have to be adjusted.
1. edit the `app2tacking/models.py` file to your needs
1. edit the `app2tacking/admin.py` file to your needs
1. enter annot by command line
@@ -655,25 +649,19 @@ Install [Docker Engine](https://docs.docker.com/).
[Docker Machine](https://docs.docker.com/machine/) and
[Docker Compose](https://docs.docker.com/compose/)
as described here: [Install Docker](https://docs.docker.com/install/),
How depends on the flavor of your operating system.

To be comprehensive, the complete docker platform has some more parts,
parts like *Docker Swarm* and *Docker Stack*,
but you will not have to know or install these parts to run annot.


### HowTo run the docker platform?
This howto will get you familiar with docker,
as much as is needed to run docker as annot user or developer.

To successfully run docker you have to know a whole set of docker commands,
the ones from the docker engine, the ones from docker-compose, and the ones from docker-machine.
The section below introduce you to a minimal set of commands needed to run annot.
from the docker engine, docker-compose, and docker-machine.
The section below introduces a minimal set of commands needed to run annot.
It is worthwhile to check out the list of all available docker engine, docker-compose, and
docker-machine commands. There are many nice commands that may be very helpful for your specific application.
docker-machine commands.
There are many nice commands that may be very helpful for your specific application.

The docker platform can be booted either by starting the docker engine
(How depends very much on your operating system) or by firing up a docker-machine.
The docker platform can be booted either by starting the docker engine or by firing up a docker-machine.
Annot as such could run solely with the docker engine and docker-compose.
However, we have chosen to make use of docker-machine to allow one physical computer to run
more than one development version or a development and a deployed version simultaneously.
@@ -738,8 +726,8 @@ In the case of annot you usually you do not want do create a new container.

#### docker-compose commands

Web applications like annot are usually build out of many containers.
For example the development version of annot is build out of five containers:
Web applications like annot are usually built out of many containers.
For example, the development version of annot is built out of five containers:
annot_nginxdev_1, annot_webdev_1, annot_fsdata_1, annot_db_1, annot_dbdata_1.
To orchestrate the whole container set you can run docker-compose commands.
Nevertheless, it is important to know the low level docker engine commands,
+8 −8
@@ -15,7 +15,7 @@ where it is split into five docker containers:
1. annot_webdev_1 or annot_web_1 contains the actual annot code base.
1. annot_fsdata_1 contains all stored non-database data.

These five containers can be build and spin up together utilizing
These five containers can be built and spun up together utilizing
the docker-compose command, either with the dcdev.yml file for the development
version or the dcusr.yml file for the production version.

@@ -77,7 +77,7 @@ The postgresql engine related configuration settings are stored in the pgsql.env
This container is constructed out of the official postgresql docker image.
The psycopg2 library is listed in the requirement.txt in the webdev and web folders.

Splitting database engine and data into two containers (annot_db_1 and annot_dbdata_1.
Splitting database engine and data into two containers (annot_db_1 and annot_dbdata_1)
makes it really easy to update the database engine without losing the data stored in the database.


@@ -86,7 +86,7 @@ makes it really easy to update the database engine without loosing the data stor

The webdev and web folders each contain a:
1. Dockerfile with the container building instruction for annot_webdev_1 and annot_web_1.
  This containers are constructed out of the official [debian](https://www.debian.org/) based python3 docker image.
  These containers are constructed out of the official [debian](https://www.debian.org/) based python3 docker image.
1. requirement.txt file which lists the additional python libraries needed in the annot project.

The fsdata folder contains a
@@ -107,7 +107,7 @@ a bit familiar with this language.

### acJson - the assay coordinate json file format

Annot’s assay layouting backbone is the [acpipe_acjson](https://gitlab.com/biotransistor/acpipe_acjson) library.
Annot’s assay layout backbone is the [acpipe_acjson](https://gitlab.com/biotransistor/acpipe_acjson) library.
Acpipe_acjson is a python3 library to handle the acjson file format,
a file format developed to log complicated biological wet lab experiment layouts.
The acjson file format complies fully with the [json](http://json.org/) standard.
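
Because acjson is plain json, such a file can be inspected with python3's standard `json` module alone, without the acpipe_acjson library. The sketch below only assumes json compliance; the function names are hypothetical, and nothing is assumed about the keys inside a real acjson file.

```python
import json


def parse_acjson(text):
    """Parse acjson content given as a string into python objects."""
    return json.loads(text)


def load_acjson(path):
    """Read an acjson file from disk, e.g. a downloaded *_ac.json file."""
    with open(path, encoding="utf-8") as handle:
        return parse_acjson(handle.read())


def top_level_keys(acjson):
    """List the top level keys, a simple first look at the structure."""
    return sorted(acjson) if isinstance(acjson, dict) else []


# Toy content for illustration only, not a real acjson layout.
toy = parse_acjson('{"b": 2, "a": 1}')
print(top_level_keys(toy))
```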
@@ -117,7 +117,7 @@ Acjson file format complies fully withe the [json](http://json.org/) standard.

Annot is a [django](https://www.djangoproject.com/) based web application.
Django as such is a python based web framework.
Annot makes especially use of the django,
Annot makes use of the django,
django-admin - which is leveraged as annot's [GUI](https://en.wikipedia.org/wiki/Graphical_user_interface) (graphical user interface),
and the external [django-selectable](https://django-selectable.readthedocs.io/en/latest/) library -
which provides searchable dropdown list boxes to the django-admin based GUI. Particularly
@@ -130,7 +130,7 @@ from the official django documentation.

### The folder structure inside the **annot/web/** folder

![An! django stack](img/annotv5stack20180418colour06.png)
![An! django stack](img/annotv5stack20180418colour07.png)

The annot/web folder contains the actual annot code base.
The django main project folder (prjannot) and all django app folders (app*) can be found here.
@@ -165,9 +165,9 @@ The man folder contains this very documentation.
Documentation is mainly written in [markdown](https://daringfireball.net/projects/markdown/syntax),
deployed via [read the docs](https://docs.readthedocs.io/en/latest/)
and generated using [sphinx](http://www.sphinx-doc.org/en/stable/).
Annot have to be under your PYTHONPATH, to be able to be processable by sphinx.
Annot must be on your PYTHONPATH to be processable by sphinx.

If you like to contribute by writing on the manual, please read at least
If you would like to contribute to the manual, please at least read
Read the Docs' [getting started](https://docs.readthedocs.io/en/latest/getting_started.html),
get familiar with the basics of [markdown](https://daringfireball.net/projects/markdown/syntax),
and check out Daniele Procida's ["what nobody tells you about documentation"](https://www.youtube.com/watch?v=t4vKPhjcMZg) talk.