Commit 2a6258c7 authored by Stefan Beck

Reorganize and improve documentation

parent 9ff0c24d
......@@ -30,7 +30,7 @@ The same holds for other backend services, you can check the very last lines of
[provisioning/provision.sh](https://github.com/dissemin/dissemin/blob/master/provisioning/provision.sh)
to find out how to start them.
See the [full installation instructions](https://dissemin.readthedocs.io/en/latest/installation/index.html), which include alternative methods.
## Contributing
......@@ -50,16 +50,12 @@ incomplete.
Translations are hosted at [TranslateWiki](https://translatewiki.net/wiki/Translating:Dissemin) for an easy-to-use interface for translations and statistics.
We are always looking for translators for all languages.
Dissem.in uses [Django's standard localization system](https://docs.djangoproject.com/en/2.2/topics/i18n/), which makes it easy to contribute new translations.
See the [full details on how to contribute translations](https://dissemin.readthedocs.io/en/latest/localization.html).
### Writing an interface for a new repository
Writing an interface for a new repository, so that Dissem.in can upload to it, is very easy!
A [full tutorial](https://dissemin.readthedocs.io/en/latest/contributing/writing_new_repository_interface.html) is available.
## Links
......
.. _page-administrating_tips:
Tips for Dissemin instance administrators
=========================================
Here are some tips for administrators of a Dissemin instance.
Check the current status of CrossRef API harvesting
---------------------------------------------------
The date and time of the latest paper harvested from the CrossRef API can be
seen by running the command ``python manage.py crossref_last_update``.
==========================
Administration of Dissemin
==========================
.. toctree::
:maxdepth: 2
repositories
......@@ -43,11 +43,12 @@ Parent can be any of the class in ``100*i for i in range(10)``.
This groups the DDC when displayed to the user.
.. note::
You can localize your DDC name, see :doc:`../contributing/localization` for further information.
Green Open Access Service
=========================
Under ``goa_service`` you can add some information about a possible green open access service. Leave this empty if you do not want a message shown to the user.
The GOAS object requires
......@@ -57,7 +58,7 @@ The GOAS object requires
* ``learn_more_url`` - URL to the webpage with more information about this service
.. note::
You can localize your DDC name, see :doc:`../contributing/localization` for further information.
Licenses
......@@ -68,7 +69,7 @@ On the admin site in the section ``Deposit`` you find your licenses. You can add
Each license consists of its name and its URI. If your license does not provide a URI, you can use the namespace ``https://dissem.in/deposit/license/``.
.. note::
You can localize your license names, see :doc:`../contributing/localization` for further information.
Creating a Letter of Declaration
================================
......
.. _page-api:
===
API
===
Dissemin provides an API to query the availability of arbitrary papers.
Please note that the interface may change in the future, as it is still being improved.
Querying the API
================
Querying by DOI
---------------
You can retrieve Dissemin's metadata for a specific paper by DOI:
`<https://dissem.in/api/10.1016/j.paid.2009.02.013>`_
Querying by Dissemin Paper ID
-----------------------------
Dissemin stores internal numeric identifiers for its papers.
These identifiers are exposed in the URLs of the paper pages, for instance.
It is possible to retrieve metadata from these identifiers:
`<https://dissem.in/api/p/6859902>`_
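Both lookups above are plain GET requests on URLs derived from the identifier. The following sketch builds these URLs and fetches the JSON with the Python standard library only. The helper names ``doi_url``, ``paper_id_url`` and ``fetch_json`` are hypothetical, and percent-encoding the DOI while keeping ``/`` unescaped is an assumption based on the example URL above.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the examples above.
API_ROOT = "https://dissem.in/api/"


def doi_url(doi):
    """Hypothetical helper: lookup URL for a DOI, keeping "/" unescaped."""
    return API_ROOT + quote(doi, safe="/")


def paper_id_url(paper_id):
    """Hypothetical helper: lookup URL for an internal Dissemin paper ID."""
    return "{}p/{}".format(API_ROOT, int(paper_id))


def fetch_json(url, timeout=30):
    """GET the URL and decode the JSON body (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For instance, ``fetch_json(doi_url("10.1016/j.paid.2009.02.013"))`` would request the first example URL above.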
Querying by Metadata Fields
---------------------------
When neither the DOI nor the Dissemin ID is known, it is possible to retrieve a paper by title, authors and publication date.
This is done by posting a JSON object encoding this metadata to https://dissem.in/api/query, as follows::
curl -H "Content-Type: application/json" -d '{"title":"Refining the Conceptualization of an Important Future-Oriented Self-Regulatory Behavior: Proactive Coping", "date":"2009-07-01","authors":[{"first":"Stephanie Jean","last":"Sohl"},{"first":"Anne","last":"Moyer"}]}' https://dissem.in/api/query
The date field can contain coarser dates such as ``2009-07`` or ``2009``, and authors can also be specified
as plain text with ``{"plain":"Anne Moyer"}`` instead of ``{"first":"Anne","last":"Moyer"}``.
This API method uses Dissemin's internal paper deduplication strategy to match the bibliographic reference to a known paper in the database.
This deduplication is done by computing a unique key (called a fingerprint) from the title, authors and publication date.
Therefore, this API method always returns at most one paper, unlike the search endpoint below, which works like a traditional search engine.
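The ``curl`` call above can also be expressed with Python's standard library. This is a minimal sketch: ``metadata_query`` is a hypothetical helper that only builds the POST request, and sending it with ``urlopen`` is left to the caller. It uses the coarser date and the ``plain`` author form described above.

```python
import json
from urllib.request import Request

QUERY_ENDPOINT = "https://dissem.in/api/query"


def metadata_query(title, date, authors):
    """Hypothetical helper: build a POST request matching a paper by metadata.

    ``date`` may be "YYYY-MM-DD", "YYYY-MM" or "YYYY"; each author is either
    {"first": ..., "last": ...} or {"plain": ...}.
    """
    payload = {"title": title, "date": date, "authors": authors}
    return Request(
        QUERY_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = metadata_query(
    "Refining the Conceptualization of an Important Future-Oriented "
    "Self-Regulatory Behavior: Proactive Coping",
    "2009-07",
    [{"first": "Stephanie Jean", "last": "Sohl"}, {"plain": "Anne Moyer"}],
)
# Sending it (network access): json.load(urllib.request.urlopen(request))
```

Since the endpoint returns at most one paper, the caller only has to handle a single match or none.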
Searching the API
=================
The search interface is also exposed as an API.
The parameters it understands are the same as the human-readable version at https://dissem.in/search.
Statistics about the results are also returned.
The following search keys are available:
authors
List of authors, separated by ``,``. To enforce a last name, prefix with ``last:``.
doctypes
Filter by document type. The following document types are available: ``book``, ``book-chapter``, ``dataset``, ``journal-article``, ``journal-issue``, ``other``, ``poster``, ``preprint``, ``proceedings``, ``proceedings-article``, ``reference-entry``, ``report``, ``thesis``
pub_after
Published after the given date. The format is ``YYYY``, ``YYYY-MM`` or ``YYYY-MM-DD``.
pub_before
Published before the given date. The format is ``YYYY``, ``YYYY-MM`` or ``YYYY-MM-DD``.
q
Search in the title
sort_by
The results are sorted by date in descending order, i.e. newest first. To reverse the order, pass ``pubdate``.
status
The open access status as computed by Dissemin. This can be one of
oa
Available from the publisher
ok
Available from the author
couldbe
Could be shared by the authors
unk
Unknown/unclear sharing policy
closed
Publisher forbids sharing
You can pass ``status`` multiple times.
https://dissem.in/api/search/?q=pregroup
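For scripts, the query string can be assembled from the keys described above. This is a sketch: ``search_url`` is a hypothetical helper, it leaves out ``doctypes`` (whether that key may be repeated is not stated here), and it relies on ``urlencode`` to percent-encode values such as the ``last:`` prefix.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://dissem.in/api/search/"

# Open access status codes documented above.
STATUSES = {"oa", "ok", "couldbe", "unk", "closed"}


def search_url(q=None, authors=(), pub_after=None, pub_before=None,
               status=(), sort_by=None):
    """Hypothetical helper: build a search URL; every key is optional."""
    params = []
    if q:
        params.append(("q", q))
    if authors:
        # Comma-separated; prefix a name with "last:" to enforce a last name.
        params.append(("authors", ",".join(authors)))
    if pub_after:
        params.append(("pub_after", pub_after))
    if pub_before:
        params.append(("pub_before", pub_before))
    for code in status:
        if code not in STATUSES:
            raise ValueError("unknown status: {}".format(code))
        # ``status`` is repeated once per requested code.
        params.append(("status", code))
    if sort_by:
        params.append(("sort_by", sort_by))
    return SEARCH_ENDPOINT + ("?" + urlencode(params) if params else "")
```

With only ``q="pregroup"``, this reproduces the example URL above.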
Understanding the Results
=========================
::
......@@ -65,8 +92,7 @@ Understanding the results
"pdf_url": "http://www.ncbi.nlm.nih.gov/pubmed/19578529",
"records": [
{
"splash_url": "https://doi.org/10.1016/j.paid.2009.02.013",
"doi": "10.1016/j.paid.2009.02.013",
"publisher": "Elsevier BV",
"issue": "2",
......@@ -80,14 +106,12 @@ Understanding the results
"postprint": "can",
"published": "cannot"
},
"identifier": "oai:crossref.org:10.1016/j.paid.2009.02.013",
"type": "journal-article",
"pages": "139-144"
},
{
"splash_url": "https://www.researchgate.net/publication/26648440_Refining_the_Conceptualization_of_an_Important_Future-Oriented_Self-Regulatory_Behavior_Proactive_Coping",
"doi": "10.1016/j.paid.2009.02.013",
"contributors": "",
"abstract": "Proactive coping, directed at an upcoming as
......@@ -111,16 +135,14 @@ Understanding the results
positive future is distinctly predictive of well-being and that research
should focus on accumulating resources and goal setting in designing
interventions to promote proactive coping.",
"pdf_url": "https://www.researchgate.net/profile/Stephanie_Sohl2/publication/26648440_Refining_the_Conceptualization_of_an_Important_Future-Oriented_Self-Regulatory_Behavior_Proactive_Coping/links/55e463c008ae2fac47227a76.pdf",
"source": "researchgate",
"keywords": "",
"identifier": "oai:researchgate.net:26648440",
"type": "journal-article"
},
{
"splash_url": "http://www.ncbi.nlm.nih.gov/pubmed/19578529",
"doi": "10.1016/j.paid.2009.02.013",
"contributors": "",
"abstract": "Proactive coping, directed at an upcoming as
......@@ -147,8 +169,7 @@ Understanding the results
"pdf_url": "http://www.ncbi.nlm.nih.gov/pubmed/19578529",
"source": "base",
"keywords": "Article",
"identifier": "ftpubmed:oai:pubmedcentral.nih.gov:2705166",
"type": "other"
}
],
......@@ -183,7 +204,7 @@ other ones:
without being necessarily available. This can be a publisher webpage (with
the article available behind a paywall), a page about the paper without a
copy of the full text (e.g., a HAL page like
`<https://hal.archives-ouvertes.fr/hal-01664049>`_), or a page from which the
paper was discovered (e.g., the profile of a user on ORCID).
- **pdf\_url** is a URL where Dissemin thinks the full text can be
accessed for free. This is rarely a direct link to an actual PDF
......@@ -196,16 +217,17 @@ other ones:
indicates our assessment of the availability of that record. If the
publisher has been found in RoMEO, it also indicates the summary of
its policy, using the codes drawn from `the RoMEO
API <http://www.sherpa.ac.uk/romeo/api.html>`_. This list will
remain empty if no DOI is provided.
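Since ``pdf_url`` can be missing from individual records and the same URL may appear several times, client code may want to collect the distinct free copies. This is a sketch, assuming ``paper`` is the object that carries ``pdf_url`` and ``records`` in the example response above (how it is nested in the full response is not shown here); ``free_copies`` is a hypothetical name.

```python
def free_copies(paper):
    """Hypothetical helper: distinct ``pdf_url`` values, paper-level one first."""
    urls = []
    candidates = [paper.get("pdf_url")]
    candidates += [record.get("pdf_url") for record in paper.get("records", [])]
    for url in candidates:
        # Skip records without a free copy, and URLs already collected.
        if url and url not in urls:
            urls.append(url)
    return urls
```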
License, Usage
==============
CAPSH claims no ownership of the metadata served via this API.
It has been collected from various free sources.
The interface itself should not be abused: please do not use concurrent
connections on it, and keep your requests to a slow rate (at most one
per second).
If you need faster access to this data, please get in touch with us.
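To respect that rate from a script, requests can be issued sequentially and spaced out. A minimal sketch of a throttle enforcing at least one second between calls; ``Throttle`` is a hypothetical helper, not part of Dissemin, and the injectable ``clock`` and ``sleep`` functions only exist to make it testable.

```python
import time


class Throttle:
    """Hypothetical helper: successive wait() calls are >= ``interval`` s apart."""

    def __init__(self, interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        now = self.clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                # Too early: block for the rest of the interval.
                self.sleep(remaining)
                now = self.clock()
        self._last = now
```

Call ``throttle.wait()`` before each request from a single loop, which also avoids concurrent connections.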
.. _page-apikeys:
Getting API keys
================
Dissemin relies on various interfaces to fetch its metadata.
Some of them require registering for an API key, which Dissemin
reads from its settings in ``dissemin/settings/``.
Here is how to register for these interfaces.
SHERPA/RoMEO
------------
`SHERPA/RoMEO <http://www.sherpa.ac.uk/romeo>`_ gives machine-readable access to publishers' self-archiving
policies.
The API key is not required but encouraged, as unauthenticated users
can only perform a limited number of queries daily.
To get an API key, visit `this page <http://www.sherpa.ac.uk/romeo/apiregistry.php>`_.
The key should then be written in ``dissemin/settings/secrets.py``, as ``ROMEO_API_KEY``.
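For instance, the relevant line could look like this. The value is a hypothetical placeholder, and whatever else the file contains is not shown.

```python
# dissemin/settings/secrets.py (excerpt)
# Hypothetical placeholder: replace with the key obtained from SHERPA/RoMEO.
ROMEO_API_KEY = 'replace-with-your-romeo-api-key'
```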
Zenodo
------
`Zenodo <https://zenodo.org>`_ is a repository hosted by CERN, storing publications as well as
research data. Dissemin uses it to upload papers on behalf of users.
To use Zenodo, you need `an account <https://zenodo.org/youraccount/register>`_. You can
then generate an auth token from their web interface.
Then, set up the repository via the Dissemin admin interface (available at /admin).
Proaixy
-------
Proaixy is an OAI-PMH proxy through which Dissemin discovers preprints.
For now, no API key is required to use this service.
......@@ -20,6 +20,9 @@ import sys
import mock
import django
from datetime import date
from django.db.models.fields.files import FileDescriptor
from django.utils.html import strip_tags
......@@ -57,17 +60,17 @@ source_suffix = '.rst'
master_doc = 'index'
# General information about the project.
project = 'Dissemin'
copyright = '{}, CAPSH'.format(date.today().year)
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
# version = '0.1'
# The full version, including alpha/beta/rc tags.
# release = '0.1'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
......@@ -112,12 +115,14 @@ pygments_style = 'sphinx'
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'sphinx_rtd_theme'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
html_theme_options = {
    'style_external_links': True,
}
# Add any paths that contain custom themes here, relative to this directory.
#html_theme_path = []
......
=============
Configuration
=============
Configure the Application for Development or Production
=======================================================
Create a file ``dissemin/settings/__init__.py`` that imports exactly one of the two settings modules::

    # Pick only one of the following imports.
    # Development settings:
    from .dev import *
    # Production settings:
    # from .prod import *
For most of the settings we refer to the `Django documentation <https://docs.djangoproject.com/en/2.2/topics/settings/>`_.
Logs
====
Dissemin comes with a predefined log system. You can change the settings in ``dissemin/settings/common.py`` and change the default log level for production and development in the corresponding files. When using Dissemin from the shell with ``./manage.py shell``, you can set the log level for console output with an environment variable::
export DISSEMIN_LOGLEVEL='YOUR_LOG_LEVEL'
In production, make sure that Apache collects all your log messages.
Alternatively you can send them to a separate file by changing log settings.
Sentry
------
Dissemin uses `Sentry <https://sentry.io/welcome/>`_ to monitor severe errors.
To enable Sentry, set the ``SENTRY_DSN``.
ORCID
=====
You can either use production ORCID or its sandbox.
The main difference is the registration process.
*You are not forced to configure ORCID to work on Dissemin, just create a super user and use it!*
.. _configure_orcid_production:
Production
----------
In your ORCID account, go to *Developer Tools* and register an API key.
As the redirect URL, enter the URL of your installation.
Set ``ORCID_BASE_DOMAIN`` to ``orcid.org`` in the Dissemin settings.
In the admin interface, go to *Social Authentication*, set the provider to ``orcid.org`` and enter the required data.
Now you can authenticate with ORCID.
Sandbox
-------
Create an account on `Sandbox ORCID <https://sandbox.orcid.org>`_.
Go to *Developer Tools* and verify your email address using `Mailinator <https://mailinator.com>`_. You must not use a different email provider.
Set the redirect URI to ``localhost:8080``, which is assumed to be where your Dissemin instance is running.
Now proceed as in :ref:`configure_orcid_production`, but with ``sandbox.orcid.org``.
.. _page-docs:
=============
Documentation
=============
Building the docs
=================
This documentation is generated by Sphinx and hosted on Read the Docs, so it is always up to date.
This documentation is generated by sphinx, both from the source code
and with some additional documentation files. To build it, you need a working
dissemin install where you have installed the packages in
``requirements-dev.txt``.
There are two steps to generate the docs: first, auto-generate the
reStructuredText sources of the docs with sphinx-apidoc::
# first, make sure you are in an environment where
# the requirements are available
source .virtualenv/bin/activate
# then, invoke sphinx-apidoc via make
make -B doc
Building local documentation
============================
If you need a local build of the documentation, e.g. to check formatting, activate your virtual environment and make sure you have installed the packages from ``requirements-dev.txt``.
Then, compile these RST sources to HTML::
......@@ -24,8 +15,10 @@ Then, compile these RST sources to HTML::
The HTML output is then available in ``doc/sphinx/_build/html/``.
The theme is the same as on Read the Docs.
Generating model diagrams
=========================
The UML diagrams for the models are generated using the ``django-extensions`` library.
When making changes to the models, these diagrams should be updated. They are
......
.. _page-contributing_faq:
==========
FAQ & Tips
==========
Here are some frequently asked questions and tips for getting started with working on and contributing to Dissemin.
The best way to start hacking on Dissemin is probably to use the VM (the Vagrant method from :ref:`installation_vagrant`).
Fetching a specific paper by DOI
--------------------------------
......
.. _page-ide:
Setting up Dissemin for Development in an IDE
==============================================
This page lists some possible ways to set up Dissemin locally for development, including setting up an IDE to edit Dissemin conveniently.
First, you need to install Dissemin locally: see :ref:`page-installation` for that.
In particular, you will need PostgreSQL, Redis and Elasticsearch instances running during development, as these services are required to run the tests.
Eclipse and PyDev
-----------------
......
============
Contributing
============
This section explains how to carry out some development tasks on the source code and how to help with translations.
.. toctree::
:maxdepth: 2
ide
localization
translation
tests
documentation
writing_new_repository_interface
faq
.. _page-localization:
============
Localization
============
Translations are hosted at `TranslateWiki
<https://translatewiki.net/wiki/Translating:Dissemin>`_, for an easy-to-use
interface for translations and statistics.
We use `Django's standard localization system <https://docs.djangoproject.com/en/2.2/topics/i18n/>`_, based on i18n.
This lets us translate strings in various places.
Localization in Files
=====================
Most localizations are in files:
* in Python code, use ``_("some translatable text")``, where ``_`` is imported by ``from django.utils.translation import ugettext_lazy as _``
* in Javascript code, use ``gettext("some translatable text")``
......@@ -32,12 +31,17 @@ Currently the following models use translations:
* License (field ``name``) in ``deposit.models``
* Repository (fields ``name, description``) in ``deposit.models``
For localization we use `django-vinaigrette <https://pypi.org/project/django-vinaigrette/>`_.
Please read their documentation for further information.
In short, keep in mind that:
* in the admin interface you do not see the localized strings,
* you should enter only English text in the admin interface,
* you will have to recreate the ``*.po`` files and add the translation manually (see below),
* your *local* translations do not interact with TranslateWiki.
We have extracted the strings of the above models from our production environment.
They are stored in ``model-gettext.py`` so that we have them available for TranslateWiki.
Generating PO files
===================
......@@ -49,6 +53,10 @@ The important thing when generating PO files locally is to preserve the strings
Vinaigrette saves the strings to be translated in a file called ``vinaigrette-deleteme.py``.
Usually this file would be deleted, but we keep it, as it carries our translation strings from the models.
Since we use TranslateWiki, please do not generate any ``.po`` files yourself, as there is a high chance of merge conflicts. Just state in your PR that it uses localization, and the Dissemin team will generate the ``.po`` files.
Unless you need localization in your development environment, you can ignore the following sections.
Generate locally
----------------
......@@ -92,4 +100,3 @@ Available Languages
===================
You can change the set of available languages for your installation in ``dissemin/settings/common.py`` by changing the ``LANGUAGES`` list, e.g. by commenting or uncommenting the corresponding lines.
.. _page-docs:
=====
Tests
=====
Dissemin's test suite is run using ``pytest`` rather than using Django's ``./manage.py test``.
Pytest offers many additional features compared to Python's standard ``unittest``, which is used by Django.
To run the test suite, you need to install pytest and the other packages listed in ``requirements-dev.txt``.
The test suite is configured in ``pytest.ini``, which determines which files are scanned for tests, and where Django's settings are located.
Some tests rely on remote services.
Some of them require API keys, which they read from the following environment variables (these tests are skipped if the variables are not defined):
* ``ROMEO_API_KEY``
* ``ZENODO_SANDBOX_API_KEY`` required for tests of the Zenodo interface.
This can be obtained by creating an account on sandbox.zenodo.org and creating a "Personal Access Token" from there.
Fixtures
========
Dissemin comes with some fixtures predefined.
There are mainly two types:
1. Fixtures coming from ``load_test_data``
2. Pure python fixtures in ``conftest.py``
While the first class of fixtures loads a lot of data into the test database, these fixtures are not always suitable and are somewhat obscure.
We encourage you not to use them unless it is necessary.
The second class is not yet complete.
You can find some fixtures in the project's root.
You can add more fixtures as you need them.
If your fixture is only suitable or interesting for a single app, please put it in that app's ``conftest.py``.
So, for example, if you need more repositories, or repositories with special properties, add the corresponding function to the ``Dummy`` class of the fixture ``repository``.
If you want to use this new repository frequently out of the box, add a new fixture that gets it from the ``Dummy`` class, as shown with the ``dummy_repository`` fixture.
The benefit of the second approach is more control and better extensibility.
We also provide some JSON fixtures in our ``test_data`` folder.