# Index of open source D3M primitives

This repository contains JSON-serialized metadata (annotations) of open source primitives
and their example pipelines. You can use this repository to discover available primitives.

## Structure of repository

The directory and file structure is defined and controlled:

```
primitives/
  <interface_version>/
    <performer_team>/
      <python_path>/
        <version>/
          pipelines/
            <pipeline 1 id>.json
            <pipeline 1 id>_run.yaml
            <pipeline 2 id>.yaml
            ...
          primitive.json
  failed/
    ... structure as above ...
  archive/
    ... old primitive annotations ...
```

* `interface_version` is the version tag of the primitive interfaces package
  you used to generate the annotation and against which the primitive is
  developed. It should match the `primitive_code.interfaces_version` metadata entry
  with a `v` prefix added.
* `performer_team` should match the `source.name` metadata entry.
* `python_path` should match the `python_path` metadata entry and should start
  with `d3m.primitives`.
* `version` should match the `version` metadata entry.
* `primitive.json` is the JSON-serialized metadata of the primitive,
  obtained by running `python3 -m d3m index describe -i 4 <python_path>`.
* All added primitive annotations are regularly re-validated. If they fail validation,
  they are moved under the `failed` directory.
* Pipeline examples in the D3M pipeline description language must have a filename
  matching the pipeline's ID, with a `.json`, `.yml`, or `.yaml` file extension.
* A pipeline can have a corresponding pipeline run file, which uses the same filename with
  `_run` appended. The existence of this file makes the pipeline a standard pipeline (inputs are `Dataset` objects
  and outputs are predictions as a `DataFrame`). Other pipelines might be
  referenced as subpipelines with arbitrary inputs and outputs.
* A pipeline run file demonstrates that the performer was able to run the pipeline, and also
  provides configuration for anyone to re-run the pipeline. The pipeline run can reference
  a problem and input datasets. Only [standard problems and datasets](https://gitlab.datadrivendiscovery.org/d3m/datasets)
  are allowed. [Public ones are preferred](https://datasets.datadrivendiscovery.org/d3m/datasets).
* For primitive references in your pipelines, consider not specifying the `digest`
  field for primitives you do not control. This way your pipelines will not
  fail with a digest mismatch if those primitives get updated. (They might still
  fail because of a behavior change in those primitives, but you cannot do much
  about that.) The pipeline run will contain precise digest information for the versions
  of primitives you used.
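
As a minimal sketch of the last point, the snippet below drops `digest` from primitive references you do not control. It assumes a pipeline description loaded as a dictionary with a `steps` list, in which primitive steps carry a `primitive` object with `python_path` and `digest` keys. The function name, the `own_prefixes` parameter, and the sample values are hypothetical and only serve the illustration.

```python
import copy


def strip_foreign_digests(pipeline, own_prefixes):
    """Return a copy of `pipeline` without `digest` on primitives you do not control.

    A primitive counts as "yours" when its python_path starts with one of
    `own_prefixes` (a hypothetical convention used only in this sketch).
    """
    result = copy.deepcopy(pipeline)  # leave the caller's dictionary untouched
    for step in result.get("steps", []):
        primitive = step.get("primitive")
        if not isinstance(primitive, dict):
            continue  # not a primitive step, e.g. a subpipeline reference
        python_path = primitive.get("python_path", "")
        if not python_path.startswith(tuple(own_prefixes)):
            primitive.pop("digest", None)
    return result


# Made-up pipeline fragment; real descriptions contain more fields.
example = {
    "id": "hypothetical-pipeline-id",
    "steps": [
        {"type": "PRIMITIVE",
         "primitive": {"python_path": "d3m.primitives.test.IncrementPrimitive",
                       "digest": "hypothetical-digest-1"}},
        {"type": "PRIMITIVE",
         "primitive": {"python_path": "d3m.primitives.other.SomePrimitive",
                       "digest": "hypothetical-digest-2"}},
    ],
}

cleaned = strip_foreign_digests(example, ["d3m.primitives.test."])
```

Here the first step keeps its digest because its path matches the given prefix, while the second step loses it.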

## Adding a primitive

* You can add a new primitive or a new version of a primitive by
  creating a merge request against the master branch of this repository
  with `primitive.json` file for the primitive added according to the
  repository structure.
  * To make a merge request, create a dedicated branch in
    your fork of this repository and open a merge request from it.
  * Do not work on or modify anything in the `master` branch of your fork, because you will
    have issues if your merge request does not get merged for some reason.
  * Make sure that your fork, your `master` branch, and the dedicated branch you
    are working from are all up to date with the `master` branch of this repository.
* Generate the `primitive.json` file using the `python3 -m d3m index describe -i 4 <python_path> > primitive.json`
  command. The command validates the generated file automatically and
  does not generate JSON output if there are any issues.
  * Make sure all install dependencies are at least accessible to all other
    performers, if not public, so that they can use them. **CI validation cannot check this**.
* Create any missing directories to adhere to the repository structure.
* Add pipeline examples for every primitive annotation you add.
* Provide pipeline run files for every pipeline. Run your pipeline with the reference runtime in
  `fit-score` or `evaluate` mode and store the pipeline run file:
    ```
    $ python3 -m d3m runtime -v /path/to/static/files fit-score -p <pipeline 2 id>.yaml -r .../problem_TRAIN/problemDoc.json -i .../dataset_TRAIN/datasetDoc.json -t .../dataset_TEST/datasetDoc.json -a .../dataset_SCORE/datasetDoc.json -O <pipeline 2 id>_run.yaml
    ```
* You can use the `add.py` script available in this repository to help you with these two steps.
* Do not delete any existing files or modify files which are not your annotations.
* Once a merge request is made, the CI will validate added files automatically.
* After CI validation succeeds (`validate` job), the maintainers of the repository
  will merge the merge request.
* You can submit the same primitive and version to multiple primitive interface
  directories if your primitive works well with them.
* There is also CI validation against the current development version of core packages
  (`validate_devel` job). Failing it produces a warning but does not prevent adding
  a primitive. This way you can also validate against the upcoming version of
  core packages and make your annotation ready so that it can be automatically ported
  to the new release once it is published.
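
To help with creating missing directories, the expected location of `primitive.json` can be derived from the annotation itself. The sketch below is only an illustration: it assumes the dotted metadata entries named in the structure section map to nested JSON keys, the function name is hypothetical, and the sample values are reconstructed from the example path in the local validation section.

```python
from pathlib import PurePosixPath


def expected_annotation_path(metadata):
    """Build the relative path where primitive.json is expected to live."""
    python_path = metadata["python_path"]
    if not python_path.startswith("d3m.primitives."):
        raise ValueError("python_path should start with d3m.primitives: " + python_path)
    return PurePosixPath(
        # The directory name is the interfaces version with a `v` prefix.
        "v" + metadata["primitive_code"]["interfaces_version"],
        metadata["source"]["name"],
        python_path,
        metadata["version"],
        "primitive.json",
    )


# Sample values reconstructed from the example path used in this README.
metadata = {
    "primitive_code": {"interfaces_version": "2017.12.27"},
    "source": {"name": "Test team"},
    "python_path": "d3m.primitives.test.IncrementPrimitive",
    "version": "0.1.0",
}

print(expected_annotation_path(metadata))
# prints: v2017.12.27/Test team/d3m.primitives.test.IncrementPrimitive/0.1.0/primitive.json
```

The returned path is relative to the directory that holds the interface version directories, so its parent can be created with `mkdir -p` or `Path.mkdir(parents=True)` before copying the annotation there.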

## Local validation

You can also run the CI validation script locally so that you do not have to commit,
push, and wait for CI to test your primitive annotation. Passing the CI validation script
locally is not authoritative, but it can help speed up debugging.

The CI validation script has some requirements:

* Linux
* Working Docker installation, logged in to `registry.datadrivendiscovery.org`
* PyPI packages listed in [`requirements.txt`](./requirements.txt)
* Internet connection

Run it by providing the path to the primitive annotation file you want to validate. Example:

```bash
$ ./run_validation.py 'v2017.12.27/Test team/d3m.primitives.test.IncrementPrimitive/0.1.0/primitive.json'
```

To validate a pipeline description, run:

```bash
$ python3 -m d3m pipeline describe <path_to_JSON>
```

It will print out the pipeline JSON if it succeeds, or an error otherwise. You should probably run it inside
a Docker image with all primitives your pipeline references, or have them installed on your system.
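
The file naming rules from the repository structure can also be checked locally before pushing. The following is a rough sketch, not part of the CI validation: the function name is hypothetical, it assumes the pipeline ID is stored under a top-level `id` key, it assumes a run file may use any of the three allowed extensions, and it parses only JSON pipelines to stay within the standard library.

```python
import json
from pathlib import Path

PIPELINE_EXTENSIONS = {".json", ".yml", ".yaml"}


def check_pipelines_dir(pipelines_dir):
    """Return (mismatched, without_run) lists of file names for one pipelines/ directory.

    mismatched:  JSON pipelines whose `id` differs from their filename.
    without_run: pipelines with no `<pipeline id>_run.*` file next to them.
    """
    files = [p for p in Path(pipelines_dir).iterdir()
             if p.suffix in PIPELINE_EXTENSIONS]
    run_stems = {p.stem for p in files if p.stem.endswith("_run")}
    mismatched, without_run = [], []
    for path in sorted(files):
        if path.stem.endswith("_run"):
            continue  # pipeline run files are matched to pipelines below
        if path.suffix == ".json":
            with path.open() as f:
                if json.load(f).get("id") != path.stem:
                    mismatched.append(path.name)
        if path.stem + "_run" not in run_stems:
            without_run.append(path.name)
    return mismatched, without_run
```

A pipeline listed in `without_run` is not necessarily wrong, because subpipelines have no run file, but every standard pipeline you submit should come with one.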

You can re-run your pipeline run file by running:

```bash
$ python3 -m d3m runtime -v /path/to/static/files -d /path/to/all/datasets fit-score -u <pipeline 2 id>_run.yaml -p <pipeline 2 id>.yaml
```

## Requesting a primitive

If you would like to request a primitive, you can open an issue in this repository
and label it with the `New primitive request` label.

If you are searching for ideas for primitives, see the
[list of all requests for primitives](https://gitlab.com/datadrivendiscovery/primitives/issues?label_name%5B%5D=New+primitive+request).

## Reporting issues with a primitive of a performer

Also use this repository to report all issues with a primitive of a performer.
This way we can track issues with primitives across the whole program.
Filing an issue against this repository also allows other performers
to see known issues with primitives in one place.

When you open an issue, label it with the [performer's label](https://gitlab.com/datadrivendiscovery/primitives/issues/labels). This will notify
that team about the new issue.
If you can also find the main repository for the primitive,
consider opening an issue there first and then linking to it from
the issue in this repository.

Performers: Subscribe to your label so that you get notified when an issue
is labelled with your label.

## Note

Do not check in source code here. Please host your source code in a different repository and
put the link or links in the `source.uris` metadata entry.