# Index of open source D3M primitives

This repository contains JSON-serialized metadata (annotation) of open source primitives
and their example pipelines. You can use the repository to discover available primitives.

## Structure of repository

The directory and file structure is defined and controlled:

            <pipeline 1 id>.json
            <pipeline 1 id>.meta
            <pipeline 2 id>.yml
    ... structure as above ...
    ... old primitive annotations ...

* `interface_version` is the version tag of the primitive interfaces package
  you used to generate the annotation and against which the primitive is
  developed. It should match the `primitive_code.interfaces_version` metadata entry
  with a `v` prefix added.
* `performer_team` should match the `` metadata entry.
* `python_path` should match the `python_path` metadata entry and should start
  with `d3m.primitives`.
* `version` should match the `version` metadata entry.
* `primitive.json` is the JSON-serialized metadata of the primitive,
  obtained by running `python3 -m d3m index describe -i 4 <python_path>`.
* All added primitive annotations are regularly re-validated. If they fail validation,
  they are moved under the `failed` directory.
* Pipeline examples in the D3M pipeline description language must have a filename
  matching the pipeline's ID, with a `.json` or `.yml` file extension.
* A pipeline can have a `.meta` file with the same base filename. The existence of
  this file makes the pipeline a standard pipeline (inputs are `Dataset` objects
  and the output is predictions as a `DataFrame`). Other pipelines might be
  referenced as subpipelines with arbitrary inputs and outputs.
* A `.meta` file is a JSON file providing the problem ID to be used with the pipeline
  and the input training and testing dataset IDs. Only
  standard problems and datasets
  are allowed. If a dataset does not have pre-split train/test/score
  splits, provide just `problem` and `full_inputs`. Note that MIT-LL "score" splits
  have an ID equal to that of the "test" split; change it to have the `SCORE` suffix.

        {
            "problem": "185_baseball_problem",
            "full_inputs": ["185_baseball_dataset"],
            "train_inputs": ["185_baseball_dataset_TRAIN"],
            "test_inputs": ["185_baseball_dataset_TEST"],
            "score_inputs": ["185_baseball_dataset_SCORE"]
        }
* For primitive references in your pipelines, consider not specifying the `digest`
  field for primitives you do not control. This way your pipelines will not
  fail with a digest mismatch if those primitives get updated. (They might still
  fail because the behavior of those primitives changed, but you cannot do much
  about that.)
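
The `digest` advice in the last item can be applied mechanically. The following is a minimal sketch, not part of this repository's tooling. It assumes a pipeline description has a top-level `steps` list in which each primitive step carries a `primitive` object with a `python_path` and an optional `digest`; the function name and the `own_prefixes` parameter are hypothetical.

```python
import copy


def strip_foreign_digests(pipeline, own_prefixes):
    """Return a copy of a pipeline description with `digest` removed from
    references to primitives you do not control.

    Assumption: steps live under "steps" and primitive references under
    "primitive", each with a "python_path" and an optional "digest".
    """
    result = copy.deepcopy(pipeline)  # leave the caller's object untouched
    for step in result.get("steps", []):
        primitive = step.get("primitive")
        if not isinstance(primitive, dict):
            continue  # not a primitive step (for example a subpipeline)
        python_path = primitive.get("python_path", "")
        # Keep the digest only for primitives under one of your own prefixes.
        if not python_path.startswith(tuple(own_prefixes)):
            primitive.pop("digest", None)
    return result
```

For example, calling `strip_foreign_digests(pipeline, ["d3m.primitives.test."])` would keep digests only on primitives under that (illustrative) prefix and drop them everywhere else.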

## Adding a primitive

* You can add a new primitive or a new version of a primitive by
  creating a merge request against the `master` branch of this repository
  with the `primitive.json` file for the primitive added according to the
  repository structure.
  * To make a merge request, create a dedicated branch in your fork of
    this repository and open the merge request from that branch.
  * Do not work in or modify the `master` branch of your fork, because you will
    have issues if your merge request does not get merged for some reason.
  * Make sure that your fork, your `master` branch, and the dedicated branch you
    are working from are all up-to-date with the `master` branch of this repository.
* Generate the `primitive.json` file using the
  `python3 -m d3m index describe -i 4 <python_path> > primitive.json`
  command. The command validates the generated file automatically and
  does not generate JSON output if there are any issues.
  * Make sure all installation dependencies are public or at least accessible to
    all other performers, so that they can use them. **CI validation cannot check this**.
* Create any missing directories to adhere to the repository structure.
* Add pipeline examples for every primitive annotation you add.
  * You can use the `` script available in this repository to help you with these two steps.
* Do not delete any existing files or modify files which are not your annotations.
* Once a merge request is made, the CI will validate added files automatically.
* After CI validation succeeds (`validate` job), the maintainers of the repository
  will merge the merge request.
* You can submit the same primitive and version to multiple primitive interface
  directories if your primitive works well with them.
* There is also CI validation against the current development version of core packages
  (`validate_devel` job). Failing it outputs a warning but does not prevent adding
  a primitive. This way you can also validate your annotation against the upcoming
  version of core packages and make it ready to be automatically ported
  to the new release once it is published.
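
The generate-and-place steps above can be sketched in Python. This is a hedged illustration, not the helper script shipped in this repository. It assumes the described metadata is nested as `primitive_code` → `interfaces_version` (as the dotted name suggests), takes the performer team name as an explicit argument, and treats a non-zero exit code or empty output as failure. The function names and the `runner` parameter are hypothetical. Unlike plain shell redirection, it writes `primitive.json` only when the describe command actually produced JSON.

```python
import json
import subprocess
from pathlib import Path


def annotation_dir(root, metadata, performer_team):
    """Build <interface_version>/<performer_team>/<python_path>/<version>."""
    python_path = metadata["python_path"]
    if not python_path.startswith("d3m.primitives"):
        raise ValueError(f"python_path must start with 'd3m.primitives': {python_path}")
    # The directory name is the interfaces version with a `v` prefix added.
    interface_version = "v" + metadata["primitive_code"]["interfaces_version"]
    return Path(root) / interface_version / performer_team / python_path / metadata["version"]


def add_annotation(root, python_path, performer_team, runner=subprocess.run):
    """Describe a primitive and store the result as primitive.json."""
    command = ["python3", "-m", "d3m", "index", "describe", "-i", "4", python_path]
    completed = runner(command, capture_output=True, text=True)
    # Assumption: failure shows up as a non-zero exit code or as empty output.
    if completed.returncode != 0 or not completed.stdout.strip():
        raise RuntimeError(f"describe failed for {python_path}: {completed.stderr}")
    metadata = json.loads(completed.stdout)
    target = annotation_dir(root, metadata, performer_team)
    target.mkdir(parents=True, exist_ok=True)  # create any missing directories
    output = target / "primitive.json"
    output.write_text(completed.stdout)
    return output
```

The `runner` argument exists only so the function can be exercised without the `d3m` package installed; in normal use the default `subprocess.run` is what executes the command.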

## Local validation

You can also run the CI validation script locally so that you do not have to commit,
push, and wait for CI to test your primitive annotation. Passing the CI validation script
locally is not authoritative, but it can help speed up debugging.

The CI validation script has some requirements:

* Linux
* Working Docker installation, logged in to ``
* PyPI packages listed in [`requirements.txt`](./requirements.txt)
* Internet connection

Run it by providing the path to the primitive annotation file you want to validate. Example:

    $ ./ 'v2017.12.27/Test team/d3m.primitives.test.IncrementPrimitive/0.1.0/primitive.json'

To validate a pipeline description, run:

    $ python3 -m d3m pipeline describe <path_to_JSON>

It will print out the pipeline JSON if it succeeds, or an error otherwise. You should probably run it inside
a Docker image with all primitives your pipeline references, or have them installed on your system.
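
Before running that command, the filename rule from the repository structure can be checked locally without `d3m` installed. The sketch below is illustrative only. It assumes the pipeline ID is stored in a top-level `id` field, handles only `.json` pipelines (reading `.yml` would need a YAML library), and uses a hypothetical function name.

```python
import json
from pathlib import Path


def check_pipeline_file(path):
    """Check a pipeline file against the repository's naming rules.

    Assumption: the pipeline ID is the top-level "id" field of the JSON document.
    """
    path = Path(path)
    if path.suffix != ".json":
        raise ValueError("this sketch only reads .json pipeline descriptions")
    with path.open() as handle:
        pipeline = json.load(handle)
    return {
        # The base filename must match the pipeline's ID.
        "filename_matches_id": path.stem == pipeline.get("id"),
        # A sibling .meta file marks the pipeline as a standard pipeline.
        "is_standard": path.with_suffix(".meta").exists(),
    }
```

This does not replace `d3m pipeline describe`, which validates the pipeline content itself; it only catches a misnamed file early.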

You can validate your `.meta` file by running:

    $ python3 -m d3m runtime -v /path/to/static/files -d /path/to/all/datasets fit-score -m your-pipeline.meta -p your-pipeline.yml
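
Since the runtime command needs datasets and installed primitives, a quick structural check of the `.meta` content can catch simple mistakes first. The following is a rough sketch based only on the keys shown in the example `.meta` file. It assumes `problem` and `full_inputs` are always present and that the three split keys are given together; both are assumptions, not documented rules, and the function name is hypothetical.

```python
SPLIT_KEYS = ("train_inputs", "test_inputs", "score_inputs")


def check_meta(meta):
    """Return a list of human-readable problems found in a parsed .meta file."""
    problems = []
    if not isinstance(meta.get("problem"), str):
        problems.append("missing 'problem' ID")
    if not meta.get("full_inputs"):
        problems.append("missing 'full_inputs'")
    # Assumption: split keys are either all present or all absent.
    present = [key for key in SPLIT_KEYS if key in meta]
    if present and len(present) != len(SPLIT_KEYS):
        problems.append("provide all of train/test/score inputs or none of them")
    # Score splits are expected to carry the SCORE suffix.
    for dataset_id in meta.get("score_inputs", []):
        if not dataset_id.endswith("SCORE"):
            problems.append(f"score input without SCORE suffix: {dataset_id}")
    return problems
```

Load the file with `json.load` and pass the resulting dictionary to `check_meta`; an empty list means none of these particular checks failed, not that the runtime will accept the file.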

## Requesting a primitive

If you would like to request a primitive, you can open an issue in this repository
and label it with the `New primitive request` label.

If you are searching for ideas for primitives, see the
list of all requests for primitives.

## Reporting issues with a primitive of a performer

Also use this repository to report all issues with a primitive of a performer.
This way we can track issues with primitives program-wide.
Filing an issue against this repository also allows other performers
to see known issues with primitives in one place.

When you open an issue, label it with the performer's label. This will notify
that team about the new issue.
If you can also find the main repository for the primitive,
consider opening an issue there first and then linking to it from
the issue in this repository.

Performers: Subscribe to your label so that you get notified when an issue
is labelled with your label.

## Note

Do not check in source code here. Please host your source code in a different repository and
put the link or links in the `source.uris` metadata entry.