# Index of open source D3M primitives

This repository contains JSON-serialized metadata (annotations) of open source primitives
and their example pipelines. You can use this repository to discover available primitives.

## Structure of repository

The directory and file structure is defined and controlled:

```
primitives/
  <interface_version>/
    <performer_team>/
      <python_path>/
        <version>/
          pipelines/
            <pipeline 1 id>.json
            <pipeline 2 id>.yaml
            ...
          pipeline_runs/
            <some descriptive name>.yaml.gz
            ...
          primitive.json
  failed/
    ... structure as above ...
  archive/
    ... old primitive annotations ...
```

* `interface_version` is the version tag of the primitive interfaces package
  you used to generate the annotation and against which the primitive is
  developed; it should match the `primitive_code.interfaces_version` metadata entry
  with a `v` prefix added.
* `performer_team` should match the `source.name` metadata entry.
* `python_path` should match the `python_path` metadata entry and should start
  with `d3m.primitives`.
* `version` should match the `version` metadata entry.
* `primitive.json` is the JSON-serialized metadata of the primitive,
  obtained by running `python3 -m d3m index describe -i 4 <python_path>`.
* All added primitive annotations are regularly re-validated. If they fail validation,
  they are moved under the `failed` directory.
* Pipeline examples in the D3M pipeline description language must have a filename
  matching the pipeline's ID, with a `.json`, `.yml`, or `.yaml` file extension.
  Put both the main standard pipelines and any sub-pipelines they
  might need into the `pipelines` directory. Sub-pipelines can have arbitrary inputs and outputs.
* Pipeline runs can have an arbitrary file name (but keep it somewhat descriptive),
  with a `.yml.gz` or `.yaml.gz` file extension; that is, they are gzip-compressed
  YAML files.
* Those pipeline runs should reference only pipelines in the corresponding `pipelines`
  directory.
* A pipeline run file demonstrates that the performer was able to run the pipeline, and also
  provides the configuration for anyone to re-run the pipeline. The pipeline run can reference
  a problem and input datasets. Only [standard problems and datasets](https://gitlab.datadrivendiscovery.org/d3m/datasets)
  are allowed. [Public ones are preferred](https://datasets.datadrivendiscovery.org/d3m/datasets).
* For primitive references in your pipelines, consider not specifying the `digest`
  field for primitives you do not control. This way your pipelines will not
  fail with a digest mismatch if those primitives get updated. (They might still
  fail because of a behavior change in those primitives, but you cannot do much
  about that.) The pipeline run will contain precise digest information for the versions
  of primitives you used.
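Putting the structure rules above together, the expected path of a primitive annotation can be assembled from its metadata entries. A minimal sketch, using a hypothetical primitive and made-up metadata values:

```shell
# Sketch: assemble the expected annotation path from metadata entries.
# All values below are hypothetical examples, not a real primitive.
INTERFACE_VERSION="v2019.11.10"  # primitive_code.interfaces_version, "v" prefix added
PERFORMER_TEAM="Test team"       # source.name
PYTHON_PATH="d3m.primitives.test.IncrementPrimitive"  # python_path
VERSION="0.1.0"                  # version

ANNOTATION="primitives/$INTERFACE_VERSION/$PERFORMER_TEAM/$PYTHON_PATH/$VERSION/primitive.json"
echo "$ANNOTATION"
```

Note that `performer_team` may contain spaces (as in `Test team` above), so quote such paths when using them in shell commands.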

## Adding a primitive

* You can add a new primitive or a new version of a primitive by
  creating a merge request against the master branch of this repository
  with `primitive.json` file for the primitive added according to the
  repository structure.
  * To make a merge request, create a dedicated branch in your fork of this
    repository and open the merge request from that branch.
  * Do not work in or modify the `master` branch of your fork, because you will
    have issues if your merge request does not get merged for some reason.
  * Make sure that your fork, its `master` branch, and the dedicated branch you
    are working from are all up-to-date with the `master` branch of this repository.
* Generate the `primitive.json` file using the `python3 -m d3m index describe -i 4 <python_path> > primitive.json`
  command. The command validates the generated file automatically and
  will not produce JSON output if there are any issues.
  * Make sure all install dependencies are at least accessible to all other
    performers, if not public, so that they can use them. **CI validation cannot check this.**
* Create any missing directories to adhere to the repository structure.
  * You can use the `add.py` script available in this repository to help you.
* Add pipeline examples for every primitive annotation you add.
* Provide pipeline run files for every pipeline. Run your pipeline with the reference runtime in
  `fit-score` or `evaluate` mode and store the pipeline run file:
    ```
    $ python3 -m d3m runtime -v /path/to/static/files fit-score -p <pipeline 1 id>.json -r .../problem_TRAIN/problemDoc.json -i .../dataset_TRAIN/datasetDoc.json -t .../dataset_TEST/datasetDoc.json -a .../dataset_SCORE/datasetDoc.json -O <pipeline 1 id>_run.yaml
    ```
* Compress the pipeline run file: `gzip <pipeline 1 id>_run.yaml`.
  * If the compressed file is smaller than 100 KB, add it to git regularly; if it is larger, use git LFS for it.
  * Adding to git LFS can be done automatically by running `git-add.sh`, which adds all new large files.
    After running the script you can commit to git as usual.
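The compress-and-decide step above can be sketched as follows; the run file name and contents are hypothetical stand-ins, and the 100 KB threshold is expressed in bytes:

```shell
# Sketch: compress a pipeline run file and pick plain git or git LFS based on
# the 100 KB threshold. The file name and contents are hypothetical stand-ins.
RUN_FILE="example_run.yaml"
printf 'id: example-run\nstatus:\n  state: SUCCESS\n' > "$RUN_FILE"
gzip -f "$RUN_FILE"  # produces example_run.yaml.gz

SIZE=$(wc -c < "$RUN_FILE.gz")
if [ "$SIZE" -lt 102400 ]; then  # 100 KB = 102400 bytes
    echo "small file: add it to git normally"
else
    echo "large file: track it with git LFS (e.g. via git-add.sh)"
fi
```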
* Do not delete any existing files or modify files which are not your annotations.
* Once a merge request is made, the CI will validate added files automatically.
* After CI validation succeeds (`validate` job), the maintainers of the repository
  will merge the merge request.
* You can submit the same primitive and version to multiple primitive interface
  directories if your primitive works well with them.
* There is also CI validation against the current development version of the core packages
  (`validate_devel` job). Failing it will output a warning but will not prevent adding
  a primitive. This job validates your primitive annotation against the devel version of the
  core packages, so you can also validate against the upcoming version of the
  core packages and make your annotation ready to be automatically ported
  to the new release once it is published.

## Local validation

You can also run the CI validation script locally so that you do not have to commit,
push, and wait for CI to test your primitive annotation. Passing the CI validation script
locally is not authoritative, but it can help speed up debugging.

The CI validation script has some requirements:

* Linux
* A working Docker installation, logged in to `registry.datadrivendiscovery.org`
* PyPI packages listed in [`requirements.txt`](./requirements.txt)
* Internet connection

Run it by providing the path to the primitive annotation file you want to validate. Example:

```bash
$ ./run_validation.py 'v2017.12.27/Test team/d3m.primitives.test.IncrementPrimitive/0.1.0/primitive.json'
```

To validate a pipeline description, run:

```bash
$ python3 -m d3m pipeline describe <path_to_JSON>
```

It will print out the pipeline JSON if it succeeds, or an error otherwise. You should probably run it inside
a Docker image with all primitives your pipeline references, or have them installed on your system.
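Since pipeline files must be named after the pipeline's ID, a quick local sanity check can compare the file name against the `id` field inside the file. A minimal sketch with a hypothetical pipeline file:

```shell
# Sketch: check that a pipeline file's name matches the "id" field inside it.
# The file name and ID below are hypothetical stand-ins.
PIPELINE_FILE="f3a2b1c4-0000-0000-0000-000000000000.json"
printf '{"id": "f3a2b1c4-0000-0000-0000-000000000000"}' > "$PIPELINE_FILE"

# Extract the pipeline ID from the JSON description.
PIPELINE_ID=$(python3 -c "import json, sys; print(json.load(open(sys.argv[1]))['id'])" "$PIPELINE_FILE")
if [ "$(basename "$PIPELINE_FILE" .json)" = "$PIPELINE_ID" ]; then
    echo "file name matches pipeline ID"
fi
```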

You can re-run your pipeline run file by running:

```bash
$ python3 -m d3m -p /path/to/your/pipelines runtime -v /path/to/static/files -d /path/to/all/datasets fit-score -u <some descriptive name>.yaml
```
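Because pipeline run files are gzip-compressed YAML, you can also inspect one without writing the decompressed YAML to disk; a small sketch with a hypothetical file:

```shell
# Sketch: inspect a gzip-compressed pipeline run without decompressing it on
# disk. The file name and contents are hypothetical stand-ins.
printf 'id: example-run\nstatus:\n  state: SUCCESS\n' | gzip > inspect_run.yaml.gz

# Stream the decompressed YAML to stdout.
gunzip -c inspect_run.yaml.gz
```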

## Requesting a primitive

If you would like to request a primitive, you can open an issue in this repository
and label it with the `New primitive request` label.

If you are searching for ideas for primitives, see the
[list of all requests for primitives](https://gitlab.com/datadrivendiscovery/primitives/issues?label_name%5B%5D=New+primitive+request).

## Reporting issues with a primitive of a performer

Also use this repository to report any issues with a performer's primitive.
This way we can track issues with primitives program-wide.
Filing an issue against this repository also allows other performers
to see known issues with primitives in one place.

When you open an issue, label it with the [performer's label](https://gitlab.com/datadrivendiscovery/primitives/issues/labels). This will notify
that team about the new issue.
If you can also find the main repository for the primitive,
consider opening an issue there first and then just linking to it from
the issue in this repository.

Performers: subscribe to your label so that you get notified when an issue
is labeled with it.

## Note

Do not check in your source code here. Please host it in a different repository and
put the link or links in the `source.uris` metadata entry.