Commit 25ee67cb authored by Mitar

Updating instructions.

parent c511f7c8
Pipeline #97795032 passed with stages in 28 minutes and 51 seconds
@@ -15,9 +15,11 @@ primitives/
   <version>/
     pipelines/
       <pipeline 1 id>.json
-      <pipeline 1 id>_run.yaml
       <pipeline 2 id>.yaml
       ...
+    pipeline_runs/
+      <some descriptive name>.yaml.gz
+      ...
     primitive.json
 failed/
   ... structure as above ...
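For concreteness, a filled-in `<version>` directory under the new layout might look like the following; the pipeline ID and run name are made up, purely to illustrate where each file goes:

```
<version>/
    pipelines/
        6a1b3c52-....json             # named by its pipeline ID (hypothetical)
    pipeline_runs/
        fit-score-seed-run.yaml.gz    # descriptive name, gzip-compressed
    primitive.json
```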
@@ -39,10 +41,13 @@ primitives/
 they are moved under the `failed` directory.
 * Pipeline examples in D3M pipeline description language must have a filename
   matching pipeline's ID with `.json`, `.yml`, or `.yaml` file extensions.
-* A pipeline can have a corresponding pipeline run file, based on same filename but with appended
-  `_run`. Existence of this file makes the pipeline a standard pipeline (inputs are `Dataset` objects
-  and output are predictions as `DataFrame`). Other pipelines might be
-  referenced as subpipelines with arbitrary inputs and outputs.
+* Put into the `pipelines` directory both main standard pipelines and any sub-pipeline they
+  might need. Sub-pipelines can have arbitrary inputs and outputs.
+* Pipeline runs can have an arbitrary file name (but keep it somewhat descriptive),
+  with `.yml.gz` or `.yaml.gz` file extensions. This also means that they are gzip-compressed
+  YAML files.
+* Those pipeline runs should reference only pipelines in the corresponding `pipelines`
+  directory (see the check sketched after this hunk).
 * Pipeline run file demonstrates that the performer was able to run the pipeline, and also
   provides configuration for anyone to re-run the pipeline. The pipeline run can reference
   a problem and input datasets. Only [standard problems and datasets](https://gitlab.datadrivendiscovery.org/d3m/datasets)
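To verify the reference mentioned above, you can peek inside a compressed run without unpacking it to disk. This is a minimal sketch, assuming the run YAML records the referenced pipeline under a top-level `pipeline` mapping with `id` and `digest` fields; check against your own file before relying on it:

```bash
# Show which pipeline a gzip-compressed run references (assumed YAML layout).
zcat pipeline_runs/<some descriptive name>.yaml.gz | grep -A 2 '^pipeline:'
```

The printed `id` should match the filename of a pipeline in the sibling `pipelines` directory.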
@@ -72,13 +77,17 @@ primitives/
 * Make sure all install dependencies are at least accessible to all other
   performers, if not public, so that they can use them. **CI validation cannot check this**.
 * Create any missing directories to adhere to the repository structure.
-* You can use `add.py` script available in this repository to help you.
 * Add pipeline examples for every primitive annotation you add.
 * Provide pipeline run files for every pipeline. Run your pipeline with reference runtime in
   `fit-score` or `evaluate` modes and store the pipeline run file:
   ```
   $ python3 -m d3m runtime -v /path/to/static/files fit-score -p <pipeline 1 id>.json -r .../problem_TRAIN/problemDoc.json -i .../dataset_TRAIN/datasetDoc.json -t .../dataset_TEST/datasetDoc.json -a .../dataset_SCORE/datasetDoc.json -O <pipeline 1 id>_run.yaml
   ```
+* You can use the `add.py` script available in this repository to help you with these two steps.
+* Compress the pipeline run file: `gzip <pipeline 1 id>_run.yaml`.
+* If a compressed file is smaller than 100 KB, add it to git as usual; if it is larger, use git LFS for it.
+* Adding to git LFS can be done automatically by running `git-add.sh`, which will add all new large files
+  (a manual equivalent is sketched after this hunk). After running the script you can commit to git as usual.
 * Do not delete any existing files or modify files which are not your annotations.
 * Once a merge request is made, the CI will validate added files automatically.
 * After CI validation succeeds (`validate` job), the maintainers of the repository
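For those not using `git-add.sh`, a manual equivalent of the compression and LFS steps above might look like the following sketch; the 100 KB threshold is the one stated in the list, and `git lfs track` is the standard git-lfs command, nothing specific to this repository:

```bash
# Compress the run, then stage it directly or via git LFS depending on size.
gzip "<pipeline 1 id>_run.yaml"
size=$(wc -c < "<pipeline 1 id>_run.yaml.gz")
if [ "$size" -lt 102400 ]; then            # under 100 KB: regular git object
    git add "<pipeline 1 id>_run.yaml.gz"
else                                       # 100 KB or more: store via git LFS
    git lfs track "<pipeline 1 id>_run.yaml.gz"
    git add .gitattributes "<pipeline 1 id>_run.yaml.gz"
fi
```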
@@ -123,7 +132,7 @@ a Docker image with all primitives your pipeline references, or have them installed

 You can re-run your pipeline run file by running:

 ```bash
-$ python3 -m d3m runtime -v /path/to/static/files -d /path/to/all/datasets fit-score -u <pipeline 1 id>_run.yaml -p <pipeline 1 id>.json
+$ python3 -m d3m -p /path/to/your/pipelines runtime -v /path/to/static/files -d /path/to/all/datasets fit-score -u <some descriptive name>.yaml
 ```
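Because runs are stored gzip-compressed in the repository, you will usually need to decompress one before handing it to `-u`; a minimal sketch, reusing the placeholders from the command above:

```bash
# -k keeps the original .gz file next to the decompressed copy.
gunzip -k "<some descriptive name>.yaml.gz"
python3 -m d3m -p /path/to/your/pipelines runtime -v /path/to/static/files -d /path/to/all/datasets fit-score -u "<some descriptive name>.yaml"
```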
## Requesting a primitive