Commit 25ee67cb authored by Mitar

Updating instructions.

parent c511f7c8
Pipeline #97795032 passed with stages in 28 minutes and 51 seconds
@@ -15,9 +15,11 @@ primitives/
         <version>/
           pipelines/
             <pipeline 1 id>.json
-            <pipeline 1 id>_run.yaml
             <pipeline 2 id>.yaml
             ...
+          pipeline_runs/
+            <some descriptive name>.yaml
+            ...
           primitive.json
 failed/
   ... structure as above ...
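The change above moves pipeline runs out of `pipelines/` into a sibling `pipeline_runs/` directory. A minimal sketch of preparing the new layout for one primitive version (the `...` and `<version>` path components are placeholders, following the tree above):

```bash
# Create the sibling pipelines/ and pipeline_runs/ directories for one
# primitive version; the path components are placeholders, not real names.
mkdir -p "primitives/.../<version>/pipelines"
mkdir -p "primitives/.../<version>/pipeline_runs"
```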
...@@ -39,10 +41,13 @@ primitives/ ...@@ -39,10 +41,13 @@ primitives/
they are moved under the `failed` directory. they are moved under the `failed` directory.
* Pipeline examples in D3M pipeline description language must have a filename * Pipeline examples in D3M pipeline description language must have a filename
matching pipeline's ID with `.json`, `.yml`, or `.yaml` file extensions. matching pipeline's ID with `.json`, `.yml`, or `.yaml` file extensions.
* A pipeline can have a corresponding pipeline run file, based on same filename but with appended Put into the `pipelines` directory both main standard pipelines and any sub-pipeline they
`_run`. Existence of this file makes the pipeline a standard pipeline (inputs are `Dataset` objects might need. Sub-pipelines can have arbitrary inputs and outputs.
and output are predictions as `DataFrame`). Other pipelines might be * Pipeline runs can have an arbitrary file name (but keep it somewhat descriptive),
referenced as subpipelines with arbitrary inputs and outputs. with `.yml.gz` or `.yaml.gz` file extensions. This also means that they are gzip-compressed
YAML files.
* Those pipeline runs should reference only pipelines in the corresponding `pipelines`
directory.
* Pipeline run file demonstrates that the performer was able to run the pipeline, and also * Pipeline run file demonstrates that the performer was able to run the pipeline, and also
provides configuration for anyone to re-run the pipeline. The pipeline run can reference provides configuration for anyone to re-run the pipeline. The pipeline run can reference
a problem and input datasets. Only [standard problems and datasets](https://gitlab.datadrivendiscovery.org/d3m/datasets) a problem and input datasets. Only [standard problems and datasets](https://gitlab.datadrivendiscovery.org/d3m/datasets)
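Per the added bullets above, stored runs are gzip-compressed YAML files with descriptive names. A hypothetical sketch of preparing one (the run name `185-baseball-fit-score` is invented for illustration):

```bash
# Rename a freshly produced run to something descriptive, then gzip it;
# the result is the .yaml.gz file the instructions above require.
mv pipeline_run.yaml 185-baseball-fit-score.yaml
gzip 185-baseball-fit-score.yaml   # produces 185-baseball-fit-score.yaml.gz
mv 185-baseball-fit-score.yaml.gz pipeline_runs/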
@@ -72,13 +77,17 @@ primitives/
 * Make sure all install dependencies are at least accessible to all other
   performers, if not public, so that they can use them. **CI validation cannot check this**.
 * Create any missing directories to adhere to the repository structure.
+* You can use the `add.py` script available in this repository to help you.
 * Add pipeline examples for every primitive annotation you add.
 * Provide pipeline run files for every pipeline. Run your pipeline with the reference runtime in
   `fit-score` or `evaluate` modes and store the pipeline run file:
   ```
   $ python3 -m d3m runtime -v /path/to/static/files fit-score -p <pipeline 1 id>.json -r .../problem_TRAIN/problemDoc.json -i .../dataset_TRAIN/datasetDoc.json -t .../dataset_TEST/datasetDoc.json -a .../dataset_SCORE/datasetDoc.json -O <pipeline 1 id>_run.yaml
   ```
-* You can use the `add.py` script available in this repository to help you with these two steps.
+* Compress the pipeline run file: `gzip <pipeline 1 id>_run.yaml`.
+* If a compressed file is smaller than 100 KB, add it regularly to git; if it is larger, use git LFS for it.
+* Adding to git LFS can be done automatically by running `git-add.sh`, which will add all new large files.
+  After running the script you can commit to git as usual.
 * Do not delete any existing files or modify files which are not your annotations.
 * Once a merge request is made, the CI will validate added files automatically.
 * After CI validation succeeds (`validate` job), the maintainers of the repository
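The `git-add.sh` script mentioned above automates the size check. For reference, a hedged sketch of making the same 100 KB decision by hand with standard git LFS commands (the file name is illustrative):

```bash
# Add compressed runs under 100 KB normally; track larger ones with git LFS.
f="pipeline_runs/185-baseball-fit-score.yaml.gz"
if [ "$(wc -c < "$f")" -lt 102400 ]; then
    git add "$f"
else
    git lfs track "$f"              # records the pattern in .gitattributes
    git add .gitattributes "$f"
fi
git commit -m "Adding pipeline run"  # then commit to git as usual
```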
@@ -123,7 +132,7 @@ a Docker image with all primitives your pipeline references, or have them installed
 You can re-run your pipeline run file by running:
 ```bash
-$ python3 -m d3m runtime -v /path/to/static/files -d /path/to/all/datasets fit-score -u <pipeline 1 id>_run.yaml -p <pipeline 1 id>.json
+$ python3 -m d3m -p /path/to/your/pipelines runtime -v /path/to/static/files -d /path/to/all/datasets fit-score -u <some descriptive name>.yaml
 ```
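Since stored pipeline runs are gzip-compressed (see above), you may first need to decompress the file passed to `-u`; a small sketch:

```bash
# Decompress a stored run before re-running; -k keeps the .yaml.gz in place.
gunzip -k "pipeline_runs/<some descriptive name>.yaml.gz"
```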
## Requesting a primitive
...