Commit 139c2ff1 authored by Mitar

Updated instructions.

parent ac07142f
Pipeline #95027991 passed with stage in 5 minutes
@@ -15,8 +15,8 @@ primitives/
<version>/
pipelines/
<pipeline 1 id>.json
<pipeline 1 id>_run.yaml
<pipeline 2 id>.yaml
...
primitive.json
failed/
@@ -38,32 +38,21 @@ primitives/
* All added primitive annotations are regularly re-validated. If they fail validation,
they are moved under the `failed` directory.
* Pipeline examples in D3M pipeline description language must have a filename
matching the pipeline's ID with `.json`, `.yml`, or `.yaml` file extensions.
* A pipeline can have a corresponding pipeline run file, named after the same base filename with
`_run` appended. Existence of this file makes the pipeline a standard pipeline (inputs are `Dataset` objects
and the output is predictions as a `DataFrame`). Other pipelines might be
referenced as subpipelines with arbitrary inputs and outputs.
* A pipeline run file demonstrates that the performer was able to run the pipeline, and it also
provides the configuration for anyone to re-run the pipeline. The pipeline run can reference
a problem and input datasets. Only [standard problems and datasets](https://gitlab.datadrivendiscovery.org/d3m/datasets)
are allowed. [Public ones are preferred](https://datasets.datadrivendiscovery.org/d3m/datasets).
* For primitive references in your pipelines, consider not specifying the `digest`
field for primitives you do not control. This way your pipelines will not
fail with a digest mismatch if those primitives get updated. (They might still
fail because of behavior changes in those primitives, but you cannot do much
about that.) The pipeline run will contain precise digest information for the
versions of primitives you used.
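For example, here is a minimal stdlib-only sketch of stripping `digest` fields from primitive references you do not control, assuming the usual pipeline description layout where each step carries a `primitive` object (the function name and its arguments are hypothetical, for illustration only):

```python
import json

def strip_foreign_digests(pipeline_path, own_python_paths):
    """Remove 'digest' from primitive references not under our control."""
    with open(pipeline_path, 'r') as pipeline_file:
        pipeline = json.load(pipeline_file)
    for step in pipeline.get('steps', []):
        primitive = step.get('primitive', {})
        if primitive and primitive.get('python_path') not in own_python_paths:
            primitive.pop('digest', None)
    with open(pipeline_path, 'w') as pipeline_file:
        json.dump(pipeline, pipeline_file, indent=2)
```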
## Adding a primitive
@@ -84,7 +73,12 @@ primitives/
performers, if not public, so that they can use them. **CI validation cannot check this**.
* Create any missing directories to adhere to the repository structure.
* Add pipeline examples for every primitive annotation you add.
* Provide pipeline run files for every pipeline. Run your pipeline with the reference runtime in
`fit-score` or `evaluate` mode and store the pipeline run file (a self-check sketch follows the command):
```
$ python3 -m d3m runtime -v /path/to/static/files fit-score -p <pipeline 2 id>.yaml -r .../problem_TRAIN/problemDoc.json -i .../dataset_TRAIN/datasetDoc.json -t .../dataset_TEST/datasetDoc.json -a .../dataset_SCORE/datasetDoc.json -O <pipeline 2 id>_run.yaml
```
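For a quick self-check before opening a merge request, a sketch like the following (illustrative only, not part of the repository tooling) lists pipelines that still lack a run file, assuming the repository layout above and the `_run.yaml` naming convention:

```python
import glob
import os.path

# Collect all pipeline description files under the documented layout.
pipeline_paths = []
for extension in ('json', 'yml', 'yaml'):
    pipeline_paths += glob.glob('v*/*/*/*/pipelines/*.{extension}'.format(extension=extension))

for pipeline_path in pipeline_paths:
    base_path, _ = os.path.splitext(pipeline_path)
    if base_path.endswith('_run'):
        continue  # This is a pipeline run file itself, not a pipeline.
    if not os.path.exists(base_path + '_run.yaml'):
        print('Missing pipeline run file for:', pipeline_path)
```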
* You can use the `add.py` script available in this repository to help you with these two steps.
* Do not delete any existing files or modify files which are not your annotations.
* Once a merge request is made, the CI will validate added files automatically.
* After CI validation succeeds (`validate` job), the maintainers of the repository
@@ -126,10 +120,10 @@ $ python3 -m d3m pipeline describe <path_to_JSON>
It will print out the pipeline JSON if it succeeds, or an error otherwise. You should probably run it inside
a Docker image with all primitives your pipeline references, or have them installed on your system.
You can re-run your pipeline run file by running:
```bash
$ python3 -m d3m runtime -v /path/to/static/files -d /path/to/all/datasets fit-score -u <pipeline 2 id>_run.yaml -p <pipeline 2 id>.yaml
```
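To see which pipeline (and digest) a run recorded, you can also load the run file directly. This is a sketch under assumptions: a `fit-score` run file may contain multiple YAML documents, and each document references its pipeline under a top-level `pipeline` object with `id` and `digest`:

```python
import yaml

with open('<pipeline 2 id>_run.yaml', 'r') as run_file:
    # Iterate because one file can hold several run documents.
    for run_document in yaml.safe_load_all(run_file):
        pipeline_reference = run_document['pipeline']
        print(pipeline_reference['id'], pipeline_reference.get('digest'))
```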
## Requesting a primitive
@@ -32,7 +32,7 @@ import yaml
sys.stderr = sys.stdout
PRIMITIVE_ANNOTATION_REGEX = re.compile(r'^(?P<interface_version>v[^/]+)/(?P<performer_team>[^/]+)/(?P<python_path>[^/]+)/(?P<version>[^/]+)/primitive\.json$')
PIPELINE_REGEX = re.compile(r'^(?P<interface_version>v[^/]+)/(?P<performer_team>[^/]+)/(?P<python_path>[^/]+)/(?P<version>[^/]+)/pipelines/[^/.]+(\.yml|\.yaml|\.json|\.meta)$')
FIX_EGG_VALUE_REGEX = re.compile(r'Fix your #egg=\S+ fragments')
MAIN_REPOSITORY = 'https://gitlab.com/datadrivendiscovery/primitives.git'
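# Illustration only (hypothetical paths, not part of this script): the updated
# PIPELINE_REGEX accepts `.yaml` in addition to `.yml`, `.json`, and legacy
# `.meta` files; note that pipeline run files such as `abc_run.yaml` match too.
assert PIPELINE_REGEX.match('v2019.6.7/Test/d3m.primitives.a.B/0.1.0/pipelines/abc.yaml')
assert PIPELINE_REGEX.match('v2019.6.7/Test/d3m.primitives.a.B/0.1.0/pipelines/abc_run.yaml')
assert not PIPELINE_REGEX.match('v2019.6.7/Test/d3m.primitives.a.B/0.1.0/pipelines/abc.txt')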
@@ -730,6 +730,10 @@ def validate_installation(primitive_names, interface_version, installation, annotation
        interface_version=interface_version, performer_team=annotation['source']['name'],
        python_path=annotation['python_path'], version=annotation['version'],
    ))
    pipeline_paths += glob.glob('{interface_version}/{performer_team}/{python_path}/{version}/pipelines/*.yaml'.format(
        interface_version=interface_version, performer_team=annotation['source']['name'],
        python_path=annotation['python_path'], version=annotation['version'],
    ))
    pipeline_paths += glob.iglob('{interface_version}/{performer_team}/{python_path}/{version}/pipelines/*.json'.format(
        interface_version=interface_version, performer_team=annotation['source']['name'],
        python_path=annotation['python_path'], version=annotation['version'],
@@ -741,7 +745,7 @@ def validate_installation(primitive_names, interface_version, installation, annotation
    pipeline_error = False
    for pipeline_path in pipeline_paths:
        with open(pipeline_path, 'r') as pipeline_file:
            if pipeline_path.endswith('.yml') or pipeline_path.endswith('.yaml'):
                pipeline = yaml.safe_load(pipeline_file)
            elif pipeline_path.endswith('.json'):
                pipeline = json.load(pipeline_file)
@@ -47,6 +47,12 @@ for interface_version in arguments.interface_versions:
            primitives_used_in_pipelines.update(primitives_used_in_pipeline(pipeline))

    for pipeline_path in glob.iglob('{interface_version}/*/*/*/pipelines/*.yaml'.format(interface_version=interface_version)):
        with open(pipeline_path, 'r', encoding='utf8') as pipeline_file:
            pipeline = yaml.safe_load(pipeline_file)

            primitives_used_in_pipelines.update(primitives_used_in_pipeline(pipeline))

for primitive_id, primitive in known_primitives.items():
    if primitive_id not in primitives_used_in_pipelines:
        print(primitive['source']['name'], primitive['id'], primitive['python_path'])
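# For context, one plausible shape of the helper used above; the real
# implementation is defined earlier in this script and may differ. It collects
# primitive IDs from pipeline steps, recursing into inlined subpipelines.
def primitives_used_in_pipeline(pipeline):
    primitive_ids = set()
    for step in pipeline.get('steps', []):
        if 'primitive' in step:
            primitive_ids.add(step['primitive']['id'])
        elif 'pipeline' in step:
            primitive_ids.update(primitives_used_in_pipeline(step['pipeline']))
    return primitive_ids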