The source project of this merge request has been removed.
WIP: DataMart primitives
Fixes https://gitlab.datadrivendiscovery.org/jpl/primitives_repo/issues/74
There's still work to do, but here are some initial primitives for the DataMart common API:
- Search primitive: although search should happen outside of the pipeline, you can still run it through the runtime if you want.
- Download primitive: takes a search result, materializes it, and returns a Dataset. This primitive also takes the input dataset (the one that was used in the search): this is for the future test protocol, when we will need to be able to find new datasets at test time if the test data is different.
- I also plan on providing a simple "inner join" primitive (using the code we currently have in the NYU DataMart). More join methods will probably be developed (by TA1 and DataMart performers); they don't have to be in common_primitives.
- Return correct metadata
- Example pipeline (pipelines/datamart.json)
- Example notebook (test.ipynb). This should be moved out of here before merging. The same goes for the Dockerfile.
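For reference, here is a minimal sketch of what a simple inner join does, written in plain Python over lists of row dictionaries. It is not the NYU DataMart code and does not use the d3m primitive interfaces; the function name `inner_join`, its parameters, and the sample rows are all hypothetical and only illustrate the semantics (rows are kept only when the join key is present on both sides).

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


def inner_join(left: List[Row], right: List[Row],
               left_key: str, right_key: str) -> List[Row]:
    """Hash-based inner join: keep only rows whose key appears on both sides."""
    # Index the right-hand rows by their join key.
    index = defaultdict(list)
    for row in right:
        index[row[right_key]].append(row)

    joined = []
    for row in left:
        # Left rows without a match are dropped (inner join semantics).
        for match in index.get(row[left_key], []):
            merged = dict(row)
            # Add the new columns; skip the redundant key column and
            # never overwrite a column that already exists on the left.
            for name, value in match.items():
                if name != right_key and name not in merged:
                    merged[name] = value
            joined.append(merged)
    return joined


# Hypothetical data: the supplied dataset and a materialized search result.
supplied = [
    {"city": "NYC", "score": 1},
    {"city": "LA", "score": 2},
    {"city": "SF", "score": 3},
]
augmenting = [
    {"name": "NYC", "population": 8},
    {"name": "LA", "population": 4},
]
result = inner_join(supplied, augmenting, "city", "name")
```

In this sketch the "SF" row is dropped because it has no match in the augmenting data; an actual primitive would also need to propagate column metadata, which is omitted here.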
This uses the datamart client library that we developed. It speaks the common API and therefore should work for both systems. It is a bit complex right now because of the support for local materialization (using datamart-materialize), but I plan on removing that (too complex, limited benefit if co-locating systems, incompatible with caching of datasets on the DataMart).
Remaining issues:
- Provide Union primitive? This can be done later.
- Figure out the specific format of Search results and the hyper-parameters for Download and Join
Edited by Remi Rampin