Commit c1e27873 authored by Mitar

Updating documentation and removing mention of a baseline.

parent 7f885ac8
......@@ -14,8 +14,8 @@ about GRPC.
## API Structure
TA3-TA2 API calls are defined in the *core* GRPC service which can be found in [`core.proto`](./core.proto) file
and all TA3 and TA2 sytems are expected to implement it and support it. Optional services can be
defined as well in other `.proto` files.
and all TA3 and TA2 systems are expected to implement it and support it. Other `.proto` files provide definitions
of additional standard messages.
Useful utilities for working with the TA3-TA2 API in Python are available in the included [ta3ta2_api](https://gitlab.com/datadrivendiscovery/ta3ta2-api/tree/dist-python) package.
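
For a concrete starting point, here is a minimal sketch of connecting to a TA2 and making the `Hello` handshake call using stubs generated from `core.proto`. The module names `core_pb2`/`core_pb2_grpc` and the `localhost:45042` address are assumptions for illustration only; the generated modules may live under different paths inside the `ta3ta2_api` package.

```python
# A minimal sketch (not a reference implementation) of opening a connection to
# a TA2 from Python using stubs generated from core.proto. The module names
# core_pb2 / core_pb2_grpc and the address localhost:45042 are assumptions.
import grpc

import core_pb2
import core_pb2_grpc

channel = grpc.insecure_channel("localhost:45042")
stub = core_pb2_grpc.CoreStub(channel)

# Hello is the handshake call of the core service; the response describes the
# TA2 (user agent, protocol version, supported extensions).
hello_response = stub.Hello(core_pb2.HelloRequest())
print(hello_response)
```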
......@@ -136,11 +136,6 @@ for partially specified pipelines; the problem description for a partially
specified pipeline should describe the data at the beginning of the pipeline,
not the end of the specified portion.
Examples of "relaxations" of the common requirements are included. We expect
that some TA2 systems will be able to work with those relaxed requirements,
and TA3s can use those if available, but it is not expected that every TA2
will. TA3s should be able to function within the restrictions as stated below.
### Pipeline templates
Pipeline templates are based on pipeline descriptions with a few differences:
......@@ -163,7 +158,8 @@ the purpose of TA3-TA2 API we are currently placing the following restrictions:
* The placeholder step has to have only one input, a Dataset container value, and one output,
predictions as a Pandas dataframe. In this way it resembles a standard pipeline.
* The placeholder can be only the last step in the pipeline.
* All primitive steps should have all their hyper-parameters fixed.
* All primitive steps should have all their hyper-parameters fixed (see also the `use_default_values_for_free_hyperparams`
flag, which can be used to control this requirement).
These restrictions effectively mean that a pipeline template can only specify a directed acyclic graph of preprocessing
primitives that transforms one or more input Dataset container values into a *single* transformed
......@@ -172,11 +168,13 @@ There are no additional restrictions on the types of individual primitives that
pipeline template, although impact on downstream TA2 processing should be assessed before a given
primitive is used.
Relaxation: Individual systems can relax those restrictions. For example, they might allow
a placeholder step to have postprocessing primitive steps after it. In this case postprocessing
Individual systems can relax those restrictions. For example, they might allow a placeholder step to
have postprocessing primitive steps after it. In this case postprocessing
primitives can only transform predictions from a placeholder step into transformed predictions.
Or individual systems might allow primitive steps to have free hyper-parameters a TA2 system
should tune.
should tune (see the `use_default_values_for_free_hyperparams` flag, which can be used to control this behavior).
We expect that some TA2 systems will be able to work with those relaxed requirements, and
TA3s can use those if available, but it is not expected that every TA2 will.
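
As an illustration of the common case, the following sketch builds a template whose only (and therefore last) step is a placeholder mapping the pipeline's Dataset input to predictions, and submits it in a search request. It reuses `stub` and `core_pb2` from the earlier sketch; the step-related message names (`PipelineDescriptionStep`, `PlaceholderPipelineDescriptionStep`, `StepInput`, `StepOutput`), the `pipeline_pb2` module name, and the literal values are assumptions to be checked against `pipeline.proto` and `core.proto`.

```python
# Sketch only: a pipeline template whose single, last step is a placeholder.
# Step-related message names below are assumptions; check pipeline.proto.
import pipeline_pb2  # generated from pipeline.proto (module name assumed)

template = pipeline_pb2.PipelineDescription(
    inputs=[pipeline_pb2.PipelineDescriptionInput(name="dataset")],
    outputs=[pipeline_pb2.PipelineDescriptionOutput(name="predictions", data="steps.0.produce")],
    steps=[
        pipeline_pb2.PipelineDescriptionStep(
            placeholder=pipeline_pb2.PlaceholderPipelineDescriptionStep(
                inputs=[pipeline_pb2.StepInput(data="inputs.0")],  # the Dataset container value
                outputs=[pipeline_pb2.StepOutput(id="produce")],   # predictions dataframe
            ),
        ),
    ],
)

search_request = core_pb2.SearchSolutionsRequest(
    version="2020.2.11",    # hypothetical protocol version string
    time_bound_search=10,   # minutes; 0 or a negative number means no bound
    template=template,
    # If the template contained primitive steps with free hyper-parameters,
    # this flag asks the TA2 to fill them with default values instead
    # (field name taken from the comments in core.proto).
    use_default_values_for_free_hyperparams=True,
    # Problem description and pipeline inputs omitted for brevity.
)
search_response = stub.SearchSolutions(search_request)
search_id = search_response.search_id  # field name assumed; used by later calls
```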
### Fully specified pipelines
......@@ -371,10 +369,11 @@ In Go, accessing version is slightly more involved and it is described
## Extensions of messages
GRPC and Protocol Buffers support a simple method of extending messages: just define extra fields with custom tags
in your local version of the protocol. Performers can do that to experiment with variations of the protocol (and if
changes work out, they can submit a merge request to the common API). To make sure such unofficial fields in messages
do not conflict between performers, use values from the [allocated tag ranges](./private_tag_ranges.txt) for your
organization.
in your local version of the protocol. Users of this protocol can do that to experiment with variations of the protocol (and if
changes work out, they can submit a merge request for those changes to be included in this specification).
To make sure such unofficial fields in messages do not conflict between performers, use values from the
[allocated tag ranges](./private_tag_ranges.txt) for your organization, or add your organization via a
merge request.
## Changelog
......
......@@ -120,7 +120,6 @@ message SearchSolutionsRequest {
string version = 2;
// Desired upper limit of time for solution search, expressed in minutes.
// Default value of 0 (and any negative number) signifies no time bound.
// All TA2's should support and respect this request as part of minimum (baseline) performance.
// See also time_bound_run.
double time_bound_search = 3;
// Value stating the priority of the search. If multiple searches are queued then highest
......@@ -153,13 +152,12 @@ message SearchSolutionsRequest {
// inputs and any outputs. Otherwise pipelines have to be from a Dataset container value
// to predictions Pandas dataframe.
// While there are all these options possible, only a subset has to be supported by all
// systems. Minimum (baseline) requirements are that all TA2's support:
// systems:
// - Omitted templates,
// - Partial pipelines with one placeholder step, at the last step in the pipeline template,
// - Fully specified pipelines without free hyper-parameters, or with free hyper-parameters
// and "use_default_values_for_free_hyperparams" set to true.
// See the Pipeline section of the README for more details, in particular the discussion of
// common requirements and possible relaxations.
// See the Pipeline section of the README for more details.
PipelineDescription template = 7;
// Pipeline inputs used during solution search. They have to point to Dataset container
// values. Order matters as each input is mapped to a template's input in order. Optional
......@@ -169,7 +167,6 @@ message SearchSolutionsRequest {
// Expressed at time of search, TA2's should limit the search to solutions
// that would not take longer than this for one pass of the pipeline run.
// Default value of 0 (and any negative number) signifies no time bound.
// All TA2's should support and respect this request as part of minimum (baseline) performance.
// This can also functionally be used as the time bound for scoring.
double time_bound_run = 9;
// Suggested maximum number of solutions to rank. Default is 0. If it is 0, it means
......@@ -227,7 +224,6 @@ message SearchSolutionsResponse {
// (as happens when the search is concluded on its own, or when a search is stopped
// by "StopSearchSolutions"). Found solution IDs during the search are no longer valid
// after this call.
// All TA2's should support and respect this request as part of minimum (baseline) performance.
message EndSearchSolutionsRequest {
string search_id = 1;
}
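
As an illustration, a TA3 might end a concluded search like this (a Python sketch continuing the examples in the README; the `EndSearchSolutions` RPC name is assumed to match the request message):

```python
# Sketch: once all needed solutions have been exported, saved, fitted, or
# described, end the search. Found solution IDs are no longer valid afterwards.
stub.EndSearchSolutions(core_pb2.EndSearchSolutionsRequest(search_id=search_id))
```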
......@@ -246,7 +242,7 @@ message StopSearchSolutionsResponse {}
// Description of a TA2 score done during solution search. Because there is a wide range of
// potential approaches a TA2 can use to score candidate solutions this might not capture what
// your TA2 is doing. Feel free to request additions to be able to describe your approach.
message SolutionSearchScore {
ScoringConfiguration scoring_configuration = 1;
repeated Score scores = 2;
......@@ -385,8 +381,8 @@ message FitSolutionRequest {
// If you want to expose outputs of the whole pipeline (e.g., predictions themselves),
// list them here as well. These can be recursive data references like
// "steps.1.steps.4.produce" to point to an output inside a sub-pipeline.
// Systems only have to support exposing final outputs and can return "ValueError" for
// intermediate values.
// Data references for step outputs which otherwise would not be computed during
// pipeline execution can also be provided to force them to be computed and exposed.
repeated string expose_outputs = 3;
// Which value types should be used for exposing outputs. If not provided, the allowed
// value types list from the search solutions call is used instead.
......@@ -433,8 +429,8 @@ message ProduceSolutionRequest {
// If you want to expose outputs of the whole pipeline (e.g., predictions themselves),
// list them here as well. These can be recursive data references like
// "steps.1.steps.4.produce" to point to an output inside a sub-pipeline.
// Systems only have to support exposing final outputs and can return "ValueError" for
// intermediate values.
// Data references for step outputs which otherwise would not be computed during
// pipeline execution can also be provided to force them to be computed and exposed.
repeated string expose_outputs = 3;
// Which value types should be used for exposing outputs. If not provided, the allowed
// value types list from the search solutions call is used instead.
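
A brief sketch of how `expose_outputs` might be used follows (the same field appears on `FitSolutionRequest` above). It continues the earlier Python sketches; the `fitted_solution_id` field name and the result-retrieval flow are not shown in this excerpt and are assumptions here.

```python
# Sketch: asking for the final predictions and one intermediate output when
# running a fitted solution. Names outside this excerpt are assumptions.
produce_request = core_pb2.ProduceSolutionRequest(
    fitted_solution_id=fitted_solution_id,  # from an earlier FitSolution call
    expose_outputs=[
        "outputs.0",                # final pipeline output (predictions)
        "steps.1.steps.4.produce",  # recursive reference into a sub-pipeline;
                                    # forces that step output to be computed
    ],
)
produce_response = stub.ProduceSolution(produce_request)
# The exposed values are then delivered with the produce results through the
# matching GetProduceSolutionResults call.
```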
......@@ -471,10 +467,11 @@ message GetProduceSolutionResultsResponse {
message SolutionExportRequest {
// Found solution to export.
string solution_id = 3;
// Solution rank to be used for the exported solution. Lower numbers represent
// better solutions. Presently evaluation requirements are that ranks should be non-negative
// and that each exported pipeline have a different rank. TA3 should make sure not to repeat ranks.
// Filenames of exported files are left to be chosen by the TA2 system.
// Solution rank to be used for the exported solution. Rank is a non-negative floating-point number.
// Lower numbers represent better solutions. Each exported solution should have
// a different rank. TA3 should make sure not to repeat ranks. Filenames of exported files are
// left to be chosen by the TA2 system, but all exported files should be inside a directory
// with a name equal to the search ID of the search the exported solution belongs to.
double rank = 2;
}
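
For instance, a TA3 might assign ranks in its preferred order when exporting, as in this sketch continuing the earlier examples (the `SolutionExport` RPC name is assumed to match the message names):

```python
# Sketch: export solutions with distinct, non-negative ranks; lower is better.
# `solution_ids` is assumed to be ordered from best to worst by the TA3.
for rank, solution_id in enumerate(solution_ids, start=1):
    stub.SolutionExport(core_pb2.SolutionExportRequest(
        solution_id=solution_id,
        rank=float(rank),
    ))
```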
......@@ -488,7 +485,6 @@ message DataAvailableRequest {
string version = 2;
// Desired upper limit of time spend processing this data, expressed in minutes.
// Default value of 0 (and any negative number) signifies no time bound.
// All TA2's should support and respect this request as part of minimum (baseline) performance.
double time_bound = 3;
// Value stating the priority for processing of this data. Larger number is higher
// priority. If unspecified, by default priority is 0. Negative numbers have
......@@ -536,47 +532,47 @@ message ScorePredictionsResponse {
repeated Score scores = 1;
}
// Save a solution that can be loaded later. (Pipeline description/ Pipeline run) Optional.
// Save a solution that can be loaded later.
message SaveSolutionRequest {
// Id of solution to saved
// ID of a solution to save.
string solution_id = 1;
}
message SaveSolutionResponse {
// An URI pointing to a directory containing a solution saved.
// A URI pointing to a directory containing a saved solution.
string solution_uri = 1;
}
// Load a solution that was saved before.
message LoadSolutionRequest {
// An URI pointing to a directory containing a solution saved.
// A URI pointing to a directory containing a saved solution.
string solution_uri = 1;
}
message LoadSolutionResponse {
// Id of solution which was loaded.
// ID of a solution which was loaded.
string solution_id = 1;
}
// Save a fitted solution that can be loaded later. Optional.
// Save a fitted solution that can be loaded later.
message SaveFittedSolutionRequest {
// Fitted solution id to saved.
// ID of a fitted solution to save.
string fitted_solution_id = 1;
}
message SaveFittedSolutionResponse {
// An URI pointing to a directory containing a fitted solution.
// A URI pointing to a directory containing a saved fitted solution.
string fitted_solution_uri = 1;
}
// Load a fitted solution that was saved before.
message LoadFittedSolutionRequest {
// An URI pointing to a directory containing a fitted solution.
// A URI pointing to a directory containing a saved fitted solution.
string fitted_solution_uri = 1;
}
message LoadFittedSolutionResponse {
// Fitted solution id loaded.
// ID of a fitted solution which was loaded.
string fitted_solution_id = 1;
}
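
A sketch of the save/load round trip for fitted solutions follows, continuing the earlier Python examples; the RPC names `SaveFittedSolution` and `LoadFittedSolution` are assumed to match the message names above.

```python
# Sketch: persist a fitted solution and restore it later (for example after a
# TA2 restart). Only field names shown in the messages above are used.
save_response = stub.SaveFittedSolution(
    core_pb2.SaveFittedSolutionRequest(fitted_solution_id=fitted_solution_id)
)

load_response = stub.LoadFittedSolution(
    core_pb2.LoadFittedSolutionRequest(fitted_solution_uri=save_response.fitted_solution_uri)
)
restored_fitted_solution_id = load_response.fitted_solution_id
```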
......@@ -649,7 +645,6 @@ service Core {
rpc Hello (HelloRequest) returns (HelloResponse) {}
// Optional.
rpc SaveSolution (SaveSolutionRequest) returns (SaveSolutionResponse) {}
rpc LoadSolution (LoadSolutionRequest) returns (LoadSolutionResponse) {}
......
......@@ -6,6 +6,10 @@ import "google/protobuf/timestamp.proto";
import "primitive.proto";
import "value.proto";
// Pipeline description corresponds to the JSON schema of a D3M pipeline
// description as available here:
// https://metadata.datadrivendiscovery.org/devel/?pipeline
//
// Pipeline description contains many "data references". Data reference is just a string
// which identifies an output of a step or a pipeline input and forms a data-flow connection
// between data available and an input to a step. It is recommended to be a string of the
......
......@@ -22,7 +22,7 @@ option go_package = "pipeline";
// value type which can be used without an error is used. If the list is
// exhausted, then an error is provided instead.
//
// The following value types are those everyone should support.
// The following value types are those all systems should support.
// * "RAW"
// Raw value. Not all values can be represented as a raw value.
// The value before encoding should be at most 64 KB.
......