Remove input_adapter to improve usability of prompt library
Problem to solve
There have been many instances where people (both from other teams and from our own team) have complained that the config system is not easy to use. One of the main topics of complaint is input_adapter. The name is confusing, and it is difficult to know which adapter to use for a given input dataset.
As an example, the most recent manifestation of the problem can be found here: ai-experiments#17 (comment 1846259892)
Proposal
We can improve the user-friendliness by removing the need to specify an input adapter. Instead, we should automatically detect the input table schema and run the pipeline accordingly.
Specifically, we propose the following config format to specify the input data:
...
"input_source": {
  "type": "bq",
  "path": "dev-ai-research-0e2f8974.duo_chat_external.v1_chat_dataset_2"
},
...
for a BigQuery table, and
...
"input_source": {
  "type": "rake",
  "path": {
    "resources": "<DATASET_REPO>/duo_chat/v1/jsonl/resources",
    "completions": "<path to the output folder generated by the Rake task>"
  }
},
...
for local Rake task input.
There is no input_adapter anymore.
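To make the proposal concrete, here is a minimal sketch of how the library might consume the new `input_source` block and pick a pipeline from the detected columns. It is an illustration only, not the library's actual code: the names `BigQuerySource`, `RakeSource`, `parse_input_source`, and `detect_schema` are hypothetical, and so are the schema names and column names in the usage example. Only the `type` values (`bq`, `rake`) and the `path` shapes come from the config format above.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Union


@dataclass(frozen=True)
class BigQuerySource:
    """Hypothetical holder for a "bq" input source."""
    table: str


@dataclass(frozen=True)
class RakeSource:
    """Hypothetical holder for a "rake" input source."""
    resources: str
    completions: str


InputSource = Union[BigQuerySource, RakeSource]


def parse_input_source(config: dict) -> InputSource:
    """Validate the "input_source" block and return a typed source."""
    source = config.get("input_source")
    if not isinstance(source, dict):
        raise ValueError('config must contain an "input_source" object')

    source_type = source.get("type")
    path = source.get("path")

    if source_type == "bq":
        # For BigQuery, "path" is a single table reference string.
        if not isinstance(path, str):
            raise ValueError('"path" must be a string when type is "bq"')
        return BigQuerySource(table=path)

    if source_type == "rake":
        # For Rake task input, "path" is an object with two entries.
        required = {"resources", "completions"}
        if not isinstance(path, dict) or not required <= path.keys():
            raise ValueError(
                '"path" must contain "resources" and "completions" '
                'when type is "rake"'
            )
        return RakeSource(path["resources"], path["completions"])

    raise ValueError(f"unknown input_source type: {source_type!r}")


def detect_schema(columns: Iterable[str],
                  known_schemas: Dict[str, Iterable[str]]) -> str:
    """Return the first known schema whose required columns are all present.

    This stands in for the adapter choice users previously made by hand.
    """
    present = set(columns)
    for name, required in known_schemas.items():
        if set(required) <= present:
            return name
    raise ValueError(f"no known schema matches columns: {sorted(present)}")


if __name__ == "__main__":
    config = {
        "input_source": {
            "type": "bq",
            "path": "dev-ai-research-0e2f8974.duo_chat_external.v1_chat_dataset_2",
        }
    }
    print(parse_input_source(config))

    # Schema and column names below are made up for illustration.
    schemas = {"chat": ["question", "resource_id"], "completion": ["prompt"]}
    print(detect_schema(["question", "resource_id", "extra"], schemas))
```

With this shape, an unsupported `type` or a malformed `path` fails early with a clear message, and users never have to name an adapter themselves.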