Remove input_adapter to improve usability of prompt library

Problem to solve

There have been many instances where people (both from other teams and from our own team) complain that the config system is not easy to use. One of the main topics of complaint is input_adapter. The name is confusing, and it is difficult to know which adapter to use for a given input dataset.

As an example, the most recent manifestation of the problem can be found here: ai-experiments#17 (comment 1846259892)

Proposal

We can improve the user-friendliness by removing the need to specify an input adapter. Instead, we should automatically detect the input table schema and run the pipeline accordingly.
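
As a rough illustration of what automatic schema detection could look like, here is a minimal sketch that picks a pipeline by checking which required columns are present in the input table. The schema names and column names (`chat`, `code_suggestion`, `question`, `resource_id`, `prefix`, `suffix`) and the `detect_schema` helper are hypothetical placeholders, not the actual schemas or API of the prompt library.

```python
from typing import Iterable

# Hypothetical schemas: these names and columns are invented for illustration
# and do not reflect the real datasets used by the prompt library.
KNOWN_SCHEMAS: dict[str, frozenset[str]] = {
    "chat": frozenset({"question", "resource_id"}),
    "code_suggestion": frozenset({"prefix", "suffix"}),
}


def detect_schema(columns: Iterable[str]) -> str:
    """Return the name of the single known schema whose required columns are all present."""
    present = set(columns)
    matches = [
        name for name, required in KNOWN_SCHEMAS.items() if required <= present
    ]
    if len(matches) != 1:
        # Fail loudly on unknown or ambiguous tables instead of guessing.
        raise ValueError(
            f"Cannot determine input schema from columns {sorted(present)}; "
            f"matched schemas: {matches}"
        )
    return matches[0]
```

Failing on zero or multiple matches keeps the detection predictable: the user gets an explicit error rather than a silently chosen pipeline.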

Specifically, we propose the following config format to specify the input data:

  ...
  "input_source": {
    "type": "bq",
    "path": "dev-ai-research-0e2f8974.duo_chat_external.v1_chat_dataset_2"
  },
  ...

for a BigQuery table, and

  ...
  "input_source": {
    "type": "rake",
    "path": {
      "resources": "<DATASET_REPO>/duo_chat/v1/jsonl/resources",
      "completions": "<path to the output folder generated by the Rake task>"
    }
  },
  ...

for local Rake task input.

With this format, the config no longer contains an input_adapter field.
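
To make the proposed format concrete, here is a minimal sketch of how a loader might validate the `input_source` block and dispatch on its `type`. The class names (`BigQuerySource`, `RakeSource`) and the `parse_input_source` function are assumptions made for illustration; only the keys `input_source`, `type`, `path`, `resources`, and `completions` and the values `bq` and `rake` come from the proposal above.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Union


@dataclass(frozen=True)
class BigQuerySource:
    """Input read from a BigQuery table (hypothetical container type)."""
    table: str


@dataclass(frozen=True)
class RakeSource:
    """Input read from local Rake task output (hypothetical container type)."""
    resources: str
    completions: str


def parse_input_source(config: Mapping[str, Any]) -> Union[BigQuerySource, RakeSource]:
    """Validate the "input_source" block and return a typed description of it."""
    source = config["input_source"]
    kind = source["type"]
    path = source["path"]

    if kind == "bq":
        # For BigQuery, "path" is a single fully qualified table name.
        if not isinstance(path, str):
            raise ValueError('"path" must be a string when "type" is "bq"')
        return BigQuerySource(table=path)

    if kind == "rake":
        # For Rake input, "path" is an object with two folder locations.
        if not isinstance(path, Mapping):
            raise ValueError('"path" must be an object when "type" is "rake"')
        return RakeSource(
            resources=path["resources"],
            completions=path["completions"],
        )

    raise ValueError(f'Unsupported input_source type: {kind!r}')
```

For example, `parse_input_source({"input_source": {"type": "bq", "path": "project.dataset.table"}})` would return a `BigQuerySource` whose `table` is `"project.dataset.table"`.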

Edited Apr 05, 2024 by Hongtao Yang