Mismatched questions between the sample input and control datasets

Problem to solve

1. Mismatch between sample and control

The sample control datasets are dynamic views based on the daily runs table. The corresponding input datasets, however, are tables built by scheduled queries that run once every 24 hours. This works around a limitation with accessing views from within the Apache Beam pipeline (see thread for more context).

The scheduled-query approach often results in mismatched questions between the sample input and control datasets. That is, the questions in the input dataset can still be based on the previous daily run, while the control view already reflects the latest one.
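To confirm the mismatch on a given day, the question sets of the two datasets can be compared directly. This is a minimal sketch, assuming both datasets have already been fetched as lists of rows and that each row carries its question identifier under a hypothetical question_id column; the function name is also illustrative.

```python
from typing import Dict, Iterable, Mapping, Set


def find_mismatched_questions(
    input_rows: Iterable[Mapping[str, object]],
    control_rows: Iterable[Mapping[str, object]],
    key: str = "question_id",  # hypothetical column name; adjust to the real schema
) -> Dict[str, Set[object]]:
    """Return the question keys that appear in only one of the two datasets."""
    input_keys = {row[key] for row in input_rows}
    control_keys = {row[key] for row in control_rows}
    return {
        # Questions still present in the (possibly stale) input table.
        "only_in_input": input_keys - control_keys,
        # Questions the control view already has but the input table lacks.
        "only_in_control": control_keys - input_keys,
    }


# Example: input built from yesterday's run, control reflecting today's run.
input_rows = [{"question_id": "q1"}, {"question_id": "q2"}]
control_rows = [{"question_id": "q2"}, {"question_id": "q3"}]
print(find_mismatched_questions(input_rows, control_rows))
# {'only_in_input': {'q1'}, 'only_in_control': {'q3'}}
```

Both sets being empty means the two datasets line up for that run.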

Proposed Solution

As suggested by @srayner, we can pass a query instead of a table when reading the input dataset, which circumvents the restriction on reading views.

Update ReadDatasetFromBigQuery to use a query instead of a table: !398 (merged).
Create views for the sample input datasets.
Disable the scheduled queries.
Remove the scheduled queries.
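A minimal sketch of the query-based read, assuming the Apache Beam Python SDK. The helper view_read_kwargs and the view reference used below are hypothetical, and this is not the code from !398. The idea is that ReadDatasetFromBigQuery passes query= (together with use_standard_sql=True, because the backtick-quoted reference is standard SQL rather than legacy SQL) to beam.io.ReadFromBigQuery instead of table=.

```python
import re

# Loose sanity check for a fully qualified "project.dataset.view" reference.
# Hypothetical helper: real BigQuery naming rules are more permissive than this.
_VIEW_REF = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+$")


def view_read_kwargs(view_ref: str) -> dict:
    """Build keyword arguments for beam.io.ReadFromBigQuery that read a view via a query."""
    if not _VIEW_REF.match(view_ref):
        raise ValueError(f"expected 'project.dataset.view', got {view_ref!r}")
    return {
        # Selecting from the view, rather than passing table=..., avoids reading it as a table.
        "query": f"SELECT * FROM `{view_ref}`",
        # Backtick-quoted identifiers are standard SQL; ReadFromBigQuery defaults to legacy SQL.
        "use_standard_sql": True,
    }


# Usage inside a pipeline (requires apache_beam and GCP credentials):
#   rows = pipeline | beam.io.ReadFromBigQuery(**view_read_kwargs("my-project.eval.sample_input"))
print(view_read_kwargs("my-project.eval.sample_input"))
```

With query=, ReadFromBigQuery runs the query and reads its result instead of reading the referenced object directly, which is presumably what lets views work here.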

2. Mismatch between the sample dataset today and a week ago

The sample datasets are dynamic views based on the daily runs table, so they always stay aligned with the daily runs. This also means their contents change from day to day.

However, during experimentation and debugging it is sometimes important to test against exactly the same data, and a sample dataset that changes every day makes that impossible.

Proposed Solution

We can save a snapshot of the dataset to a local file and have the eval pipeline read from that file. This way we always have the option to rerun against exactly the same data.

Update the fetch-sample command to write JSONL instead of CSV. CSV is a very basic format that is awkward to handle: is there a header row, what is the delimiter, can a field contain newlines, and so on. JSON, on the other hand, is self-describing and much more robust.
Add the ability for the eval pipeline to read from local JSONL files.
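The two steps above could look roughly like this: one function that writes a snapshot as JSONL (one JSON object per line) and one that reads it back. This is a sketch using only the standard library, not the actual fetch-sample implementation; write_jsonl, read_jsonl and the question_id/question fields are hypothetical names.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator, Mapping


def write_jsonl(rows: Iterable[Mapping], path: Path) -> int:
    """Write one JSON object per line and return the number of rows written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            # json.dumps escapes embedded newlines, so every record stays on one line
            # and commas or quotes inside a field need no special handling.
            f.write(json.dumps(row) + "\n")
            count += 1
    return count


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate a trailing blank line
                yield json.loads(line)


# Example row that would be awkward in CSV: the field contains a comma and a newline.
rows = [{"question_id": "q1", "question": "First part,\nsecond part"}]
```

A field like the one in the example round-trips unchanged, with no delimiter or header conventions to agree on. Inside a Beam pipeline, the same file could plausibly be read with beam.io.ReadFromText(path) followed by beam.Map(json.loads), since each line is one complete record.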

Edited Apr 18, 2024 by Hongtao Yang