Resolve "Create a Method to Load Stranded Data from GCS or Local After Pipeline Failure Due to Schema Irregularity"
requested to merge 188-create-a-method-to-load-stranded-data-from-gcs-or-local-after-pipeline-failure-due-to-schema into main
What does this merge request do and why?
The code-suggestions eval pipeline takes a while to run and occasionally fails on the last step when there is a schema mismatch. Fortunately, Apache Beam stores the data in a temporary GCS bucket, so it can be loaded from there once the proper schema is applied.
The solution must be generic enough to work with all pipelines. Simple ETL!
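To illustrate the idea of applying a schema to stranded records, here is a minimal sketch, not the code in this merge request. It assumes the stranded file is newline-delimited JSON (as the `.jsonl` name used below suggests) and that a schema can be expressed as a mapping from field name to Python type. The function name `conform_rows` and the fields in `EXAMPLE_SCHEMA` are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator

# Hypothetical schema for illustration only: field name -> Python type.
# The real pipelines' schemas are not shown in this merge request.
EXAMPLE_SCHEMA: Dict[str, type] = {
    "prompt": str,
    "completion": str,
    "latency_ms": float,
}


def conform_rows(
    lines: Iterable[str], schema: Dict[str, type]
) -> Iterator[Dict[str, Any]]:
    """Yield one row per non-empty JSONL line, restricted to the schema's fields.

    Fields missing from a record become None, fields not in the schema are
    dropped, and present values are cast to the schema's type. A value that
    cannot be cast raises, so bad records surface instead of loading silently.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        row: Dict[str, Any] = {}
        for name, expected_type in schema.items():
            value = record.get(name)
            row[name] = None if value is None else expected_type(value)
        yield row
```

Because the function accepts any iterable of lines, it could be fed an open local file (for example `with open("anthropic-run.jsonl") as f: rows = list(conform_rows(f, EXAMPLE_SCHEMA))`) or lines read from GCS by whatever client the pipeline already uses. Keeping the schema as a plain argument is what would let the same routine serve every pipeline.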
How to set up and validate locally
Request some test data from @srayner or run the command below to fetch the data:
gsutil cp gs://prompt-library/tmp/bq_load/379cf2a5ad1e4ea2b15f78a033d3c41b/dev-ai-research-0e2f8974.code_suggestion_external_results.output_full_v5-anthropic/c90bfae0-15ee-4e13-8847-5d78736a93c4 ./anthropic-run.jsonl
This is in line with the help message that you get when you run:
Closes #188 (closed)
Edited by Stephan Rayner