Replace pipeline_started_at with created_at
What does this merge request do and why?
In a previous MR, a new field (pipeline_started_at) was added to partition the results of the scheduled daily pipeline runs. Unfortunately, when the pipeline runs on a schedule, the timestamp is set on the first pipeline run and stays the same on consecutive runs, which is not what we want.
After spending some time exploring various solutions (set the time as part of a pipeline job, set a default value in BQ), this seemed like the right solution:
- Remove the timestamp creation logic
- Replace the pipeline_started_at field with the created_at field
- Set the created_at timestamp for each record while writing to BQ
Alternatively, see dev-ai-research-0e2f8974.duo_chat_experiments.aherczeg_add_timestamp__independent_llm_judge
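
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this MR: the helper name `with_created_at` and the dict-shaped records are assumptions, and the actual write to BQ is left out. The point is only that the timestamp is computed per record at write time, so a scheduled run cannot reuse a value from an earlier run.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Iterator


def with_created_at(records: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield copies of the records stamped with a fresh created_at value.

    Hypothetical helper: the name and the record shape are assumptions.
    """
    for record in records:
        stamped = dict(record)  # copy so the caller's record is not mutated
        # Drop the old field, which was fixed at the first pipeline run.
        stamped.pop("pipeline_started_at", None)
        # Evaluate the clock here, per record, right before the write.
        stamped["created_at"] = datetime.now(timezone.utc).isoformat()
        yield stamped
```

The stamped records would then be passed to whatever writer the pipeline already uses for BQ.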
How to set up and validate locally
- Run any pipeline as usual
- Once the pipeline finishes, check that the created_at column is populated as expected.
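
One way to make the check above repeatable is a small helper like the sketch below. It assumes the rows have already been fetched from the output table as plain dicts; the function name `rows_missing_created_at` is hypothetical and not part of this MR.

```python
from typing import Any, Dict, Iterable, List


def rows_missing_created_at(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the rows whose created_at is absent, None, or empty.

    Hypothetical helper: assumes rows were fetched as plain dicts.
    """
    return [row for row in rows if not row.get("created_at")]
```

An empty result means every fetched row has created_at populated.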
Merge request checklist
- I've run the affected pipeline(s) to validate that nothing is broken.
- Tests added for new functionality. If not, please raise an issue to follow up.
- Documentation added/updated, if needed.
Edited by Andras Herczeg