Replace pipeline_started_at with created_at

Andras Herczeg requested to merge ah/fix-timestamp into main

What does this merge request do and why?

A previous MR added a new field, pipeline_started_at, which was needed to partition the results of the scheduled daily pipeline runs. Unfortunately, when the pipeline runs on a schedule, the timestamp is set on the first run and stays the same on subsequent runs, so the results of separate runs can no longer be told apart by this field.

After exploring several alternatives (setting the time as part of a pipeline job, setting a default value in BQ), the following seemed like the right solution:

  • Remove the timestamp creation logic
  • Replace the pipeline_started_at with the created_at field
  • Set the created_at timestamp for each record while writing to BQ
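
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this MR: the helper name `stamp_created_at`, the record shape (plain dicts), and the choice to compute one timestamp per write call are all assumptions. The point is only that the timestamp is taken at write time rather than when the scheduled pipeline was first defined.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List, Optional


def stamp_created_at(
    records: Iterable[Dict[str, Any]],
    now: Optional[datetime] = None,
) -> List[Dict[str, Any]]:
    """Return copies of the records with created_at set at write time.

    Hypothetical helper: `now` can be injected for testing; by default the
    current UTC time is read when this function is called, i.e. on every run.
    """
    written_at = (now or datetime.now(timezone.utc)).isoformat()
    stamped = []
    for record in records:
        # Drop the old field and add the new one on a copy of the record.
        row = {k: v for k, v in record.items() if k != "pipeline_started_at"}
        row["created_at"] = written_at
        stamped.append(row)
    return stamped
```

The stamped rows would then be handed to whatever step writes to BQ. Because the time is read inside the write step, each scheduled run gets its own value.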

Alternatively see dev-ai-research-0e2f8974.duo_chat_experiments.aherczeg_add_timestamp__independent_llm_judge

How to set up and validate locally

  1. Run any pipeline as usual
  2. Once the pipeline finishes, check that the created_at column is populated as expected.
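
One possible way to do the check in step 2 is a small query against the output table. This is only a sketch: the helper names and the table name passed in are placeholders, and the query result still has to be fetched with whichever BQ client or console you normally use.

```python
def build_created_at_check(table: str) -> str:
    """Build a BigQuery SQL query counting rows with a missing created_at.

    `table` is a fully qualified table name supplied by the caller
    (placeholder; not taken from this MR).
    """
    return (
        "SELECT COUNT(*) AS total_rows, "
        "COUNTIF(created_at IS NULL) AS missing_created_at "
        f"FROM `{table}`"
    )


def created_at_is_populated(total_rows: int, missing_created_at: int) -> bool:
    """Interpret the query result: some rows exist and none lack created_at."""
    return total_rows > 0 and missing_created_at == 0
```

Running the query after two separate pipeline runs and comparing the created_at values would additionally confirm that the timestamp changes between runs.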

Merge request checklist

  • I've run the affected pipeline(s) to validate that nothing is broken.
  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.
Edited by Andras Herczeg