No state stored for pipelines when Transforms run
Edit2: Title updated again to reflect the new findings
Bug Summary:
When pipelines run with the --transform run flag, the state of the last run is not saved.
This is a consistent behavior for both the CLI and Meltano UI.
When a pipeline runs with --transform skip, from either the CLI or Meltano UI, the state is properly saved and used in subsequent runs of the same pipeline (same job_id).
Edit: Title updated to reflect the new findings
Check my comments below for a detailed analysis of the issue.
Summary: When pipelines run using Meltano UI, the state of the last run is not saved.
Bug: When the pipeline runs again there is no state to be used and the pipeline starts from scratch.
This affects only ELT pipelines run using Meltano UI. When running meltano elt ... in the CLI, everything works without issues.
Correct Behavior: When a pipeline finishes successfully, the state should be saved and utilized in the next run.
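To make the expected behavior concrete, here is a minimal sketch of it. This is not Meltano's actual implementation: `StateStore`, `run_pipeline`, and the `job_state` table are hypothetical names used only for illustration. The point it shows is that state is keyed by job_id and saved after every successful run, regardless of whether the transform step runs.

```python
import json
import sqlite3
from typing import Optional


class StateStore:
    """Hypothetical store keeping the last STATE payload per job_id."""

    def __init__(self) -> None:
        # In-memory database, for illustration only.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE job_state (job_id TEXT PRIMARY KEY, payload TEXT)"
        )

    def save(self, job_id: str, state: dict) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO job_state (job_id, payload) VALUES (?, ?)",
            (job_id, json.dumps(state)),
        )
        self.conn.commit()

    def load(self, job_id: str) -> Optional[dict]:
        row = self.conn.execute(
            "SELECT payload FROM job_state WHERE job_id = ?", (job_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None


def run_pipeline(
    store: StateStore,
    job_id: str,
    new_state: dict,
    transform: str = "skip",
    succeeded: bool = True,
) -> Optional[dict]:
    """Simulate one ELT run and return the state it started from."""
    previous = store.load(job_id)  # None means a complete import
    # Extract and load would happen here, resuming from `previous`.
    if transform == "run":
        pass  # placeholder for the transform step
    # The state must be saved on success whether or not transforms ran.
    if succeeded:
        store.save(job_id, new_state)
    return previous
```

With this sketch, a second call to `run_pipeline` with the same job_id and `transform="run"` starts from the state saved by the first call, which is the behavior the bug currently breaks.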
Original Title: Potential bug - No state stored for recurring pipeline for Stripe
Original description:
Where: https://holloway.meltanodata.com/
Extractor: Stripe
Pipeline Interval: @hourly
Six jobs have run so far for the same pipeline, but every follow-up job reports that it found no STATE: No state was found, complete import.
Tap Stripe supports state, and I can see from discovery.yml that we have enabled that capability:
- name: tap-stripe
  label: Stripe
  pip_url: 'git+https://github.com/meltano/tap-stripe.git@v0.2.4'
  capabilities:
    - catalog
    - discover
    - state
This could be one of various possible bugs:
- Our mechanism for storing and fetching STATE has broken during one of the recent updates (e.g. the pipeline id is not properly used)
- Something is not working properly with Stripe specifically
- Something is broken for recurring pipelines (we do not correctly add the run settings when adding the job to Airflow?)
We should investigate this and make sure that we properly track and use the STATE of previous pipeline runs.