New `meltano state` command to rename, alter, print, and copy job states
Problem to solve
Sometimes it's useful to be able to change the job_id for a particular pipeline. For example, a second version of a plugin is added (you started with tap-google-analytics but now have tap-google-analytics-site1 and tap-google-analytics-site2) and you need to disambiguate, or you're still figuring out naming conventions, as we are. If we just change it in meltano.yml, Meltano will see new runs as a new job and not find the existing state.
Further details
As a workaround, we can go into the Meltano database and edit the job table by hand, but this is annoying and error-prone.
Proposal
I see a few directions here. We could add a command like meltano job rename <old_job_id> <new_job_id>. meltano job could also later include list, which might be helpful for building a tap-meltano in the future if you want to use Meltano to track pipelines. Alternate vocabulary could be runs instead of jobs, as it seems like the codebase and docs are preferring that term.
Another way to do it would be to make a meltano state command to manage state. It could have meltano state get <job_id>, meltano state set <job_id> <state>, and meltano state reset <job_id> (see Add command to reset state for a Job ID/schedule (#2568 - closed)). Then renaming could be something like meltano state get <old_job_id> | meltano state set <new_job_id>. Pipe syntax would be handy if you wanted to use a tool like jq to mutate states in complex ways.
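To make the pipe idea concrete, here is a minimal sketch of a filter that could sit between the two commands in the role jq would play. It rests on assumptions: the state is taken to be a JSON object with a top-level "bookmarks" mapping keyed by stream name (a common Singer convention, but not guaranteed for every tap), and drop_bookmark and filter_state are hypothetical helper names, not part of Meltano.

```python
import json
from typing import TextIO


def drop_bookmark(state: dict, stream: str) -> dict:
    """Return a copy of `state` without the bookmark for `stream`.

    Assumes a Singer-style layout: {"bookmarks": {<stream>: {...}}}.
    A missing "bookmarks" key is treated as an empty mapping.
    """
    bookmarks = {
        name: value
        for name, value in state.get("bookmarks", {}).items()
        if name != stream
    }
    return {**state, "bookmarks": bookmarks}


def filter_state(source: TextIO, target: TextIO, stream: str) -> None:
    """Read state JSON from `source`, drop one bookmark, write JSON to `target`."""
    json.dump(drop_bookmark(json.load(source), stream), target)
```

Wired up to sys.stdin and sys.stdout in a small script, filter_state could be placed between a command that prints state and one that reads it, so that one stream is forced to resync while the rest of the state is carried over.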
Both of these options use a noun command, as a verb command doesn't seem to make sense here. Currently Meltano seems to use a mix of verbs (discover, init, install, upgrade) and nouns (user, schema, config).
Proposal Update (2022-01-31)
(Appended by AJ.)
To manage state migration, we'd add get, set, clear, and copy options to a new meltano state command.
- list
  - meltano state list (output list of available Job IDs to STDOUT)
- set
  - meltano state set <JOB-ID> <state-json-text> (input from text string or STDIN)
  - meltano state set <JOB-ID> --input-file=<json-state-file> (input from file)
  - cat state.json | meltano state set <JOB-ID> (input from text string or STDIN)
- show (or get)
  - meltano state show <JOB-ID> (output to STDOUT)
  - meltano state show <JOB-ID> > <json-state-file> (output to file)
- clear (or reset)
  - meltano state clear <JOB-ID> (or meltano state reset)
- copy
  - meltano state copy <JOB-ID-1> <JOB-ID-2>
- move
  - meltano state move <JOB-ID-1> <JOB-ID-2> (or meltano state rename)
- merge
  - meltano state merge <FROM-JOB-ID> <TO-JOB-ID> (update job 2 state, replacing and merging status from job 1)
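The proposal does not pin down exact merge semantics, so the following is only one plausible reading of "replacing and merging": nested objects are merged key by key, and any other value from the source job replaces the value in the target job. merge_state is a hypothetical name used for illustration, not an existing Meltano function.

```python
def merge_state(from_state: dict, to_state: dict) -> dict:
    """Merge `from_state` into `to_state` and return the result.

    Nested dicts are merged recursively; for any other type, the value
    from `from_state` wins. Neither input dict is modified.
    """
    merged = dict(to_state)
    for key, value in from_state.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides hold an object: merge them instead of replacing.
            merged[key] = merge_state(value, merged[key])
        else:
            # Scalars, lists, and mismatched types: the source value replaces.
            merged[key] = value
    return merged
```

Under this reading, bookmarks that exist only in the target job survive the merge, while bookmarks present in both jobs take the source job's values.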
First iteration
As a first iteration, I think we'd just need items #1-3, which are list, set, and show, and optionally the fourth subcommand, clear. With those three or four subcommands, users could accomplish the other functions in a multi-step process. For instance, instead of copy, the user could run something like meltano state show <OLD-ID> | meltano state set <NEW-ID> and accomplish the same effect. Similarly, a rename operation could have the same workaround as copy, except also running a manual clear after the copy.
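The composition described here can be sketched with a toy model. StateStore below is a hypothetical in-memory stand-in for wherever Meltano persists job state, not its real storage layer; it only offers the first-iteration operations (list, set, show, clear), and copy and rename are then built on top of them.

```python
import copy


class StateStore:
    """Hypothetical in-memory store offering only the first-iteration operations."""

    def __init__(self) -> None:
        self._states = {}

    def list_ids(self) -> list:
        return sorted(self._states)

    def set(self, job_id: str, state: dict) -> None:
        self._states[job_id] = copy.deepcopy(state)

    def show(self, job_id: str) -> dict:
        return copy.deepcopy(self._states[job_id])

    def clear(self, job_id: str) -> None:
        self._states.pop(job_id, None)


def copy_state(store: StateStore, old_id: str, new_id: str) -> None:
    # Equivalent of piping `show <OLD-ID>` into `set <NEW-ID>`.
    store.set(new_id, store.show(old_id))


def rename_state(store: StateStore, old_id: str, new_id: str) -> None:
    # Rename is a copy followed by a clear of the old ID.
    if old_id == new_id:
        return  # Nothing to do; clearing here would destroy the state.
    copy_state(store, old_id, new_id)
    store.clear(old_id)
```

One caveat this makes visible: the two-step rename is not atomic, so a failure between the copy and the clear would leave both job IDs holding state.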