Catalog Generation Strategy - Schema Change Detection
I want to be able to diff the catalog I currently use against the catalog produced by catalog generation, because the source data can change schema outside the control of whoever is extracting data from the source system.
When a change happens, I may want to:
- Decide which dbt models need to be updated
- Notify downstream business users of the change
- etc
I think this is easiest to explain by showing what you'd have to do today to accomplish it:
- Set up your Meltano job to work the way you'd like:
    - name: tap-oracle
      namespace: tap_oracle
      pip_url: git+https://github.com/transferwise/pipelinewise-tap-oracle
      executable: tap-oracle
      metadata:
        '*':
          replication-method: FULL_TABLE
      select_filter:
        - '!MDSYS-DBA_SDO_THEMES'
        - '!CTXSYS-DRV$DELETE2'
        - '!OLAPSYS-ALL*'
- Take the catalog from this generation by running:
    meltano invoke --dump=catalog tap-oracle > catalog.json
- Run your job using a definition like this:
    - name: tap-oracle
      namespace: tap_oracle
      pip_url: git+https://github.com/transferwise/pipelinewise-tap-oracle
      executable: tap-oracle
      catalog: catalog.json
- Write a script that reruns the catalog dump from the second step (writing to a different file name) and diffs the result (using something like https://pypi.org/project/deepdiff/) against the catalog.json file used in the third step. Alert the team via Slack, Teams, email, or similar about the difference, and then decide whether to do anything about it.
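As a rough illustration of the diff script described in the last step, here is a minimal sketch using only the Python standard library. It assumes the usual Singer catalog layout (a top-level "streams" list whose entries carry "tap_stream_id" and "schema.properties"). The function names and the shape of the returned dict are made up for this example; deepdiff could replace the hand-rolled comparison.

```python
import json
from pathlib import Path


def load_catalog(path):
    """Parse a Singer catalog JSON file into a dict."""
    return json.loads(Path(path).read_text())


def index_streams(catalog):
    """Map each stream's tap_stream_id to its JSON schema properties."""
    return {
        stream["tap_stream_id"]: stream.get("schema", {}).get("properties", {})
        for stream in catalog.get("streams", [])
    }


def diff_catalogs(current, fresh):
    """Describe how the freshly generated catalog differs from the current one."""
    old, new = index_streams(current), index_streams(fresh)
    changes = {
        "added_streams": sorted(new.keys() - old.keys()),
        "removed_streams": sorted(old.keys() - new.keys()),
        "changed_streams": {},
    }
    # For streams present in both catalogs, compare their properties.
    for stream_id in sorted(old.keys() & new.keys()):
        old_props, new_props = old[stream_id], new[stream_id]
        delta = {
            "added": sorted(new_props.keys() - old_props.keys()),
            "removed": sorted(old_props.keys() - new_props.keys()),
            "modified": sorted(
                name
                for name in old_props.keys() & new_props.keys()
                if old_props[name] != new_props[name]
            ),
        }
        if any(delta.values()):
            changes["changed_streams"][stream_id] = delta
    return changes


def has_changes(changes):
    """True if any stream was added, removed, or changed."""
    return any(changes.values())
```

A wrapper script could call load_catalog on catalog.json and on the freshly dumped file, pass both to diff_catalogs, and send the result to Slack, Teams, or email when has_changes returns True.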
Instead of writing that script, it would be better if you could define this strategy in the Meltano YAML configuration from the first step.
Something like:
    - name: tap-oracle
      namespace: tap_oracle
      pip_url: git+https://github.com/transferwise/pipelinewise-tap-oracle
      executable: tap-oracle
      metadata:
        '*':
          replication-method: FULL_TABLE
      select_filter:
        - '!MDSYS-DBA_SDO_THEMES'
        - '!CTXSYS-DRV$DELETE2'
        - '!OLAPSYS-ALL*'
      catalog_generation_strategy: OVERWRITE  # one of OVERWRITE (default) | FAIL | WARN
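To make the three proposed values a bit more concrete, here is one possible reading of them as a Python sketch. This is not existing Meltano behavior: the enum, the exception, and resolve_catalog are hypothetical names. It also makes assumptions the proposal does not spell out, namely that WARN keeps running with the stored catalog, that a missing stored catalog is simply created, and that any difference at all between the two catalogs counts as a change.

```python
import json
import logging
from enum import Enum
from pathlib import Path

logger = logging.getLogger("catalog_strategy")


class CatalogGenerationStrategy(Enum):
    OVERWRITE = "OVERWRITE"  # replace the stored catalog (proposed default)
    WARN = "WARN"            # assumption: keep the stored catalog, log a warning
    FAIL = "FAIL"            # keep the stored catalog, abort the run


class CatalogChangedError(Exception):
    """Raised when the fresh catalog differs and the strategy is FAIL."""


def resolve_catalog(stored_path, fresh_catalog, strategy):
    """Return the catalog to run with, applying the strategy on a mismatch."""
    stored_path = Path(stored_path)
    if not stored_path.exists():
        # Assumption: the first run has nothing to compare against, so store it.
        stored_path.write_text(json.dumps(fresh_catalog, indent=2))
        return fresh_catalog

    stored = json.loads(stored_path.read_text())
    if stored == fresh_catalog:
        return stored

    if strategy is CatalogGenerationStrategy.OVERWRITE:
        stored_path.write_text(json.dumps(fresh_catalog, indent=2))
        return fresh_catalog
    if strategy is CatalogGenerationStrategy.WARN:
        logger.warning("Stored catalog %s differs from the generated catalog", stored_path)
        return stored
    raise CatalogChangedError(f"Stored catalog {stored_path} differs from the generated catalog")
```

Whether WARN should continue with the stored catalog or with the new one is an open design choice; the sketch picks the stored one so that downstream models keep seeing the schema they were built against.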
This leads to other questions and concerns:
- Is generationStrategy the right strategy?
- Should this type of failure have a specific exit code? That way your orchestrator could handle the alerting.
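On the exit code question, the orchestrator side could look something like the sketch below. The value 3 is purely a placeholder; no such exit code is defined today, and classify_run is a hypothetical helper.

```python
import subprocess

# Hypothetical exit code reserved for "catalog changed"; not an existing convention.
SCHEMA_CHANGED_EXIT_CODE = 3


def classify_run(command):
    """Run a command and classify the outcome by its exit code."""
    result = subprocess.run(command)
    if result.returncode == 0:
        return "ok"
    if result.returncode == SCHEMA_CHANGED_EXIT_CODE:
        return "schema_changed"
    return "failed"
```

An orchestrator task could pass the command that runs the job to classify_run and route "schema_changed" to a notification channel, while treating "failed" as an ordinary job failure.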
I think this is a good starting point, hopefully there's a good idea in here somewhere!
Edited by Derek Visch