Auto-generate dbt sources.yml from Extractor/Loader schema
Problem to solve
When a user first configures dbt in a Meltano project, some boilerplate work is required to set up the dbt project correctly. Part of this configuration could be generated by Meltano, so that more things "just work" without manual configuration.
Target audience
First-time Meltano user.
Further details
When setting up the dbt transformer, an initial dbt_project.yml file is generated that wires up the skeleton project structure (e.g. specifying source-paths). The rest of the dbt project configuration is currently left largely to the user. One useful setup step is to define a sources.yml file that declares the tables and columns of the source data, which can then be referenced elsewhere in the dbt models.
https://docs.getdbt.com/docs/building-a-dbt-project/using-sources/
Proposal
Given that Meltano already has the schemas from the extractor/loader in hand, it should be possible to generate a sources.yml file for the dbt project. For a new Meltano user with a large schema to import, this could save a lot of tedious work.
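A minimal sketch of the generation step follows. It assumes Meltano can supply the extractor's Singer catalog: a "streams" list whose entries carry a "stream" name and a JSON Schema with top-level "properties". The function names catalog_to_sources and render_sources_yml, and the source_name and schema parameters, are hypothetical and are not existing Meltano APIs. Each stream becomes a table, and each top-level property becomes a column.

```python
import json


def catalog_to_sources(catalog: dict, source_name: str, schema: str) -> dict:
    """Build a dbt sources.yml structure from a Singer catalog dict.

    Hypothetical helper: assumes each stream has a "stream" name and a JSON
    Schema under "schema" whose top-level "properties" are the column names.
    """
    tables = []
    for stream in catalog.get("streams", []):
        properties = stream.get("schema", {}).get("properties", {})
        tables.append({
            "name": stream["stream"],
            # Keep the schema's property order for columns.
            "columns": [{"name": column} for column in properties],
        })
    # Sort tables so that regenerating the file yields a stable diff.
    tables.sort(key=lambda table: table["name"])
    return {
        "version": 2,
        "sources": [{"name": source_name, "schema": schema, "tables": tables}],
    }


def render_sources_yml(sources: dict) -> str:
    # JSON output is also valid YAML for a simple document like this one,
    # which keeps the sketch free of a PyYAML dependency.
    return json.dumps(sources, indent=2) + "\n"


if __name__ == "__main__":
    example_catalog = {
        "streams": [
            {
                "tap_stream_id": "public-users",
                "stream": "users",
                "schema": {
                    "type": "object",
                    "properties": {"id": {"type": ["integer"]}, "email": {"type": ["null", "string"]}},
                },
            }
        ]
    }
    print(render_sources_yml(catalog_to_sources(example_catalog, "my-db", "raw")))
```

A real implementation would more likely emit YAML through a YAML library. It could also carry column types or descriptions over from the JSON Schema.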
Open question: how should schema changes be handled? Perhaps autogeneration would work better as a manual job that writes to a well-known location. The operator could then trigger an update, inspect the deltas, and commit the newly generated sources. Alternatively, the file could be generated dynamically on each run. That approach may introduce complications, since dbt would throw errors any time something doesn't match.
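For the manual-job option, the "inspect the deltas" step could be as simple as comparing the committed sources against a freshly generated copy. This sketch assumes both files have already been parsed into dicts with the usual sources.yml layout (sources, then tables, then columns). It reports only added and removed tables and columns. The diff_sources helper is hypothetical.

```python
def diff_sources(old: dict, new: dict) -> list[str]:
    """Summarise table and column changes between two sources.yml structures."""

    def index(doc: dict) -> dict:
        # Map (source name, table name) -> list of column names.
        columns = {}
        for source in doc.get("sources", []):
            for table in source.get("tables", []):
                key = (source["name"], table["name"])
                columns[key] = [column["name"] for column in table.get("columns", [])]
        return columns

    old_idx, new_idx = index(old), index(new)
    changes = []
    for src, tbl in sorted(new_idx.keys() - old_idx.keys()):
        changes.append(f"+ table {src}.{tbl}")
    for src, tbl in sorted(old_idx.keys() - new_idx.keys()):
        changes.append(f"- table {src}.{tbl}")
    # For tables present in both versions, compare their column sets.
    for key in sorted(old_idx.keys() & new_idx.keys()):
        src, tbl = key
        old_cols, new_cols = set(old_idx[key]), set(new_idx[key])
        for column in sorted(new_cols - old_cols):
            changes.append(f"+ column {src}.{tbl}.{column}")
        for column in sorted(old_cols - new_cols):
            changes.append(f"- column {src}.{tbl}.{column}")
    return changes
```

With the dynamic option, the same comparison could run on each invocation and fail with a readable summary instead of surfacing the mismatch as a dbt error.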
This would mean you could write a new mymodel.sql like:
with source as (
    select * from {{ source("my-db", "existingmodel") }}
),
mymodel as (
    ...
)
select * from mymodel
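A model like this compiles only if each source() call names a source and table declared in sources.yml; otherwise dbt reports an error. A hypothetical pre-check could scan model SQL for source() calls with a regular expression and compare them against the generated sources. The missing_source_refs helper is not part of dbt or Meltano. It only understands literal string arguments, which is a simplifying assumption and not how dbt itself resolves Jinja.

```python
import re

# Matches source("name", "table") or source('name', 'table') with literal
# string arguments; Jinja expressions as arguments are not handled.
SOURCE_CALL = re.compile(r"""\bsource\(\s*['"]([^'"]+)['"]\s*,\s*['"]([^'"]+)['"]\s*\)""")


def missing_source_refs(model_sql: str, sources: dict) -> list[tuple[str, str]]:
    """Return (source, table) pairs referenced in model_sql but not declared."""
    declared = {
        (src["name"], tbl["name"])
        for src in sources.get("sources", [])
        for tbl in src.get("tables", [])
    }
    # findall yields one (source, table) tuple per match; dict.fromkeys
    # drops duplicates while keeping first-seen order.
    refs = SOURCE_CALL.findall(model_sql)
    return list(dict.fromkeys(ref for ref in refs if ref not in declared))
```

The CTE named "source" in the model above is not matched, because the pattern requires an opening parenthesis directly after the word.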
What does success look like, and how can we measure that?
A user can start a new Meltano project, extract a schema from their existing database, and then write a dbt model that references that schema without writing any boilerplate.
Regression test
(Ensure the feature doesn't cause any regressions)
- Write adequate test cases and submit test results
- Test results should be reviewed by a person from the team
Links / references
Perhaps conceptually similar to #2221 (closed)?