Revert meltano projects
Why?
We implemented Meltano Projects at the file-system level: each project would
live in its own folder and declare its dependencies/runtime in a meltano.yml
file.
We changed the way meltano ui works, to boot it outside a Meltano project's context, and inject the context using a project slug everywhere it's needed.
I've been working on the code base and it occurred to me that we added an extra layer of complexity that is not needed to obtain the goal: separating the concerns of different groups within an organisation.
In the spirit of YAGNI, I think that this feature was prematurely integrated and that it hinders the development of further features in Meltano.
Meltano projects are composed of multiple components, which can be described as follows, using the MELTANO acronym:
- Models: definitions of the data
- Extractor: runtime to extract from the data source
- Loader: runtime to integrate data into the database
- Transformer: runtime to transform data inside the database (dbt)
- Transforms: definitions of the transforms
- Analyze: definitions of Reports and Dashboards
- Notebook: runtime to run arbitrary Kernels on the database (Jupyter)
- Orchestrator: runtime to orchestrate and schedule jobs across components (Airflow)
The current separation basically puts everything in a project, then runs the Meltano UI outside it.
I believe the segmentation should happen at the definitions level, on top of a single shared runtime, but I also think we should not be doing that now.
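To make the definitions/runtime split concrete, here is a minimal sketch that tags each component from the list above by kind. The names `Kind`, `COMPONENTS` and `segmentable` are hypothetical and only illustrate the idea; nothing here is existing Meltano code.

```python
from enum import Enum


class Kind(Enum):
    DEFINITION = "definition"  # files describing data, transforms, reports
    RUNTIME = "runtime"        # processes or tools that do the work


# Classification taken from the MELTANO component list above.
COMPONENTS = {
    "Models": Kind.DEFINITION,
    "Extractor": Kind.RUNTIME,
    "Loader": Kind.RUNTIME,
    "Transformer": Kind.RUNTIME,
    "Transforms": Kind.DEFINITION,
    "Analyze": Kind.DEFINITION,
    "Notebook": Kind.RUNTIME,
    "Orchestrator": Kind.RUNTIME,
}


def segmentable(components):
    """Return the components that could be segmented per group.

    Only definitions qualify; runtimes would stay shared.
    """
    return [name for name, kind in components.items() if kind is Kind.DEFINITION]
```

Under this reading, only Models, Transforms and Analyze (Reports and Dashboards) would ever be split between groups, while a single set of runtimes serves all of them.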
Problems
Any plugin/integration that requires a worker will have to be duplicated for all the projects currently running. We currently have no way to start/stop a plugin, and no way of mapping a project to its running services. This is happening right now in the case of Airflow. We need to start the webserver & scheduler, but how can we do that if we don't know which project needs it?
We need a single source of truth, and we have now lost it.
This is true for Airflow, but will be true for any other integration that has a runtime/worker process: Metabase, Redash, Jupyter, etc…
Furthermore, even if we knew which project has Airflow installed in it, it would be a mess to spawn the Airflow scheduler for each one on a different port and do the reconciliation afterwards.
Simply put, the meltano ui/start webserver needs to run alongside any required workers, so they can be started at the same time.
What is the solution
- Turn meltano ui back to what it was initially defined to be: if it's not run in a project, then open a UI to create a project (basically a "meltano init" UI), cd into it, then restart meltano ui inside it.
- meltano ui should most probably be meltano start anyway, because it actually "starts" all needed processes for your Meltano project to work.
- Backlog the project segmentation feature and implement it only upon the definitions aspects of the Meltano project (Models, Reports, Dashboards, Transforms). How to do that should be defined in another issue.
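The first two points could look roughly like the sketch below. It assumes a project is detected by the presence of meltano.yml in the current folder; `inside_project`, `plan_start` and the process names are hypothetical and only illustrate the decision, not how Meltano actually implements it.

```python
from pathlib import Path

PROJECT_FILE = "meltano.yml"


def inside_project(cwd: Path) -> bool:
    """A folder counts as a project if it contains meltano.yml (assumption)."""
    return (cwd / PROJECT_FILE).is_file()


def plan_start(cwd: Path, workers):
    """Return the names of the processes that should be started from `cwd`.

    Outside a project, only the project-creation UI is offered.
    Inside a project, the webserver is started together with every
    worker the project needs, so there is a single place that knows
    what is running.
    """
    if not inside_project(cwd):
        return ["init-ui"]
    return ["webserver", *workers]
```

For example, `plan_start(Path.cwd(), ["airflow-webserver", "airflow-scheduler"])` returns only the init UI when run outside a project, and the webserver plus both Airflow processes when run inside one.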
/cc @dmor @jschatz1 @valexieva @derek-knox @iroussos @bencodezen