Revert meltano projects
Revert Meltano Projects? Why?
We implemented Meltano Projects at the file-system level: each project would
live in its own folder and declare its dependencies/runtime in a
We changed the way meltano ui works, to boot it outside a Meltano project's context, and inject the context using a project slug everywhere it's needed.
I've been working on the code base and it occurred to me that we added an extra layer of complexity that is not needed to achieve the goal: separating the concerns of different groups within an organisation.
In the spirit of YAGNI, I think that this feature was integrated prematurely and that it hinders the development of further features in Meltano.
Meltano projects are composed of multiple components, which can be described as follows, using the MELTANO acronym:
- Models: definitions of the data
- Extractor: runtime to extract from the data source
- Loader: runtime to integrate data into the database
- Transformer: runtime to transform data inside the database (dbt)
- Transforms: definitions of the transforms
- Analyze: definitions of Reports and Dashboards
- Notebook: runtime to run arbitrary Kernels on the database (Jupyter)
- Orchestrator: runtime to orchestrate and schedule jobs across components (Airflow)
The current separation basically puts everything in a project, then runs the Meltano UI outside it.
I believe the segmentation should happen at the definitions level, on top of a single shared runtime, but I also think we should not be doing that now.
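To make the definitions/runtime distinction concrete, here is a small sketch that tags each component from the list above with the word the list itself uses for it. The names `Kind`, `COMPONENTS` and `components_of` are hypothetical and only serve the illustration; nothing like this is claimed to exist in Meltano.

```python
from enum import Enum


class Kind(Enum):
    DEFINITION = "definition"  # declarative content
    RUNTIME = "runtime"        # something that has to be executed


# Classification taken from the component list above ("definitions" vs "runtime").
COMPONENTS = {
    "Models": Kind.DEFINITION,
    "Extractor": Kind.RUNTIME,
    "Loader": Kind.RUNTIME,
    "Transformer": Kind.RUNTIME,
    "Transforms": Kind.DEFINITION,
    "Analyze": Kind.DEFINITION,
    "Notebook": Kind.RUNTIME,
    "Orchestrator": Kind.RUNTIME,
}


def components_of(kind):
    """Return the component names of the given kind, in list order."""
    return [name for name, k in COMPONENTS.items() if k is kind]
```

Under this reading, only the `Kind.DEFINITION` entries would be candidates for segmentation, while the `Kind.RUNTIME` entries would stay shared.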
Any plugin/integration that requires a worker will have to be duplicated for all the projects currently running. We currently have no way to start/stop a plugin, and no way of mapping a project to its running services. This is already happening with Airflow: we need to start the webserver and scheduler, but how can we do that if we don't know which project needs it?
We need a single source of truth, and we have now lost it.
This is true for Airflow, but will be true for any other integration that has a runtime/worker process: Metabase, Redash, Jupyter, etc…
Furthermore, even if we knew which project has Airflow installed in it, it would be a mess to spawn the Airflow scheduler for each one on a different port and do the reconciliation afterwards.
Simply put, the meltano ui/start webserver needs to run alongside any required workers, so they can be started at the same time.
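As a rough illustration of "started at the same time", the sketch below launches a set of commands as child processes from a single entry point and stops them together. The helpers `start_all` and `stop_all` are hypothetical, and the commands are whatever the caller passes in; the actual webserver and worker commands are not specified here.

```python
import subprocess


def stop_all(procs):
    """Terminate every still-running child process, then wait for all of them."""
    for proc in procs.values():
        if proc.poll() is None:  # still running
            proc.terminate()
    for proc in procs.values():
        proc.wait()


def start_all(commands):
    """Start each command (name -> argv list) and return name -> Popen.

    If any command fails to launch, the ones already started are stopped,
    so the caller never ends up with a half-started set of workers.
    """
    procs = {}
    try:
        for name, argv in commands.items():
            procs[name] = subprocess.Popen(argv)
    except OSError:
        stop_all(procs)
        raise
    return procs
```

Because everything is spawned from one place, the mapping from a project to its running services is simply the returned dictionary.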
What is the solution?
Revert meltano ui to what it was initially defined to be: if it's not run inside a project, open a UI to create a project (basically a "meltano init" UI), cd into it, then restart meltano ui inside it.
meltano ui should most probably be meltano start anyway, because it actually "starts" all the processes your Meltano project needs to work.
Backlog the project segmentation feature and implement it only for the definition aspects of the Meltano project (Models, Reports, Dashboards, Transforms). How to do that should be defined in another issue.
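The first point boils down to a single decision at startup. A minimal sketch, assuming a project can be recognised by the presence of some marker file in the current directory; the function `choose_mode` and the `marker` argument are hypothetical, and which file actually marks a project is left to the caller.

```python
from pathlib import Path


def choose_mode(cwd, marker):
    """Return "start" when cwd looks like a project, "init" otherwise.

    `marker` is the name of a file whose presence identifies a project
    directory (an assumption made for this sketch).
    """
    return "start" if (Path(cwd) / marker).is_file() else "init"
```

In "init" mode the command would show the project-creation UI; in "start" mode it would launch the webserver together with the workers the project needs.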