Publish Meltano core as an importable Python library
Problem to solve
Meltano's ability to wire together Singer taps and targets, track their state, and expose metrics about their execution is highly valuable. Some use cases would benefit from those capabilities without also depending on the orchestration and UI functionality that currently ships with the Meltano project. Because the core functionality is already encapsulated in its own module within the project, it is worth exploring what it would take to extract that portion of Meltano and publish it as a standalone library.
My initial use case is wrapping Meltano's capabilities in a plugin for the Dagster data orchestration framework, though I'm sure many other parts of the broader data management ecosystem could benefit.
Target audience
Data engineers and builders of other data ecosystem tooling.
Further details
My immediate goal is to build a Dagster plugin that executes extract and load workflows as part of a broader pipeline definition. The broader goal is to let other tools add robust extract and load functionality without reinventing the wheel.
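To make the request concrete, here is a minimal sketch of the kind of entry point an orchestrator plugin could call. It is not an existing Meltano API: `run_el` is a hypothetical name, and the sketch only relies on the Singer convention that a tap writes messages to stdout, a target reads them on stdin, and the target prints state as JSON lines on stdout.

```python
import json
import subprocess
from typing import Optional, Sequence


def run_el(tap_cmd: Sequence[str], target_cmd: Sequence[str]) -> Optional[dict]:
    """Pipe a Singer tap into a Singer target and return the last emitted state.

    Hypothetical helper for illustration only; not part of Meltano today.
    """
    tap = subprocess.Popen(tap_cmd, stdout=subprocess.PIPE)
    target = subprocess.Popen(
        target_cmd, stdin=tap.stdout, stdout=subprocess.PIPE, text=True
    )
    # Close our copy of the pipe so the tap sees a broken pipe if the target exits early.
    tap.stdout.close()
    target_out, _ = target.communicate()
    tap_rc = tap.wait()
    if tap_rc != 0 or target.returncode != 0:
        raise RuntimeError(
            f"EL run failed (tap exit={tap_rc}, target exit={target.returncode})"
        )
    # Targets print one JSON state document per line; the last one is the newest.
    last_state = None
    for line in target_out.splitlines():
        line = line.strip()
        if line:
            last_state = json.loads(line)
    return last_state
```

A Dagster op (or any other orchestrator step) could call a function like this with the tap and target command lines and persist the returned state itself. A real library would presumably also cover plugin installation, configuration, and metrics, which this sketch leaves out.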
Proposal
There are a couple of ways that we can attack this.
One option is to extract the core as a separate project and then add it as a dependency to this Meltano project. That enforces clean interface definitions between the two projects and separates contributions between the core library and the UI/UX elements of the project.
Another option is to use something like the Pants build tool to treat this repository as a monorepo with multiple package builds. This avoids both extracting the code to a separate repository and adding extra steps to the development workflow. It also ensures that new functionality can be added and tested atomically across the core/UI/API/CLI boundaries.
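Whichever option is chosen, the core has to stay importable without the UI, API, or CLI code. A rough sketch of a check that could run in CI to guard that boundary is below. The package names `meltano.api` and `meltano.cli` are assumptions about the layout, and `find_boundary_violations` is a hypothetical helper, not something that exists in the project.

```python
import ast
from pathlib import Path
from typing import Iterable, List, Tuple

# Assumed names of the packages the core must not depend on; adjust to the real layout.
FORBIDDEN_PREFIXES = ("meltano.api", "meltano.cli")


def find_boundary_violations(
    core_dir: Path, forbidden: Iterable[str] = FORBIDDEN_PREFIXES
) -> List[Tuple[str, str]]:
    """Return (file, module) pairs where code under core_dir imports a forbidden package."""
    forbidden = tuple(forbidden)
    violations = []
    for path in sorted(core_dir.rglob("*.py")):
        # Parse only; nothing is imported or executed.
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                # Relative imports (level > 0) are skipped in this sketch.
                modules = [node.module]
            else:
                continue
            for module in modules:
                if any(module == p or module.startswith(p + ".") for p in forbidden):
                    violations.append((str(path), module))
    return violations
```

An empty result means the scanned directory has no absolute imports from the forbidden packages, so a test could simply assert that the list is empty.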
What does success look like, and how can we measure that?
Success would be measured by whether other projects can use the robust integration, execution, and state tracking of Meltano EL pipelines without having to adopt the UI and CLI and the additional dependencies they require.
Regression test
(Ensure the feature doesn't cause any regressions)
- Write adequate test cases and submit test results
- Test results should be reviewed by a person from the team
Links / references
Please note that this was taken from GitLab, to be changed accordingly