MVC
Jacob: After talking with @joshlambert, we came up with this MVC. The layout doubles as the file structure; for example, the top layer would contain the extractors.
Repo Directory Structure
- Extract
  - Lever (Done)
    - Mapping/Filtering
  - SFDC (Done)
    - Mapping/Filtering
  - GitLab (Done)
    - Mapping/Filtering
  - BambooHR (Done)
    - Mapping/Filtering
  - Zuora (Done)
    - Mapping/Filtering
  - NetSuite (Done)
    - Mapping/Filtering
  - ZenDesk (Done)
    - Mapping/Filtering
  - Fastly (in progress)
    - Mapping/Filtering
  - CSV (todo)
    - Mapping/Filtering
- Load
  - PostgreSQL
  - CSV (not MVC)
  - BigQuery (not MVC)
  - MySQL (not MVC)
  - Snowflake (not MVC)
  - Anonymization/pseudonymization step
- Transform
  - dbt transformations (just files)
  - Python files (for example, for API lookups)
- Model
  - melt files
- Analyze
  - source files of the Flask application
- Orchestrate
  - .gitlab-ci.yml files: the ones we use ourselves, but also 20 other samples and examples
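The anonymization/pseudonymization step listed under Load above could be sketched as a keyed hash, so the same input always maps to the same token and joins still work after loading. This is a hypothetical sketch, not existing code; the function name and secret are placeholders:

```python
import hashlib
import hmac

def pseudonymize(value: str, secret: bytes = b"replace-me") -> str:
    """Deterministically pseudonymize a value with a keyed hash (HMAC-SHA256).

    Hypothetical sketch: keying the hash keeps the mapping stable across
    loads (so foreign keys survive) without storing the raw value, and
    without letting anyone who lacks the secret reverse the mapping by
    brute-forcing plain hashes.
    """
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()

# The same input always yields the same 64-character token.
token = pseudonymize("alice@example.com")
```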
Meltano Analyze will search the extractor directory for a list of extractors. It will then look in the load and orchestrate directories for a directory of the same name to find the corresponding steps in the lifecycle.
Currently we have two types of extractors, which will eventually become one. To make an MVP happen sooner, we will use both. Extractor type 1 is a standalone extractor used with its corresponding loaders. Extractor type 2 is the original style of extractor, which contains a built-in loader; these will be labeled in the directory as extractor_name__legacy and can still be run by Meltano Analyze.
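The crawl described above could be sketched as follows. This is a minimal sketch, assuming lowercase extract/, load/, and orchestrate/ directory names and the __legacy suffix convention; the function name is hypothetical:

```python
from pathlib import Path

def discover_extractors(repo_root: str) -> dict:
    """Crawl extract/ and pair each extractor with any same-named
    load/ and orchestrate/ entries (directory names are assumptions
    based on the structure described above)."""
    root = Path(repo_root)
    extractors = {}
    for entry in sorted((root / "extract").iterdir()):
        if not entry.is_dir():
            continue
        name = entry.name
        # Type-2 (legacy) extractors ship with a built-in loader and are
        # suffixed with "__legacy" per the naming convention above.
        legacy = name.endswith("__legacy")
        base = name[: -len("__legacy")] if legacy else name
        extractors[name] = {
            "legacy": legacy,
            "loader": (root / "load" / base).is_dir(),
            "orchestrate": (root / "orchestrate" / base).is_dir(),
        }
    return extractors
```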
UX
Meltano Analyze will crawl these folders, and the tabs will be:
MELTANO
- Model
  - Look for melt files
  - These exist for the visualizations
- Extract
  - Look for extractor files
  - Be able to run extractors from the UI
- Load
  - Look for loader files
  - Be able to run loaders from the UI
  - Demonstration load: CSV -> PG
- Transform
  - Look for dbt files
- Analyze
  - Charts and tables in the UI through melt files
- Orchestrate
  - Run a real ELT from the console (not MVC: requires credential entry, etc.)
  - List of GitLab CI YAML files (.gitlab-ci.yml)
To-dos
- Python implementation of this
- Architecture step by Alex Z
- Docker & Helm chart
  - Base Dockerfile with no samples (meltano:base)
  - Helm chart to deploy alongside Postgres
- Automate schema creation if it doesn't exist
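The "automate schema creation if it doesn't exist" to-do can lean on an idempotent DDL statement that the loader runs unconditionally at startup. A minimal sketch, assuming PostgreSQL as the target (the function name is hypothetical):

```python
def ensure_schema_sql(schema: str) -> str:
    """Build an idempotent schema-creation statement for PostgreSQL.

    CREATE SCHEMA IF NOT EXISTS is a no-op when the schema already
    exists, so the loader can run it on every start without first
    checking the catalog.
    """
    # Double-quote the identifier and escape any embedded quotes.
    safe = schema.replace('"', '""')
    return f'CREATE SCHEMA IF NOT EXISTS "{safe}";'
```

The returned string would then be executed over the loader's existing database connection before any tables are created.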
Customer installation:
Simple use case
- Helm chart to easily deploy to k8s
  - Provision PostgreSQL
  - Provision Meltano
BYO MeltML/Transforms
- Make your own Dockerfile
  - FROM meltano:base
  - ADD whatever files you want
- Edit the chart's values.yaml to use your image
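A BYO image following the steps above might look like the sketch below. The meltano:base image is the one named earlier; the source and destination paths are assumptions for illustration:

```dockerfile
# Start from the base Meltano image, which ships with no samples
FROM meltano:base

# Layer in your own melt files and dbt transforms
ADD model/ /meltano/model/
ADD transform/ /meltano/transform/
```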
Personas
Data Engineer persona
Installation & Getting started:
- Clone the "getting started" repo from GitLab. This repo includes only: default melt files, dbt transforms, a .gitlab-ci.yml (CI pipeline), and a values.yaml (Helm chart config). Also included is a README with instructions.
- The first stage in the pipeline is to start from the Meltano Docker image, then layer the whole repo on top.
- The second stage in the pipeline is to deploy Meltano to Kubernetes (k8s for now, other targets like VMs later).
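The two pipeline stages described above might be sketched in the repo's .gitlab-ci.yml roughly as follows. The job names, chart path, and helm invocation are assumptions; $CI_REGISTRY_IMAGE and $CI_COMMIT_SHA are standard GitLab CI variables:

```yaml
stages:
  - build
  - deploy

build_image:
  stage: build
  script:
    # Stage 1: layer the whole repo on top of the Meltano base image
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"

deploy_k8s:
  stage: deploy
  script:
    # Stage 2: deploy to Kubernetes via the Helm chart and this repo's values.yaml
    - helm upgrade --install meltano ./chart -f values.yaml --set image.tag="$CI_COMMIT_SHA"
```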
Benefits:
- Customers do not have to fork the whole Meltano repo.
- Only the initial MeltML files and dbt transforms are cloned. This is ideal, as these are the files most likely to be changed anyway.
- Fewer files included means fewer worries about conflicts as we update the defaults.
- Simple mechanism to package custom files into the Docker image, and to keep up to date with the melt files in the repo without having to build all the git pull/branch functionality first.
Data Analyst persona
They would have access to the cloned "getting started" repo that the engineer set up earlier. They can then edit the LookML or other dashboard files as desired; after merging, a new container version will be built and pushed.
Developer persona
- Developers can clone the full Meltano repo locally and follow the existing process we use to develop it. The main issue is that they will be unable to run CI, since they will have to fork the repo instead of branching. (Our production variables are not protected yet.)
- If they want to generate a test image, they can simply run a docker build.