Make project management in Meltano "git aware"
Documenting here some ideas from the Meltano-for-Meltano deployment discussion between @aaronsteers and @tayloramurphy.
Especially when inside a container, it could be extremely helpful if meltano were made git aware. Without this awareness, the Meltano deployment story is extremely difficult for a typical data developer to implement.
A three-phase approach to this:
- The docker container has a bootloader script that pulls the project from Git as a first step.
- Meltano natively is able to push and pull prior to certain commands.
- The Meltano UI is able to push and pull to git, and will prompt you if you have "uncommitted changes".
Simplify the git workflow for Data Professionals out of the box
- When launching the meltano/meltano docker image, the image will detect the repo URL, credentials, and default branch. Assuming no other project has been mapped, it will download the project repo at launch time.
  - When needed, meltano install will also be run automatically after the project is cloned.
- Unless otherwise specified, we can assume 2 branches on every project repo: main (or master) and development. Projects start on the development branch by default, creating it from main if it does not yet exist.
  - Optionally, a user name or environment name can be appended in the branch name: development/aj or development/web-ui.
- Meltano will default to "auto-commit mode" for users not familiar with git, or for environments where we do not have direct interactivity with the developer. Within this mode of operation:
  - Commits and pushes are triggered automatically against the development branch if files are changed by a meltano CLI command.
  - Pulls are triggered automatically before running meltano elt and before modifications to the project.
  - The repo is automatically in "read only mode" whenever on main. In this mode, meltano CLI commands will fail (or prompt for a branch change) if they would modify meltano.yml.
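The branch and auto-commit rules above could be sketched roughly as follows. This is only an illustration of the proposed behavior, not existing Meltano code: the function names (development_branch, is_read_only, should_auto_commit) and the choice to treat both main and master as protected are assumptions made for the example.

```python
from typing import FrozenSet, Optional, Sequence

# Assumption: main (or master) is read-only by default, as proposed above.
PROTECTED_BRANCHES: FrozenSet[str] = frozenset({"main", "master"})


def development_branch(suffix: Optional[str] = None) -> str:
    """Return 'development', or 'development/<suffix>' for a user or environment."""
    if suffix:
        return f"development/{suffix.strip('/')}"
    return "development"


def is_read_only(branch: str, protected: FrozenSet[str] = PROTECTED_BRANCHES) -> bool:
    """A branch is in read-only mode when it is on the protected list."""
    return branch in protected


def should_auto_commit(
    branch: str,
    changed_files: Sequence[str],
    auto_commit: bool = True,
) -> bool:
    """Decide whether a CLI command that changed files should commit and push.

    Raises PermissionError when files changed on a read-only branch; a real
    implementation might prompt for a branch change instead.
    """
    if not changed_files or not auto_commit:
        return False
    if is_read_only(branch):
        raise PermissionError(f"Branch '{branch}' is read-only; switch branches first.")
    return True
```

Keeping the rules as pure functions like this would let the CLI and the UI share one definition of what "safe to commit" means.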
Support for advanced scenarios as teams and requirements evolve
The above would be a default experience for new teams. For advanced teams and for highly tuned environments:
- Auto-commit mode can be disabled.
- The list of protected / read-only branches can be expanded beyond just main.
- Specific branches can be checked out.
- The native git executable still works within the repo as usual, since git operations performed by meltano use the same standard git operations.
- Each container can have customized environment variables specifying which set of branches it expects to be run against, or any other constraints such as forcing read-only mode.
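As a rough sketch of how a container might read these options, the snippet below parses git settings from environment variables. All of the variable names (MELTANO_GIT_URL, MELTANO_GIT_BRANCH, MELTANO_GIT_AUTO_COMMIT, MELTANO_GIT_PROTECTED_BRANCHES, MELTANO_GIT_READ_ONLY) are hypothetical and chosen only for illustration; the proposal does not define them.

```python
from dataclasses import dataclass
from typing import Mapping, Tuple


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings from an environment variable."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class GitSettings:
    url: str
    branch: str = "development"
    auto_commit: bool = True
    protected_branches: Tuple[str, ...] = ("main",)
    force_read_only: bool = False

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "GitSettings":
        # Hypothetical variable names; none of these exist in Meltano today.
        raw_protected = env.get("MELTANO_GIT_PROTECTED_BRANCHES", "main")
        protected = tuple(b.strip() for b in raw_protected.split(",") if b.strip())
        return cls(
            url=env["MELTANO_GIT_URL"],
            branch=env.get("MELTANO_GIT_BRANCH", "development"),
            auto_commit=_as_bool(env.get("MELTANO_GIT_AUTO_COMMIT", "true")),
            protected_branches=protected,
            force_read_only=_as_bool(env.get("MELTANO_GIT_READ_ONLY", "false")),
        )

    def is_read_only(self) -> bool:
        """Read-only if forced by the container or if on a protected branch."""
        return self.force_read_only or self.branch in self.protected_branches
```

In a container this would be called as GitSettings.from_env(os.environ); taking a plain mapping keeps the parsing easy to test.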
In these examples, meltano project is similar in behavior to comparable git commands, except that additional behaviors and constraints are applied as make sense for meltano projects specifically.
meltano project pull # Pulls from the repo. URL and creds are in env vars or meltano.yml
meltano project commit # Check branch rules; commit and push if safe, otherwise throw error.
# A default commit message will be provided if none given.
meltano project checkout <BRANCH> # Switches between branches
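One way to picture these subcommands is as thin wrappers that translate into standard git invocations after checking the branch rules. The sketch below only builds the git argument lists; the function names, the default commit message, and the use of origin as the remote name are assumptions for illustration, and it does not handle details such as git commit exiting non-zero when there is nothing to commit.

```python
import subprocess
from typing import List, Optional, Sequence

# Hypothetical default message used when none is given.
DEFAULT_MESSAGE = "Auto-commit from meltano"


def pull_commands() -> List[List[str]]:
    """git commands behind a 'meltano project pull'."""
    return [["git", "pull"]]


def commit_commands(
    branch: str,
    message: Optional[str] = None,
    protected: Sequence[str] = ("main",),
) -> List[List[str]]:
    """git commands behind a 'meltano project commit': stage, commit, push."""
    if branch in protected:
        raise PermissionError(f"Refusing to commit to protected branch '{branch}'.")
    return [
        ["git", "add", "--all"],
        ["git", "commit", "-m", message or DEFAULT_MESSAGE],
        ["git", "push", "origin", branch],
    ]


def checkout_commands(branch: str, exists: bool = True, base: str = "main") -> List[List[str]]:
    """git commands behind a 'meltano project checkout <BRANCH>'."""
    if exists:
        return [["git", "checkout", branch]]
    # Create the branch from the base branch if it does not exist yet.
    return [["git", "checkout", "-b", branch, base]]


def run_all(commands: Sequence[Sequence[str]], project_dir: str) -> None:
    """Run each git command inside the project directory, stopping on failure."""
    for cmd in commands:
        subprocess.run(list(cmd), cwd=project_dir, check=True)
```

Because the wrappers emit ordinary git commands, a developer using the native git executable in the same repo would see exactly the same history.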
A sample Deployment Story
A possible Kubernetes workflow would then be:
Project initialization:
- Developer creates a copy of our new project template, or pushes the output of meltano init.
- Developer creates an auth token in GitLab/GitHub if they don't have one already.
- Developer maps their project git URL and auth token into environment variables for docker-compose, kubernetes, or similar.
Project deployment:
- Container starts up.
- Some command is passed to the container: meltano install/init/elt.
- Meltano detects from env vars the git settings.
- Meltano pulls the latest.
- The original command is run (probably either meltano ui or meltano elt).
- Whenever meltano.yml or other files are changed, meltano attempts to commit and push back.
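The startup sequence above might look something like the following bootloader sketch, which plans the commands to run before handing off to the original command. The env var names (MELTANO_GIT_URL, MELTANO_GIT_BRANCH) and the /project path are hypothetical. It also assumes the target branch already exists on the remote (git clone --branch fails otherwise) and that meltano install is executed from inside the cloned project directory.

```python
from typing import List, Mapping, Sequence


def startup_plan(
    env: Mapping[str, str],
    command: Sequence[str],
    project_dir: str = "/project",
    already_cloned: bool = False,
) -> List[List[str]]:
    """Return the ordered commands a container bootloader would run."""
    url = env.get("MELTANO_GIT_URL")  # hypothetical variable name
    branch = env.get("MELTANO_GIT_BRANCH", "development")  # hypothetical
    steps: List[List[str]] = []
    if url:
        if already_cloned:
            # Project is already present: just pull the latest.
            steps.append(["git", "-C", project_dir, "pull"])
        else:
            # First launch: clone the project, then install its plugins.
            steps.append(["git", "clone", "--branch", branch, url, project_dir])
            steps.append(["meltano", "install"])
    # Finally run whatever command was passed to the container.
    steps.append(list(command))
    return steps
```

With no git settings in the environment, the plan reduces to the original command, so a project mapped in by other means would keep working unchanged.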
Streamlined deployment story:
- With proper env var config, the stock meltano/meltano docker image can be run directly from ECS, Kubernetes, or docker-compose, and only requires env vars for git project initialization.
- The project running in the container does not need to be set in read-only mode, but instead can default to auto-commit mode (unless or until the user sets it otherwise).
- After finding the repo from git, the image can also auto-install all the components it needs.
- As needed to decrease initialization and install time, eventually we still expect users to create their own Dockerfiles.
Web UI opportunity down the road:
- If meltano ui is executed without a defined project, the Web UI could wait at a project initialization screen, asking a user to input project details and then initialize from user input.