Go all-in on Meltano as an open source self-hosted data integration pipeline platform
The thought process:

- Meltano's end-to-end vision depends on teams coming to see it as an alternative to Fivetran and Looker (and similar tools)
- Part of Meltano Analyze's competitive advantage over Looker is the fact that it's free and open source, but so are Redash, Superset, and Metabase, which are already far more advanced (although they currently lack the concept of data modeling that makes analysis easier for non-technical people)
- Meltano Analyze's competitive advantage over those is that it comes with built-in ELT and out-of-the-box support for various data sources: it can take you all the way from data to dashboard.
- If a user doesn't intend to use Meltano ELT with Meltano Analyze, there is little reason for them to (ever) use Meltano Analyze
- Meltano ELT's competitive advantage over Fivetran et al. is that it's self-hosted, open source, and free
- For Meltano ELT to actually be an alternative to Fivetran et al., it needs to support a lot of data sources and be easy to deploy into production
- Meltano's data source support starts with extractors, and can later be augmented using transforms (DBT) and models (m5o)
- The largest current ecosystem of open source extractors is the one built around the Singer.io tap specification developed by Stitch Data
- Many of these taps already exist, but their quality varies a lot
- To increase Meltano's data source support, existing taps need to be improved and new taps need to be developed
- Doing this ourselves doesn't scale and the maintenance work would be never-ending, so (by the time our vision is realized) these taps will need to be developed and maintained primarily by "the community", at a rate that can compete with the in-house teams at Fivetran et al.
- A Singer community exists, but it is not very large or active
- Growing the Singer community is not currently a priority for Stitch, presumably because to them, Singer is just their "Stitch extension framework" rather than something meant to stand on its own
- For the Singer tap community and ecosystem to grow, it needs to be easy to develop extractors and to use them outside of Stitch
- Documentation around developing and using Singer taps exists and is a great start (see https://github.com/singer-io/getting-started), but leaves a lot to be desired for people new to the technology
- Little tooling exists around tap development, testing, or pipeline deployment, which leads many people interested in using Singer outside of Stitch to give up
- This means that until the state of documentation and tooling around Singer taps and targets improves, we cannot expect the Singer community and ecosystem to grow, or Meltano's data source support to improve
- To address this, Meltano ELT could become that development and deployment tooling, and we can write great documentation
- To make sure we build the most useful tool for the community, we need to build it _with_ the community
- The Meltano open source community can be started from inside the existing Singer community, and as one grows, so will the other. Even a Singer tap developer who wants to use their tap with Stitch instead of Meltano ELT could become a member of the Meltano community for its tap development tooling and documentation
- To attract members from the existing Singer community to the Meltano community and get them to invest time in this project, they need to feel that our (GitLab's) goals are aligned with theirs and that we're in this together.
- This means that our goal needs to explicitly be to "**build a production-ready self-hosted open source alternative to existing proprietary hosted ELT/data pipeline solutions with a great library of extractors**", and (for the time being) _not_ to "build an end-to-end tool for data engineers, analytics engineers, analysts and non/less-technical end-users" or to "build a product that can make GitLab money".
- The first members of the community will be existing Singer community members; people at companies that have already decided they want or have to use an open source ELT tool despite its limitations (mostly lack of data source support and production-ready tooling); and people new to Singer and Meltano who know their own company will realistically not be able to switch from existing hosted solutions for a while, but who are nonetheless personally motivated by making ELT tooling a commodity (in a way, this is what GitLab is doing by funding development of Meltano ELT)
- As the product improves over time, it will become increasingly interesting to teams and companies considering a switch from existing hosted solutions, and a point will come where GitLab sees an opportunity to monetize Meltano ELT (through SaaS, enterprise functionality, support, etc.). Until we reach that point, I think approaching Meltano as a business and money-making opportunity would do the community and product more harm than good
- Similarly, there may come a point at which we realize that Meltano ELT has become good enough that Meltano Analyze's "built-in ELT" actually becomes a selling point, at which point we can focus more of our efforts there and explicitly position Meltano as both open source ELT _and_ open source Analyze, in one package.
- During our focus on Meltano ELT, I expect Meltano Analyze to get relatively little interest, since ELT contributors will be data engineers, not data analysts or end users who would want to use Analyze themselves. Some ELT users may still give it a try, though, and find that it's good enough for basic personal use cases, like using Meltano to visualize data from an app like Goodreads, MyFitnessPal, or Strava. They may even demo it to an analyst colleague and make minor contributions. Until ELT is good enough to support it, however, I don't think we (as GitLab) should focus on Analyze and data analysts.

Conclusions:

- We will go all-in on building a great self-hosted open source ELT tool, and will treat the end-to-end vision just like the early GitLab contributors would have treated the "entire DevOps cycle" vision
- From GitLab's perspective, Meltano's success will be measured by contributions first and usage next, until real-world production usage has grown to the point where it's clear that Meltano ELT can truly compete in the market with hosted proprietary tools. At that point, GitLab will attempt to build a business around, and in support of, the existing open source community and product. This point is expected to be multiple years away from today.