Using Singer Taps/Targets in Meltano
This is a discussion issue documenting the pros and cons of using Singer.io Taps/Targets as Meltano's core Extractors and Loaders. It will be updated as we go through the process of checking Singer's Protocol and the supported Extractors/Loaders.
Info on Singer Taps/Targets
To start with, Singer's Communication Standard seems like a solid one, providing a simple way to create Extractors (Taps) and Loaders (Targets) and to send data and schema definitions from Extractors to Loaders. I am not 100% sure how extensible and scalable it is for large chunks of data, as it uses stdout/stdin and JSON to pipe the data, but it seems like a good starting point.
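To make the stdout/JSON piping concrete, here is a minimal sketch of the kind of messages a Tap writes, one JSON object per line. The SCHEMA, RECORD and STATE message types come from the Singer spec; the `users` stream, its schema, the `bookmarks` layout of the state and the helper names are assumptions made up for illustration.

```python
import json


def singer_messages(stream, schema, key_properties, rows, bookmark_key):
    """Yield Singer-style messages for one stream: SCHEMA, then RECORDs, then STATE."""
    yield {
        "type": "SCHEMA",
        "stream": stream,
        "schema": schema,
        "key_properties": key_properties,
    }
    bookmark = None
    for row in rows:
        yield {"type": "RECORD", "stream": stream, "record": row}
        bookmark = row[bookmark_key]
    # The content of "value" is free-form; this bookmarks layout is an assumption.
    yield {"type": "STATE", "value": {"bookmarks": {stream: {bookmark_key: bookmark}}}}


def to_lines(messages):
    """Serialize each message as a single line of JSON, as a Tap would print it."""
    return [json.dumps(message) for message in messages]


# Hypothetical stream used only for this example.
USERS_SCHEMA = {
    "type": "object",
    "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
}
ROWS = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]

lines = to_lines(singer_messages("users", USERS_SCHEMA, ["id"], ROWS, "id"))
for line in lines:
    print(line)  # a Target would read these lines from stdin
```

Every record is serialized and parsed as text on its way through the pipe, which is where the scalability question for large volumes comes from.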
Some overall notes to start with:
-
There are 3 types of Taps/Targets available:
- Those maintained by Stitch and provided through their platform (labeled as Stitch-certified in the Stitch Platform). Those should work perfectly fine and without issues.
- Those provided in one of Singer.io's Repositories and also offered through the Stitch Platform. They should be at least stable and should work without issues.
- Third party Taps/Targets and those provided in one of Singer.io's Repositories but not promoted through the Singer.io page or the Stitch Platform. Those could range from unstable to totally broken.
-
The documentation is poor for most of the taps I checked that are not officially certified. A lot of them do not even have a README, and most require you to read their code just to understand the available configuration options. Most Taps/Targets are still a work in progress, with their definitions changing and new features not always reported in their documentation.
Even certified ones can lack documentation: check, for example, the Tap for extracting data from Zendesk, which is certified and promoted by Stitch.
-
The Singer.io community is not as active as I would expect. As an example, the tap used as the main example on the Singer.io homepage (tap-fixerio) is broken (fixerio 404 - PLEASE UPDATE YOUR API ENDPOINT); it has not been updated, and the issue reporting the problem has not been addressed since June 18th.
-
Not all Sources/Loaders available in the Stitch Platform are openly available.
That's really obvious in the Loaders (Destinations) Section. There is no Target for Postgres or Snowflake in Singer.io's Repositories, nor is there any mention by Stitch of what they use in general for loading data to the Target Data Stores. We'll have to create our own Targets or use third party, less used/tested implementations, like the one available for loading data to Postgres.
-
As with the initial Extractor Implementations in Meltano, most Taps maintained by Stitch (e.g. SFDC, Zendesk, etc.) serve the needs of the Stitch Platform.
They are pretty complete and support most of the Entities made available by those APIs, but they are built to work in conjunction with a web platform like Stitch rather than as standalone extraction tools to be run in isolation.
That means, for example, that whenever it is available, an OAuth Authentication flow is the only option offered to authenticate the user and connect to the API. That approach requires at least a server with a static IP and a user authenticating through a browser.
This is in contrast to the current Meltano approach of using the Authentication option that directly utilizes a username/password and/or an API token. We have to think about the best approach to follow, but our approach certainly allows an extractor to run even from a data scientist's home laptop, as long as she has the credentials to connect to an API, while the OAuth Authentication flow would require a web server and an interface similar to Meltano Analysis for generating and storing the tokens to be used while running the Taps.
What's more important, as I mentioned at the beginning of this section, is that Stitch's approach is to support only the option that is of interest to the Stitch Platform, even when other options are available (e.g. in the SFDC and Zendesk Taps and APIs that I checked). That means that if we want those taps running as standalone extractors, we'll have to either contribute to their Tap code base and add additional login options (hoping that those MRs are merged) or fork the Tap implementations and maintain our own version of most Taps.
This is the most important hurdle I found while investigating some of the high profile Taps, so I'll continue my analysis and elaborate more on that below with examples per Tap.
Notes on specific Taps/Targets
SFDC Tap
Stitch-certified - link to tap-salesforce
This Tap requires:
- An active Salesforce Connected App (in order to acquire a client_id and a client_secret)
- A server in order to use Salesforce's Web Server OAuth Authentication Flow and acquire a valid refresh_token
Unfortunately, it does not support the Username-Password Authentication Flow that we currently use in the Meltano SFDC extractor and in all other extractors we currently maintain.
As I wrote in the previous Section, that raises the bar for using this Tap to extract data from SFDC, as whoever the end user is also has to somehow acquire the OAuth token (refresh_token). Compare that with the less secure, but better suited to a CI/CD pipeline environment, option of using a username, a password and a security token for authenticating with Salesforce.
One solution in the case of running Meltano as a standalone app would be to provide the OAuth Authentication Flow through the Meltano Analysis web interface and (securely) store the refresh_token in the Meltano Analysis DB.
But that approach does not remove the requirement that the user running the Meltano Project should have at least a static IP, as Salesforce's Web Server OAuth Authentication Flow requires a redirect_uri to which a temporary token is sent back (after the user authenticates in Salesforce); that temporary token is then used to fetch the permanent OAuth token (refresh_token). This is more or less a standard flow when using OAuth Authentication.
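The round trip described above can be sketched as follows. The two endpoint URLs and the form fields follow Salesforce's documented Web Server flow; the callback host `meltano.example.com`, the credentials and the helper names are placeholders invented for illustration, and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlparse

AUTHORIZE_URL = "https://login.salesforce.com/services/oauth2/authorize"
TOKEN_URL = "https://login.salesforce.com/services/oauth2/token"


def authorization_url(client_id, redirect_uri):
    """URL the user opens in a browser to log in and approve the Connected App."""
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
    })
    return f"{AUTHORIZE_URL}?{query}"


def code_from_callback(callback_url):
    """Extract the temporary code that Salesforce appends to the redirect_uri."""
    return parse_qs(urlparse(callback_url).query)["code"][0]


def token_request_form(code, client_id, client_secret, redirect_uri):
    """Form fields to POST to TOKEN_URL in exchange for the long-lived tokens."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
    }


# Placeholder values: the redirect_uri must be reachable by the user's browser,
# which is why some kind of server is needed on our side.
REDIRECT_URI = "https://meltano.example.com/callback"
login_url = authorization_url("my-client-id", REDIRECT_URI)
code = code_from_callback(REDIRECT_URI + "?code=abc123")
form = token_request_form(code, "my-client-id", "my-client-secret", REDIRECT_URI)
print(login_url)
print(form["grant_type"], form["code"])
```

The JSON response to the POST is where the refresh_token would come from, so whatever serves the redirect_uri also has to store that token securely.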
Another option would be to update the SFDC Tap and add an alternative login method (through username, password and security token):
- That should be OK as long as the access tokens fetched through the two authentication options (using an OAuth refresh_token and user/pass creds) are the same and can be used interchangeably in the followup calls (so that we will be able to make as few updates to the SFDC Tap as possible).
- What I am not sure about is how open they are to accepting contributions to Stitch-certified Taps in case we follow that path.
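As a rough sketch of why the two options could be interchangeable: both are grant types of the same Salesforce OAuth token endpoint, and both responses carry an access_token that is used the same way in later calls. The form fields below follow Salesforce's documented refresh_token and password grants; the helper names and sample values are assumptions, and note that the password grant still needs a Connected App's client_id and client_secret, which may differ from how the current Meltano extractor logs in.

```python
def refresh_token_form(client_id, client_secret, refresh_token):
    """Form fields for the grant the Tap supports today."""
    return {
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    }


def password_form(client_id, client_secret, username, password, security_token):
    """Form fields for the Username-Password flow proposed as an alternative."""
    return {
        "grant_type": "password",
        "client_id": client_id,
        "client_secret": client_secret,
        "username": username,
        # Salesforce expects the security token appended to the password.
        "password": password + security_token,
    }


def auth_headers(token_response):
    """Either grant returns an access_token that is sent as a Bearer header."""
    return {"Authorization": f"Bearer {token_response['access_token']}"}


# Placeholder response shaped like the token endpoint's JSON reply.
fake_response = {
    "access_token": "abc",
    "instance_url": "https://example.my.salesforce.com",
}
print(password_form("id", "secret", "user@example.com", "pw", "TOKEN")["password"])
print(auth_headers(fake_response))
```

If that holds for the Tap's code, the change could be limited to the login step, with the rest of the Tap consuming the access_token unchanged.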
Zendesk Tap
Stitch-certified - link to tap-zendesk
This Tap has no documentation at all, not even on the configuration options in order to run it.
By checking the Tap's code, I can see that they require the following:
REQUIRED_CONFIG_KEYS = [
    "start_date",
    "subdomain",     # XXXXX in https://XXXXX.zendesk.com/api/v2/
    "access_token",  # the OAuth token for accessing Zendesk
]
Once more, only Authentication by using an OAuth access token is supported and not the option of Authenticating by using an API token like our current approach.
As with SFDC, that approach is best suited for a web platform like Stitch and not for running an extractor in standalone mode. It requires the user to have:
- An active, registered Application with Zendesk
- A server in order to use Zendesk's OAuth authorization flow and get an access token.
The limitations and options are the same as the ones discussed in the SFDC Tap Section.
They use Zenpy in order to connect to Zendesk, which has support for all types of authenticating with Zendesk, so it should not be difficult to update Zendesk Tap's code if we follow that path.
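A possible shape for that change, as a sketch only: pick the Zenpy credentials based on which keys the config contains. The `subdomain`, `email`, `token` and `oauth_token` keyword arguments are Zenpy's own; the `email` and `api_token` config keys and the helper name are hypothetical additions that do not exist in the Tap today.

```python
def zenpy_credentials(config):
    """Build keyword arguments for Zenpy from a Tap config.

    Prefers the OAuth access token (current behaviour) and falls back to
    email + API token (hypothetical new config keys).
    """
    creds = {"subdomain": config["subdomain"]}
    if "access_token" in config:
        creds["oauth_token"] = config["access_token"]
    elif "email" in config and "api_token" in config:
        creds["email"] = config["email"]
        creds["token"] = config["api_token"]
    else:
        raise ValueError("config needs access_token, or email and api_token")
    return creds


oauth_creds = zenpy_credentials({"subdomain": "acme", "access_token": "oauth-123"})
token_creds = zenpy_credentials(
    {"subdomain": "acme", "email": "me@example.com", "api_token": "api-456"}
)
print(oauth_creds)
print(token_creds)
# With Zenpy installed, the client would then be created as: Zenpy(**token_creds)
```

Keeping the OAuth branch first would leave the behaviour unchanged for existing configs, which might make such a contribution easier to accept upstream.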
Marketo Tap
Stitch-certified - link to tap-marketo
It uses the same CLIENT_ID/CLIENT_SECRET authentication method as Meltano's Marketo Extractor.
I was able to set it up and request Marketo's schema (schema discovery mode), but when I run it in extraction mode the Tap terminates immediately with no message shown, no info returned and no log created.
We should look into this further and find out what happens there, but this is one of the cases where an officially supported Tap fails without returning any info that would help debug the process. It could be that we have reached the daily quota or that something is missing from my config, but I cannot know without any feedback.
If we are going to continue checking the option of using Singer.io Taps/Targets in Meltano, someone should check this one more, find a way to debug it, contact the Singer.io community through Slack, etc.
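One way to start debugging a silent exit is to wrap the Tap in a small script that records its exit code, counts the messages it printed and keeps whatever it wrote to stderr. This is a sketch under assumptions: `run_tap` is a hypothetical helper, the Tap is simulated here by a tiny Python child process, and the real command line shown in the final comment is a guess at how the Tap would be invoked.

```python
import collections
import json
import subprocess
import sys


def run_tap(command):
    """Run a tap command and summarize its exit code, stdout messages and stderr."""
    result = subprocess.run(command, capture_output=True, text=True)
    counts = collections.Counter()
    unparsable = 0
    for line in result.stdout.splitlines():
        if not line.strip():
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            unparsable += 1
            continue
        kind = message.get("type", "UNKNOWN") if isinstance(message, dict) else "UNKNOWN"
        counts[kind] += 1
    return {
        "returncode": result.returncode,
        "message_counts": dict(counts),
        "unparsable_lines": unparsable,
        "stderr": result.stderr,
    }


# Stand-in for a tap that emits one message, complains on stderr and exits with 1.
FAKE_TAP = """
import sys
print('{"type": "STATE", "value": {}}')
sys.stderr.write("quota exceeded\\n")
sys.exit(1)
"""

report = run_tap([sys.executable, "-c", FAKE_TAP])
print(report)
# Assumed real invocation:
# run_tap(["tap-marketo", "--config", "config.json", "--properties", "properties.json"])
```

A zero exit code with no messages and an empty stderr would point towards the config or stream selection rather than a crash.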
Zuora Tap
Stitch-certified - link to tap-zuora
This Tap has no documentation at all, not even on the configuration options in order to run it.
By checking the Tap's code, I can see that they require the following:
REQUIRED_CONFIG_KEYS = [
    "start_date",
    "api_type",
    "username",
    "password",
]
Since it authenticates with a plain username and password, the Zuora Tap is a good candidate for testing a Tap and comparing it with what we currently have.
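For a first test run, a config could be checked against those required keys before the Tap is invoked. This is a minimal sketch: the key list is the one found in the Tap's code, while the sample values (including `"REST"` for `api_type`) and the `missing_config_keys` helper are assumptions for illustration.

```python
import json

REQUIRED_CONFIG_KEYS = ["start_date", "api_type", "username", "password"]


def missing_config_keys(config, required=REQUIRED_CONFIG_KEYS):
    """Return the required keys that are absent from the config, in order."""
    return [key for key in required if key not in config]


# Placeholder values only; the accepted formats are assumptions.
sample_config = {
    "start_date": "2018-01-01T00:00:00Z",
    "api_type": "REST",
    "username": "user@example.com",
    "password": "not-a-real-password",
}

print(missing_config_keys(sample_config))
print(missing_config_keys({"username": "user@example.com"}))
print(json.dumps(sample_config, indent=2))  # contents for a config.json file
```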
Stripe Tap
Stitch-certified but no link to a public repository from Stitch or Singer.io
Like with a couple other Taps/Targets (namely the Postgres one), there exists a single public repo of a Stripe Tap provided by Statsbot: link to tap-stripe
This Tap has no documentation at all and is not verified by Stitch or the Singer.io community, so we should carefully test it and check its quality before using it.
GitLab Tap
Community-supported - link to tap-gitlab
A tap that can be used for testing things out, especially Targets (Loaders).
I set it up without issues and I was able to fetch relevant data from the meltano/meltano project.
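Since this Tap is mostly useful for exercising Targets, here is a minimal sketch of the consuming side: a loader that reads Singer messages line by line and keeps the records in memory. The message fields follow the Singer spec; the `load_messages` helper, the `projects` stream and the sample lines are invented for illustration, and a real Target would write to a database instead.

```python
import json


def load_messages(lines):
    """Consume Singer messages and return (records grouped by stream, last state)."""
    schemas = {}
    records = {}
    state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        kind = message["type"]
        if kind == "SCHEMA":
            schemas[message["stream"]] = message["schema"]
        elif kind == "RECORD":
            stream = message["stream"]
            if stream not in schemas:
                raise ValueError(f"RECORD for {stream!r} arrived before its SCHEMA")
            records.setdefault(stream, []).append(message["record"])
        elif kind == "STATE":
            state = message["value"]
    return records, state


# Hand-written lines standing in for the output of a tap.
sample = [
    '{"type": "SCHEMA", "stream": "projects", "schema": {"type": "object"}, "key_properties": ["id"]}',
    '{"type": "RECORD", "stream": "projects", "record": {"id": 1, "name": "meltano"}}',
    '{"type": "STATE", "value": {"projects": 1}}',
]
records, state = load_messages(sample)
print(records)
print(state)
```

In a real pipeline the same function would be fed `sys.stdin`, with the Tap's stdout piped into the script.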
Postgres Target
There is no link to a public repository for a Postgres Target from Stitch or Singer.io
There exists a single public repo of a Postgres Target provided by Statsbot: link to target-postgres
This Target has no documentation at all and is not verified by Stitch or the Singer.io community, so we should carefully test it and check its quality before using it.
Snowflake Target
There is no link to a public repository for a Snowflake Target from Stitch or Singer.io, nor could I find a target for Snowflake in general.
Comments on other Taps/Targets
Taps provided in Stitch with no open source project
Taps that require an OAuth Token:
Options on how to use Taps/Targets inside Meltano
(WIP)(ToDo) Let's discuss this first and then I am going to add more info on that.