Singer SDK: New framework for building Singer taps
The Singer SDK is in development at https://gitlab.com/meltano/singer-sdk. V0.1.0 milestone: https://gitlab.com/meltano/singer-sdk/-/milestones/1
I think we can make it easier for people to build and maintain high-quality taps that they and others can confidently run in production. The idea is to build a new "convention over configuration" framework on top of https://github.com/singer-io/singer-python that removes as much friction as possible from correctly and consistently implementing recommended but optional features like discovery mode, stream/field selection, incremental replication, and metrics, as well as best practices around rate limiting, logging, and error handling.
Taps today contain a lot of boilerplate code around https://github.com/singer-io/singer-python, which starts out all right the moment a new tap is created from the singer-tap-template, but spirals out of control as taps start to implement (some of) the optional features listed above, without a clear example of how to do so most effectively and correctly. As a result, the implementations of different fully-featured high-quality taps look very different, even though they implement much of the same behavior using the same underlying library.
This makes it daunting for people new to Singer to implement new high-quality taps or contribute to existing ones. With every new optional-but-recommended feature they want to implement, they are left to figure out by themselves how to do so within their existing code base: none of the existing taps have gone about it the same way, so there is no clear example to follow, and none of those solutions fit directly into what they already have.
A common framework that moves more of the boilerplate behind the scenes will resolve this, and will prevent bugs that don't derive from tap-specific implementation details but rather from how taps implement the Singer spec or use the singer-python library.
Along with this framework, we can provide an alternative to https://github.com/singer-io/singer-tap-template (#2452) that should require less code to reach the same point, and will present a clear path towards adding support for additional Singer features with minimal effort beyond the source-specific business logic. This new template would not have the AGPL license (unlike the existing one), and would give tap authors the option of using a more permissive license like MIT or Apache.
If the framework lets taps describe their supported settings and capabilities (discover, state, etc.) in a consistent way, it can also make it easier to generate tap-specific metadata for https://github.com/aaronsteers/singer-db and Meltano's discovery.yml, or for runners to discover this metadata by invoking the tap in a certain way.
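To make the idea of a consistent self-description a bit more concrete, here is a minimal sketch of what such a declaration could look like. Everything in it is hypothetical: the class names (Setting, TapDescription), the field names, and the shape of the generated metadata are assumptions for illustration, not an existing API of singer-python, singer-db, or Meltano.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Setting:
    """One configuration option a tap accepts (hypothetical shape)."""
    name: str
    kind: str = "string"
    required: bool = False


@dataclass
class TapDescription:
    """Declarative summary of a tap; all field names are assumptions."""
    name: str
    capabilities: List[str] = field(default_factory=list)
    settings: List[Setting] = field(default_factory=list)

    def to_metadata(self) -> dict:
        # asdict() recurses into the nested Setting dataclasses.
        return asdict(self)


def describe_as_json(tap: TapDescription) -> str:
    """Serialize the description so a runner or catalog could consume it."""
    return json.dumps(tap.to_metadata(), indent=2, sort_keys=True)


# A made-up tap used only to exercise the sketch.
EXAMPLE_TAP = TapDescription(
    name="tap-example",
    capabilities=["discover", "state"],
    settings=[Setting("api_key", required=True), Setting("start_date")],
)

if __name__ == "__main__":
    print(describe_as_json(EXAMPLE_TAP))
```

With a single declaration like this, the same data could in principle feed a metadata file or be printed by the tap itself when a runner asks for it. How that invocation would work is still an open question.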
Along with the framework, we can provide testing helpers to be used with unittest or pytest, but in many cases having tap-specific tests may not be necessary when more of the tricky logic is moved into the framework.
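As a rough illustration of the kind of testing helper this could mean, the sketch below checks a tap's output lines against the three message types in the core Singer spec (SCHEMA, RECORD, STATE) and their required keys. The function name check_tap_output and its return format are made up for this example, and treating any other message type as a problem is a simplifying assumption.

```python
import json
from typing import Iterable, List

# Required top-level keys per message type, following the core Singer spec.
REQUIRED_KEYS = {
    "SCHEMA": {"stream", "schema", "key_properties"},
    "RECORD": {"stream", "record"},
    "STATE": {"value"},
}


def check_tap_output(lines: Iterable[str]) -> List[str]:
    """Return a list of problems found in a tap's stdout lines (empty if none)."""
    problems = []
    streams_with_schema = set()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {number}: not valid JSON")
            continue
        if not isinstance(message, dict):
            problems.append(f"line {number}: message is not a JSON object")
            continue
        kind = message.get("type")
        if kind not in REQUIRED_KEYS:
            problems.append(f"line {number}: unknown message type {kind!r}")
            continue
        missing = REQUIRED_KEYS[kind] - set(message)
        if missing:
            problems.append(f"line {number}: {kind} missing {sorted(missing)}")
            continue
        if kind == "SCHEMA":
            streams_with_schema.add(message["stream"])
        elif kind == "RECORD" and message["stream"] not in streams_with_schema:
            problems.append(
                f"line {number}: RECORD for {message['stream']!r} before its SCHEMA"
            )
    return problems
```

In a pytest or unittest test, a tap author could capture the tap's stdout and assert that check_tap_output(captured_lines) returns an empty list.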
I don't have specific ideas yet about what this framework, or taps written using it, will look like. I'm especially interested in hearing from people who have written fully-featured Singer taps before using https://github.com/singer-io/singer-python, possibly based on https://github.com/singer-io/singer-tap-template, who have a good grasp of the best practices (from https://github.com/singer-io/getting-started/blob/master/docs/BEST_PRACTICES.md and in general), are familiar with the range of functionality that taps for various kinds of data sources (APIs, databases, files) require, and have their own ideas on what a unified approach to building better taps would look like.
The goal is to work together on building the framework and supporting tooling and documentation that you will want to use for your next tap, and that you'll want to migrate your existing taps over to.