Provide developer option for auto-detecting schema at runtime or devtime
Summary
As demonstrated in the new tap-rest-api, a stream's schema can be auto-detected by sampling records from the source. This could be leveraged at runtime or at devtime, depending on the needs of the developer and of the tap's API.
Proposed benefits
- at runtime: for fully dynamic schemas
- at devtime: to semi-automate the schema declaration process
Proposal details
For runtime use cases (dynamic schema):
- Add an auto-detection method, taking an integer argument for the number of records to sample, that developers can call from within the discovery methods.
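To make the runtime idea concrete, here is a minimal sketch of record-sampling inference. `infer_schema` and its `max_records` parameter are hypothetical names, not an existing SDK API. The sketch assumes records arrive as parsed JSON dicts and infers only a flat schema: nested values are typed as "object" or "array" without inner properties, and formats such as date-time are not detected.

```python
import itertools
from typing import Any, Dict, Iterable, Set


def _json_type(value: Any) -> str:
    """Map a value produced by json.loads to a JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"Unsupported value type: {type(value).__name__}")


def infer_schema(records: Iterable[Dict[str, Any]], max_records: int = 100) -> Dict[str, Any]:
    """Hypothetical helper: infer a flat JSON Schema from at most `max_records` records."""
    seen: Dict[str, Set[str]] = {}
    for record in itertools.islice(records, max_records):
        for key, value in record.items():
            seen.setdefault(key, set()).add(_json_type(value))

    properties: Dict[str, Any] = {}
    for key, types in seen.items():
        # A field holding both ints and floats is widened to "number".
        if {"integer", "number"} <= types:
            types = types - {"integer"}
        ordered = sorted(types)
        properties[key] = {"type": ordered if len(ordered) > 1 else ordered[0]}
    return {"type": "object", "properties": properties}
```

A tap could call something like this from its discovery path against the first n records fetched from the API. Because `itertools.islice` consumes the sampled records from the iterator, a tap that also needs to emit those records would have to buffer them first.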
For devtime use cases (static schema):
- Add a helper CLI, a pytest output artifact, or another mechanism that auto-generates either (a) JSON Schema file definitions or (b) Python sample code.
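For option (b), a devtime helper could render an inferred schema as source code that a developer pastes into a stream class and then reviews. The sketch below assumes the SDK's `singer_sdk.typing` helpers imported as `th`, as in the SDK's code samples. It only emits text and never imports the SDK. `schema_to_sdk_code` is a hypothetical name, and only flat scalar properties are mapped. Nested, mixed, or unknown types are flagged with a TODO comment rather than guessed, and nullability is left for the developer to decide.

```python
import json
from typing import Any, Dict, List

# JSON Schema scalar type -> singer_sdk.typing helper name (emitted as text only).
_TYPE_MAP = {
    "string": "th.StringType",
    "integer": "th.IntegerType",
    "number": "th.NumberType",
    "boolean": "th.BooleanType",
}


def schema_to_sdk_code(schema: Dict[str, Any], var_name: str = "schema") -> str:
    """Hypothetical helper: render a flat JSON Schema as th.PropertiesList source code."""
    lines: List[str] = [f"{var_name} = th.PropertiesList("]
    for name, prop in schema.get("properties", {}).items():
        types = prop.get("type", [])
        if isinstance(types, str):
            types = [types]
        # "null" is not mapped; nullability is left to developer review.
        non_null = [t for t in types if t != "null"]
        if len(non_null) == 1 and non_null[0] in _TYPE_MAP:
            type_code = _TYPE_MAP[non_null[0]]
            if non_null[0] == "string" and prop.get("format") == "date-time":
                type_code = "th.DateTimeType"
            lines.append(f"    th.Property({json.dumps(name)}, {type_code}),")
        else:
            # Flag anything not handled above instead of guessing a type.
            lines.append(
                f"    # TODO: review property {json.dumps(name)} (inferred type: {json.dumps(types)})"
            )
    lines.append(").to_dict()")
    return "\n".join(lines)
```

A CLI or pytest artifact could call a function like this once per stream and write the result to a file. The TODO lines keep the unresolved cases visible instead of silently defaulting them to strings.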
For both use cases:
- Update the cookiecutter template and code_samples.md with examples of how to implement auto-schema detection.
Best reasons not to build
This method only works if (a) a reliable and stable schema can be inferred within the first n records of each stream, and (b) the schema detection methods can reliably detect enough metadata to still provide a high-quality experience for the Singer community.
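Condition (a) could be checked heuristically rather than assumed. A minimal sketch, assuming records are dicts: track the index of the last record that introduced a new (field, type) pair, and treat the schema as plausibly stable only if no new pair appeared in the final `window` records. `schema_looks_stable` and `window` are hypothetical names. A quiet tail is evidence, not proof: a field that first appears after the sample ends will still be missed.

```python
from typing import Any, Dict, Iterable, Set, Tuple


def schema_looks_stable(records: Iterable[Dict[str, Any]], window: int = 50) -> bool:
    """Hypothetical check: True if no new (field, type) pair appeared in the last `window` records."""
    seen: Set[Tuple[str, str]] = set()
    last_change = -1  # index of the last record that introduced a new pair
    count = 0
    for index, record in enumerate(records):
        count = index + 1
        for key, value in record.items():
            # The Python type name serves as a coarse type fingerprint; None counts as its own type.
            pair = (key, type(value).__name__)
            if pair not in seen:
                seen.add(pair)
                last_change = index
    # Require at least `window` records after the last change (an empty sample is never stable).
    return count - 1 - last_change >= window
```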
We should probably plan ahead and set the expectation that reviewing and appending metadata is an important part of the process. We may need to invest in a streamlined path, and in instructions for how developers can tweak, amend, and append to the generated schema with more fine-tuned data types and other metadata, such as property descriptions as noted in #159 (closed).
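One possible shape for that streamlined path is to keep developer-authored overrides (descriptions, formats, tightened types) separate from the generated schema and deep-merge them on top each time detection is re-run. Regeneration then never discards manual edits. A minimal sketch: `apply_overrides` and the layout of the overrides dict are hypothetical, and override values always win over generated ones.

```python
import copy
from typing import Any, Dict


def apply_overrides(generated: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: deep-merge developer overrides into a generated schema."""
    merged = copy.deepcopy(generated)  # never mutate the generated input
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so that e.g. adding a description keeps the inferred "type".
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

For example, an override of `{"properties": {"id": {"description": "Primary key"}}}` adds a description to `id` while leaving its inferred type and every other property untouched.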
Refs:
- As inspired by Widen/tap-rest-api: https://github.com/Widen/tap-rest-api-msdk
- And: https://github.com/anelendata/tap-rest-api