Custom mappings in the SDK for targets, including renaming and basic expressions
At Demo Day on 2021-03-26, @DouweM demonstrated a fully serverless data pipeline running on GitLab CI using the "reverse ETL" or "ETL-P" (P for publish) approach. The demo showcased a column-name remapping setting, and I'd like to consider making that a standard feature in the SDK.
As we implement this spec, I also think it would be valuable to have a simple expression language for small conversions. For instance, a target may expect data rounded to a specific number of decimal places or in a specific data format, or it may need a field concatenation or a string split into multiple fields.
We can use DBT for this, but then we have to land the data locally and transform it.
- There are use cases where it would be much more efficient to stream directly from a source to a target without landing it first in the data repository.
- There are also small tweaks and customizations that a SaaS target may require, which don't in themselves warrant a dedicated DBT transform.
SimpleEval
One option, which is built for safety and looks robust enough for these kinds of inline transformations, is the simpleeval library: https://pypi.org/project/simpleeval/
Examples from the website:
Simple math:
>>> from simpleeval import SimpleEval
>>> s = SimpleEval()
>>> s.eval("1 + 1")
2
>>> s.eval('100 * 10')
1000
Variables and nested property access:
>>> from simpleeval import simple_eval
>>> simple_eval("foo.bar", names={"foo": {"bar": 42}})
42
Applications for Targets
Given an input record and a mapping dict, we could either add the outputs of the provided mapping transformations to the record, or replace the entire record dict with those outputs. (This could be determined by the target developer.)
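Below is a minimal sketch of how a target might apply such a mapping dict. It rests on several assumptions. The mapping takes each output field name to an expression string, and a bare field name acts as a rename. A `replace` flag chooses between the two modes. `apply_mappings` and `tiny_eval` are hypothetical names, not SDK APIs. To keep the example dependency-free, `tiny_eval` is a deliberately tiny `ast`-based stand-in for simpleeval. It only handles literals, field names, `+ - * /`, and a few whitelisted functions. With simpleeval installed, the `evaluate` argument could instead be `lambda expr, names: simple_eval(expr, names=names, functions={"round": round})`.

```python
import ast
import operator
from typing import Any, Callable, Dict, Mapping

# Hypothetical stand-in for simpleeval, kept dependency-free for this sketch.
# It only understands literals, record field names, + - * /, and calls to a
# small whitelist of functions; anything else raises ValueError.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_FUNCTIONS = {"round": round, "str": str, "int": int, "float": float}


def tiny_eval(expr: str, names: Mapping[str, Any]) -> Any:
    def _eval(node: ast.AST) -> Any:
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return names[node.id]  # look up a field of the current record
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCTIONS
            and not node.keywords
        ):
            return _FUNCTIONS[node.func.id](*(_eval(arg) for arg in node.args))
        raise ValueError(f"Unsupported expression: {expr!r}")

    return _eval(ast.parse(expr, mode="eval").body)


def apply_mappings(
    record: Dict[str, Any],
    mappings: Mapping[str, str],
    replace: bool = False,
    evaluate: Callable[[str, Mapping[str, Any]], Any] = tiny_eval,
) -> Dict[str, Any]:
    """Evaluate each mapping expression against the incoming record.

    mappings: output field name -> expression (a bare field name is a rename).
    replace=False: add or overwrite mapped fields on a copy of the record.
    replace=True: emit only the mapped fields.
    """
    mapped = {name: evaluate(expr, record) for name, expr in mappings.items()}
    return mapped if replace else {**record, **mapped}


record = {"id": 7, "first": "Ada", "last": "Lovelace", "price": 3.14159}
mappings = {
    "customer_id": "id",                # rename
    "full_name": "first + ' ' + last",  # concatenation
    "price": "round(price, 2)",         # rounding
}
print(apply_mappings(record, mappings, replace=True))
# {'customer_id': 7, 'full_name': 'Ada Lovelace', 'price': 3.14}
```

In append mode (`replace=False`), the original `id` survives alongside `customer_id`. Replace mode emits only the mapped fields, which may suit a SaaS target that rejects unknown columns.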
Applications for Taps
This could be an alternative to the interim pipelinewise-transform-field plugin. If provided to a tap that supports it, certain transformations could be expressed directly by the user in config as post-processing mapping transformations. When provided, these transformations could nullify or hash field values without having to pass the data through a separate process.
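Below is a minimal sketch of such tap-side post-processing. It assumes transformations arrive in config as a dict of field name to expression string. `post_process`, `eval_expr`, and the `sha256` helper are hypothetical names. `eval_expr` is a tiny `ast`-based stand-in for simpleeval, not simpleeval itself, and handles only literals, field names, and whitelisted calls.

```python
import ast
import hashlib
from typing import Any, Dict, Mapping

# Hypothetical whitelist of helpers a tap could expose to user expressions.
_FUNCTIONS = {
    "sha256": lambda value: hashlib.sha256(str(value).encode("utf-8")).hexdigest(),
}


def eval_expr(expr: str, names: Mapping[str, Any]) -> Any:
    """Tiny stand-in for simpleeval: literals, field names, whitelisted calls."""

    def _eval(node: ast.AST) -> Any:
        if isinstance(node, ast.Constant):  # e.g. None or 'redacted'
            return node.value
        if isinstance(node, ast.Name):
            return names[node.id]  # value of a field in the original record
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCTIONS
            and not node.keywords
        ):
            return _FUNCTIONS[node.func.id](*(_eval(arg) for arg in node.args))
        raise ValueError(f"Unsupported expression: {expr!r}")

    return _eval(ast.parse(expr, mode="eval").body)


def post_process(record: Dict[str, Any], transforms: Mapping[str, str]) -> Dict[str, Any]:
    """Apply user-configured transforms to a record before the tap emits it."""
    out = dict(record)  # leave the caller's record untouched
    for field, expr in transforms.items():
        if field in out:  # skip transforms for fields this stream lacks
            out[field] = eval_expr(expr, record)
    return out


# Hypothetical user config: hash the email, nullify the SSN.
transforms = {"email": "sha256(email)", "ssn": "None"}
record = {"id": 1, "email": "ada@example.com", "ssn": "123-45-6789"}
print(post_process(record, transforms))
```

Because the transform runs inside the tap before the record is written out, the raw value never needs to reach a separate transformation process.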
Applications for Meltano config
In our follow-up conversation, it was mentioned that there may be a similar issue or feature request for meltano.yml config. It would let users perform basic transformations within YAML expressions, similar to how environment variable expansion today lets them pass in variables from the environment rather than hard-coding them.