Meltano UI converts hash keys in capital to snake case
This is not a bug per se, as this is a new option I introduced with tap-adwords. But it blocks further support of tap-adwords in Meltano UI, so we should address it one way or another.
Origin of the issue
In tap-adwords, we need a more complex way of defining the value of a configuration parameter: depending on the Streams (Entities) a user chooses to extract, you have to define a set of primary keys for each one.
Because the list of selected Streams and the list of primary keys for each one are dynamic and can vary in size, I have opted to define them with a value that is a hash of arrays.
This is perfectly valid both in JSON (how Taps define their configuration options) and in YAML (how Meltano defines the configuration options for each Tap, which are then translated into JSON).
So, the newly added configuration parameter looks like this in JSON:
{
  ... ...
  "conversion_window_days": 0,
  "primary_keys": {
    "KEYWORDS_PERFORMANCE_REPORT": ["campaignID", "adGroupID", "keywordID", "day"],
    "AD_PERFORMANCE_REPORT": ["campaignID", "adGroupID", "adID", "day"]
  }
}
While it is defined as follows in our discovery.yml:
extractors:
  - name: tap-adwords
    label: Google Ads
    description: Advertising Platform
    namespace: tap_adwords
    ... ... ...
    settings:
      ... ... ...
      - name: primary_keys
        label: Primary Keys
        value:
          KEYWORDS_PERFORMANCE_REPORT:
            - campaignID
            - adGroupID
            - keywordID
            - day
          AD_PERFORMANCE_REPORT:
            - campaignID
            - adGroupID
            - adID
            - day
        kind: hidden
        description: Primary Keys for the selected Entities (Streams)
Problem (bug)
The aforementioned setup works without issues when tap-adwords is run using the CLI.
All the configuration parameters are converted correctly and passed to the Tap.
You can check what happens by running meltano config:
(venv) Yanniss-MacBook-Pro:cli-tap-adwords iroussos$ meltano config tap-adwords
{
  ...,
  'conversion_window_days': 0,
  'primary_keys':
  {
    'KEYWORDS_PERFORMANCE_REPORT': ['campaignID', 'adGroupID', 'keywordID', 'day'],
    'AD_PERFORMANCE_REPORT': ['campaignID', 'adGroupID', 'adID', 'day']
  }
}
But when I tried to run the same Tap using Meltano UI (initialize a new Meltano project, run Meltano UI, then add and run the Tap from inside Meltano UI), I realized that something was off: the primary keys were not added to the created tables.
I checked the configuration Meltano stores in this case and got the following:
(venv) Yanniss-MacBook-Pro:tap-adwords-ui iroussos$ meltano config tap-adwords
{
  ...,
  'conversion_window_days': 0,
  'primary_keys':
  {
    'a_d__performanc_e__repor_t': ['campaignID', 'adGroupID', 'adID', 'day'],
    'keyword_s__performanc_e__repor_t': ['campaignID', 'adGroupID', 'keywordID', 'day']
  }
}
All the keys in the hash have been converted to a weird (incorrect) version of their snake case counterpart.
So it is clear that this does not happen in Meltano Core; the conversion is made somewhere in Meltano UI.
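The comparison between the two outputs can be automated. The sketch below is a hypothetical helper (`iter_key_paths` and `key_diff` are names I made up, they are not part of Meltano) that walks two config values and reports which hash keys exist in one but not in the other. It assumes the two `meltano config` outputs have already been loaded as Python dicts; the elided settings are left out.

```python
from typing import Any, Iterator


def iter_key_paths(node: Any, path: tuple = ()) -> Iterator[tuple]:
    """Yield the path to every hash key found in a nested config value."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield path + (key,)
            yield from iter_key_paths(value, path + (key,))
    elif isinstance(node, list):
        # Array positions are part of the path, but are not keys themselves.
        for index, item in enumerate(node):
            yield from iter_key_paths(item, path + (index,))


def key_diff(expected: Any, actual: Any) -> tuple:
    """Return (missing, unexpected) key paths, each sorted for stable output."""
    want = set(iter_key_paths(expected))
    got = set(iter_key_paths(actual))
    # Paths can mix strings and integers, so sort on their repr.
    return sorted(want - got, key=repr), sorted(got - want, key=repr)


# Values copied from the CLI project output (elided settings omitted).
cli_config = {
    "conversion_window_days": 0,
    "primary_keys": {
        "KEYWORDS_PERFORMANCE_REPORT": ["campaignID", "adGroupID", "keywordID", "day"],
        "AD_PERFORMANCE_REPORT": ["campaignID", "adGroupID", "adID", "day"],
    },
}

# Values copied from the Meltano UI project output (elided settings omitted).
ui_config = {
    "conversion_window_days": 0,
    "primary_keys": {
        "a_d__performanc_e__repor_t": ["campaignID", "adGroupID", "adID", "day"],
        "keyword_s__performanc_e__repor_t": ["campaignID", "adGroupID", "keywordID", "day"],
    },
}

missing, unexpected = key_diff(cli_config, ui_config)
print("missing:", missing)
print("unexpected:", unexpected)
```

Top-level keys such as `conversion_window_days` and `primary_keys` do not show up in either list, which matches the observation that only the capitalized keys are affected.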
As this is a hidden configuration parameter with a predefined value (the hash of arrays), when the Tap is run through the CLI, it fetches the definition from discovery.yml and directly uses it for generating the final configuration.
My bet is that when an Extractor is added through Meltano UI and its configuration is saved, something a little bit different happens that also converts the keys:
- Each configuration parameter is fetched from discovery.yml
- Some magic happens here; this is almost certainly (99.999%) where the wrong conversion happens
- The values are stored in our internal storage
- When the Tap (or meltano config) runs, those stored values are used
Additional investigation of the issue and how to reproduce it
Note here that everything else works as expected: the hash and the arrays are properly converted to their JSON counterparts. So this does not seem to be an issue with some part of our code only expecting primitive values.
I was pretty sure that this happens to ALL hash keys no matter how deep they are, so I ran the following experiment:
- I initialized a new Meltano project
- I copy-pasted the discovery.yml from our master
- I updated tap-carbon-intensity as follows:
extractors:
  - name: tap-carbon-intensity
    label: Carbon Emissions Intensity
    description: National Grid ESO's Carbon Emissions Intensity API
    namespace: tap_carbon
    docs: 'https://meltano.com/plugins/extractors/carbon-intensity.html'
    pip_url: 'git+https://gitlab.com/meltano/tap-carbon-intensity'
    capabilities:
      - discover
    settings:
      - name: hashes_fail
        label: Hash Fail
        value:
          yannis:
            - BIG_THING_HERE
            - AD_PERFORMANCE_REPORT
            - keywordID
            - key1: 111
              key2: 328976
              BIG_KEY_3: 8387387
          CAPITAL_KEY_YAY:
            - campaignID
            - adGroupID
            - keywordID
            - day
        kind: hidden
        description: Hash Keys fail
So basically I added one hidden configuration parameter that is a hash of arrays, but with an additional hash inside one of the two arrays (the one with BIG_KEY_3).
I added tap-carbon-intensity and then checked what happened:
{
  'hashes_fail':
  {
    'capita_l__ke_y__ya_y': ['campaignID', 'adGroupID', 'keywordID', 'day'],
    'yannis':
    [
      'BIG_THING_HERE',
      'AD_PERFORMANCE_REPORT',
      'keywordID',
      {
        'bi_g__ke_y_2': 8387387,
        'key1': 111,
        'key2': 328976
      }
    ]
  }
}
And there it is! Whatever causes this issue recursively visits all entries and converts all hash keys to this weird snake case format:
- Key inside the hash: CAPITAL_KEY_YAY --> 'capita_l__ke_y__ya_y'
- Key inside a hash that is inside an array inside another hash: BIG_KEY_3 --> 'bi_g__ke_y_2'
Everything else is left untouched (e.g. BIG_THING_HERE or AD_PERFORMANCE_REPORT), so this only happens to keys.
Once more, the generated JSON correctly follows the YAML definition of a config value that is a hash with arrays, with an additional hash somewhere deep down, so this seems to be treated without issues.
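To make the observed behaviour easier to reason about (and to turn it into a regression test later), here is a minimal Python sketch of a recursive key conversion. Everything in it is an assumption: I have not located the actual conversion code, `convert_keys` and `mangle_key` are hypothetical names, and the two regular expressions in `mangle_key` were picked only because they happen to reproduce the mangled strings shown in this issue.

```python
import re
from typing import Any, Callable


def mangle_key(key: str) -> str:
    """Hypothetical stand-in for the unknown conversion (not Meltano code)."""
    # Assumed rule 1: add "_" between a non-capital character and a capital.
    key = re.sub(r"([^A-Z])([A-Z])", r"\1_\2", key)
    # Assumed rule 2: split the last capital off the end of a run of capitals.
    key = re.sub(r"([A-Z])([A-Z])(?![A-Z])", r"\1_\2", key)
    return key.lower()


def convert_keys(node: Any, convert: Callable[[str], str]) -> Any:
    """Apply `convert` to every hash key at any depth; leave values untouched."""
    if isinstance(node, dict):
        return {convert(k): convert_keys(v, convert) for k, v in node.items()}
    if isinstance(node, list):
        return [convert_keys(item, convert) for item in node]
    return node


# The value of the hashes_fail setting from the experiment above.
value = {
    "yannis": [
        "BIG_THING_HERE",
        "AD_PERFORMANCE_REPORT",
        "keywordID",
        {"key1": 111, "key2": 328976, "BIG_KEY_3": 8387387},
    ],
    "CAPITAL_KEY_YAY": ["campaignID", "adGroupID", "keywordID", "day"],
}

converted = convert_keys(value, mangle_key)
print(converted)
```

With these assumed rules, CAPITAL_KEY_YAY becomes capita_l__ke_y__ya_y while array items such as BIG_THING_HERE stay as they are, which is the pattern seen in the Meltano UI project. For BIG_KEY_3 the sketch yields bi_g__ke_y_3; the output pasted above ends in 2, and the sketch does not attempt to explain that difference.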