Tap BigQuery Float Issue
What is the current bug behavior?
Running tap-bigquery (the extractor) fails during catalog discovery, so the job never starts.
What is the expected correct behavior?
It should run the query and pass the data to the loader.
Steps to reproduce
- Export the BQ credentials path.
export TAP_BIGQUERY_CREDENTIALS_PATH=/my_folder/client_secrets.json
- Add the config below to meltano.yml
variant: anelendata
config:
  streams:
    - name: DSSBQ
      table: "`meltanoetl25122020.GoogleManagerDataset.AccountBasicStats_486x`"
      columns: ['*']
      datetime_key: Date
      start_datetime: '2020-10-01T00:00:00Z'
      end_datetime: "2021-06-10T00:00:00Z"
      start_always_inclusive: false
- Set up the loader, e.g. target-csv
- Run the job
meltano elt tap-bigquery target-csv --job_id=bigquery-to-csv
Relevant logs and/or screenshots
meltano | Running extract & load...
meltano | No state was found, complete import.
meltano | ELT could not be completed: Cannot start extractor: Catalog discovery failed: command ['/Users/username/Desktop/Development/Other/meltano-projects/my_company-dw/.meltano/extractors/tap-bigquery/venv/bin/tap-bigquery', '--config', '/Users/username/Desktop/Development/Other/meltano-projects/my_company-dw/.meltano/run/elt/bigquery-to-csv/023a70bf-9045-4cb9-b219-73ff8557e9d0/tap.config.json', '--discover'] returned 1: INFO Running query:
meltano | SELECT * FROM `meltanoetl25122020.GoogleManagerDataset.AccountBasicStats_486x` WHERE 1=1 AND datetime '2020-10-01 00:00:00.000000' <= CAST(Date as datetime) AND CAST(Date as datetime) < datetime '2021-06-11 00:00:00.000000' ORDER BY Date LIMIT 100
meltano | CRITICAL float() argument must be a string or a number, not 'datetime.date'
meltano | Traceback (most recent call last):
meltano | File "/Users/username/Desktop/Development/Other/meltano-projects/my_company-dw/.meltano/extractors/tap-bigquery/venv/bin/tap-bigquery", line 8, in <module>
meltano | sys.exit(main())
meltano | File "/Users/username/Desktop/Development/Other/meltano-projects/my_company-dw/.meltano/extractors/tap-bigquery/venv/lib/python3.9/site-packages/singer/utils.py", line 229, in wrapped
meltano | return fnc(*args, **kwargs)
meltano | File "/Users/username/Desktop/Development/Other/meltano-projects/my_company-dw/.meltano/extractors/tap-bigquery/venv/lib/python3.9/site-packages/tap_bigquery/__init__.py", line 175, in main
meltano | catalog = discover(CONFIG)
meltano | File "/Users/username/Desktop/Development/Other/meltano-projects/my_company-dw/.meltano/extractors/tap-bigquery/venv/lib/python3.9/site-packages/tap_bigquery/__init__.py", line 45, in discover
meltano | stream_metadata, stream_key_properties, schema = source.do_discover(
meltano | File "/Users/username/Desktop/Development/Other/meltano-projects/my_company-dw/.meltano/extractors/tap-bigquery/venv/lib/python3.9/site-packages/tap_bigquery/sync_bigquery.py", line 101, in do_discover
meltano | schema = getschema.infer_schema(data)
meltano | File "/Users/username/Desktop/Development/Other/meltano-projects/my_company-dw/.meltano/extractors/tap-bigquery/venv/lib/python3.9/site-packages/getschema/impl.py", line 212, in infer_schema
meltano | cur_schema = _do_infer_schema(
meltano | File "/Users/username/Desktop/Development/Other/meltano-projects/my_company-dw/.meltano/extractors/tap-bigquery/venv/lib/python3.9/site-packages/getschema/impl.py", line 47, in _do_infer_schema
meltano | ret = _do_infer_schema(obj[key])
meltano | File "/Users/username/Desktop/Development/Other/meltano-projects/my_company-dw/.meltano/extractors/tap-bigquery/venv/lib/python3.9/site-packages/getschema/impl.py", line 66, in _do_infer_schema
meltano | float(obj)
meltano | TypeError: float() argument must be a string or a number, not 'datetime.date'
Possible fixes
The cast could be the issue; it was also mentioned here: https://github.com/anelendata/tap-bigquery/issues/17
I ran an equivalent query in the BigQuery console (with _DATA_DATE as the date column) and it worked fine:
SELECT * FROM `meltanoetl25122020.GoogleManagerDataset.AccountBasicStats_486x` WHERE 1=1 AND datetime '2020-10-01 00:00:00.000000' <= CAST(_DATA_DATE as datetime) AND CAST(_DATA_DATE as datetime) < datetime '2021-06-11 00:00:00.000000' ORDER BY _DATA_DATE LIMIT 100;
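Looking at the traceback again, the query appears to succeed and the error is raised afterwards, when getschema calls float() on a value that came back as a Python datetime.date. That would explain why the same SQL works in the console. Below is a minimal sketch of that failure and one possible workaround (converting date values to ISO strings before schema inference). The names naive_infer_type and stringify_dates are hypothetical and only for illustration; they are not functions from tap-bigquery or getschema, and the sample row is made up.

```python
import datetime


def naive_infer_type(value):
    """Hypothetical stand-in for a float()-based type probe (not getschema's code)."""
    if isinstance(value, str):
        return "string"
    float(value)  # raises TypeError when value is a datetime.date
    return "number"


def stringify_dates(row):
    """Return a copy of row with date/datetime/time values as ISO 8601 strings."""
    # datetime.datetime is a subclass of datetime.date, so it is covered too.
    temporal = (datetime.date, datetime.time)
    return {
        key: value.isoformat() if isinstance(value, temporal) else value
        for key, value in row.items()
    }


# Made-up row shaped like what a DATE column might yield.
row = {"Date": datetime.date(2020, 10, 1), "Clicks": 42}

# Probing the raw date value fails the same way as in the traceback.
try:
    naive_infer_type(row["Date"])
    failed = False
except TypeError:
    failed = True

# After converting dates to strings, the probe no longer raises.
safe_row = stringify_dates(row)
inferred = {key: naive_infer_type(value) for key, value in safe_row.items()}
```

If this is indeed the cause, the fix would belong in the tap or in getschema rather than in the query. Casting the date column to STRING in the stream's columns list might also work around it, but I have not verified that.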
Further regression test
Ensure we automatically catch similar issues in the future
- Write additional adequate test cases and submit test results
- Test results should be reviewed by a person from the team