examples/bq_file_load_benchmark/setup.py · master · matthieu gillot / professional-services

BQ File Load Benchmark Iteration 1 (#276) · 37b7168e
Anna Rudy authored Aug 22, 2019


* adding bq_file_load_benchmark example

* removing CREATE_IF_EMPTY create disposition from bq_table_resizer since it is no longer an option and CREATE_IF_NEEDED is now the default

* uncommenting file params

* fixing readme typo

* moving tests dir to root of example

* modifying abs paths

* fixing readme typo

* adding note about future params in the readme

* taking link to results table out since project, dataset id, and table name should be enough to find it

* Update examples/bq_file_load_benchmark/README.md

Co-Authored-By: Jacob Ferriero <jferriero@google.com>

* adding back ticks and links

* modifying results schema to include totalSlotMs and avgSlots

* modifying description for avgSlots

* modifying schema modes to match that of existing results table

* adding logic for totalSlotMs and avgSlots

* adding BYTES_IN_GB global variable to benchmark_results_util.py

* removing extra "rather" from awkward comment

* removing commented output

* adding regex to clean up process of extracting file data from gcs path

* changing "Gather" to "Discover" in bucket_util docstrings

* cleaning up bucket_util by using ThreadPoolExecutor and itertools

* removing nested for loops from staging_table_generator

* adding MB unit to field descriptions

* cleaning up nested for loops and using regex to get file info less awkwardly

* parallelizing staging table creation

* removing unneeded command_str

* removing magic numbers from staging_table_generator

* modifying for loop to list comp in parquet_util

* reducing amount of test data

* fixing instruction error in readme

* fixing file_generator comment errors

* making iterator -> list conversion less awkward

* replacing nested for loops in file_generator with itertools product

* changing column_type(s) to schema_type(s)

* adding test for file_generator

* fixing blob composition process

* fixing error in file_generator

* fixing logic in composition section of file_generator

* adding test for _compose_sharded_blobs in file_generator

* getting num files from job data instead of file path in case file generation stopped prematurely

* unpinning dependencies

* fixing issues with and cleaning up setup.py/requirements.txt

* removing test data

* removing extra space from logging message

* cutting down on staging_table_generator test so it won't take so long

* adding back in an accidentally deleted schema file
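
Several of the items above replace nested for loops with `itertools.product`. The sketch below shows that general refactor only; it is not the code from file_generator. The parameter names and values (`FILE_TYPES`, `NUM_COLUMNS`, `NUM_FILES`) are hypothetical and were chosen for illustration.

```python
import itertools

# Hypothetical benchmark parameters; these are NOT taken from the repository.
FILE_TYPES = ["avro", "csv"]
NUM_COLUMNS = [10, 100]
NUM_FILES = [1, 100]


def combinations_nested():
    """Builds every parameter combination with nested for loops."""
    combos = []
    for file_type in FILE_TYPES:
        for num_columns in NUM_COLUMNS:
            for num_files in NUM_FILES:
                combos.append((file_type, num_columns, num_files))
    return combos


def combinations_product():
    """Builds the same combinations, in the same order, with itertools.product."""
    return list(itertools.product(FILE_TYPES, NUM_COLUMNS, NUM_FILES))
```

Both functions return the same list of tuples. `itertools.product` varies the rightmost input fastest, which matches the innermost loop of the nested version, so the ordering is preserved as well.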