    BQ File Load Benchmark Iteration 1 (#276) · 37b7168e
    Anna Rudy authored
    
    
    * adding bq_file_load_benchmark example
    
    * removing CREATE_IF_EMPTY create disposition from bq_table_resizer since it is no longer an option and CREATE_IF_NEEDED is now the default
    
    * uncommenting out file params
    
    * fixing readme typo
    
    * moving tests dir to root of example
    
    * modifying abs paths
    
    * fixing readme typo
    
    * adding note about future params in the readme
    
    * taking link to results table out since project, dataset id, and table name should be enough to find it
    
    * Update examples/bq_file_load_benchmark/README.md
    
    Co-Authored-By: Jacob Ferriero <jferriero@google.com>
    
    * adding back ticks and links
    
    * modifying results schema to include totalSlotMs and avgSlots
    
    * modifying description for avgSlots
    
    * modifying schema modes to match that of existing results table
    
    * adding logic for totalSlotMs and avgSlots
    
    * adding BYTES_IN_GB global variable to benchmark_results_util.py
    
    * removing extra "rather" in awkward comment
    
    * removing commented output
    
    * adding regex to clean up process of extracting file data from gcs path
    
    * changing "Gather" to "Discover" in bucket_util docstrings
    
    * cleaning up bucket_util by using ThreadPoolExecutor and itertools
    
    * removing nested for loops from staging_table_generator
    
    * adding MB unit to field descriptions
    
    * cleaning up nested for loops and using regex to get file info less awkwardly
    
    * parallelizing staging table creation
    
    * removing unneeded command_str
    
    * removing magic numbers from staging_table_generator
    
    * modifying for loop to list comp in parquet_util
    
    * reducing amount of test data
    
    * fixing instruction error in readme
    
    * fixing file_generator comment errors
    
    * making iterator -> list conversion less awkward
    
    * replacing nested for loops in file_generator with itertools product
    
    * changing column_type(s) to schema_type(s)
    
    * adding test for file_generator
    
    * fixing blob composition process
    
    * fixing error in file_generator
    
    * fixing logic in composition section of file_generator
    
    * adding test for _compose_sharded_blobs in file_generator
    
    * getting num files from job data instead of file path in case file generation stopped prematurely
    
    * unpinning dependencies
    
    * fixing issues with and cleaning up setup.py/requirements.txt
    
    * removing test data
    
    * removing extra space from logging message
    
    * cutting down on staging_table_generator test so it won't take so long
    
    * adding back in an accidentally deleted schema file
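
    For reference, the "nested for loops -> itertools product" change noted above can be sketched roughly as follows. This is a minimal illustration, not the code from file_generator itself: the parameter lists and function names are hypothetical and only show why the two forms produce the same combinations.

```python
import itertools

# Hypothetical benchmark parameters, for illustration only;
# the real file_generator parameters may differ.
FILE_TYPES = ["csv", "avro"]
NUM_COLUMNS = [10, 100]
COMPRESSIONS = ["none", "gzip"]


def combos_nested():
    """Build every parameter combination with nested for loops."""
    result = []
    for file_type in FILE_TYPES:
        for num_columns in NUM_COLUMNS:
            for compression in COMPRESSIONS:
                result.append((file_type, num_columns, compression))
    return result


def combos_product():
    """Build the same combinations, in the same order, with itertools.product."""
    return list(itertools.product(FILE_TYPES, NUM_COLUMNS, COMPRESSIONS))
```

    itertools.product iterates the rightmost argument fastest, so it yields tuples in the same order as the nested loops while keeping the code flat as more parameters are added.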