Moving BigQuery data table into GitLab Grafana
This MR will close this issue: https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/prompt-library/-/issues/15+.
This MR includes the following:
- a Python script called `writeBQDatasetToGCS.py` that flattens the BQ table into .csv files in GCS, a format that ClickHouse handles easily
- SQL scripts to create the `prompt_library` database and the `score_chunks_v2` data table using the S3 engine
- make targets to connect to the ClickHouse instance and run queries easily, along with the scripts those make targets call
- ensures that the table names in ClickHouse match the BigQuery table names
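
For reviewers who want a feel for the flattening step, here is a minimal sketch of the idea. It is not the code in `writeBQDatasetToGCS.py`: the function names `flatten_row` and `rows_to_csv`, the `_` key separator, and the example column names are all hypothetical. It also assumes rows arrive as nested Python dicts, and it leaves out the BigQuery read and the GCS upload entirely.

```python
import csv
import io


def flatten_row(row, parent_key="", sep="_"):
    """Flatten a nested dict into one level, joining nested keys with `sep`."""
    flat = {}
    for key, value in row.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested records and merge their flattened keys.
            flat.update(flatten_row(value, name, sep))
        else:
            flat[name] = value
    return flat


def rows_to_csv(rows):
    """Serialize nested rows to CSV text with a header line."""
    flat_rows = [flatten_row(r) for r in rows]

    # Collect the union of column names in first-seen order.
    fieldnames = []
    for r in flat_rows:
        for k in r:
            if k not in fieldnames:
                fieldnames.append(k)

    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    # Columns missing from a row are written as empty strings (DictWriter default).
    writer.writerows(flat_rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical rows; the real table schema is not shown in this MR description.
    example = [
        {"id": 1, "score": {"value": 0.5, "model": "a"}},
        {"id": 2, "score": {"value": 0.7}},
    ]
    print(rows_to_csv(example), end="")
```

The header row is kept so that a reader on the ClickHouse side can match columns by name (for example with the `CSVWithNames` format) rather than by position.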
Some things to consider in the future, or possibly to revise in this MR:
- When ClickHouse was used for Suggested Reviewer, the data eventually grew too large for the S3 engine to handle, and a switch to GCS-backed tables was proposed. The same switch may be needed here as the tables grow in size.
Edited by Dylan Bernardi