Moving BigQuery data table into GitLab Grafana

Dylan Bernardi requested to merge bigquery-to-grafana into main

This MR will close this issue: https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/prompt-library/-/issues/15+.

This MR includes the following:

  • a Python script, writeBQDatasetToGCS.py, that flattens the BigQuery (BQ) table and writes it to GCS as .csv, a format that ClickHouse handles easily
  • SQL scripts that create the prompt_library database and the score_chunks_v2 table using the S3 engine
  • make targets for connecting to the ClickHouse instance and running queries against it, along with the scripts those targets call
  • a check that the table names in ClickHouse match the BigQuery table names
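
To illustrate the flattening step described in the first item, here is a minimal sketch of how nested rows could be turned into flat CSV columns. It is not the code in writeBQDatasetToGCS.py: the function names `flatten_row` and `rows_to_csv`, the `_` separator, and the choice to serialize repeated fields as JSON are all assumptions made for this example, and the BigQuery read and GCS upload are left out.

```python
import csv
import io
import json


def flatten_row(row, parent_key="", sep="_"):
    """Flatten nested dicts into one level, joining keys with `sep` (hypothetical helper)."""
    flat = {}
    for key, value in row.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Nested record: recurse and prefix child keys with the parent name.
            flat.update(flatten_row(value, column, sep))
        elif isinstance(value, list):
            # Repeated field: keep it in one cell as JSON (an assumption of this sketch).
            flat[column] = json.dumps(value)
        else:
            flat[column] = value
    return flat


def rows_to_csv(rows):
    """Render an iterable of (possibly nested) dict rows as CSV text with a header."""
    flat_rows = [flatten_row(row) for row in rows]
    # Collect the union of column names in first-seen order.
    columns = []
    for flat in flat_rows:
        for column in flat:
            if column not in columns:
                columns.append(column)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(flat_rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up rows for illustration only; not the real score_chunks_v2 schema.
    sample = [
        {"id": 1, "scores": {"model": "a", "value": 0.5}},
        {"id": 2, "scores": {"model": "b", "value": 0.7}, "tags": ["x", "y"]},
    ]
    print(rows_to_csv(sample))
```

Flat columns with a header row keep the file straightforward to describe in a ClickHouse table definition, since each CSV column maps to one table column.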

Things to consider in the future, or possibly to revise in this MR:

  • When ClickHouse was used for Suggested Reviewer, the data eventually grew too large for the S3 engine to handle, and a switch to GCS-backed tables was proposed. The same change may be needed here as the tables grow in size.

cc @mray2020 @HongtaoYang @tle_gitlab @bcardoso- @srayner

Edited by Dylan Bernardi
