Automate Version DB Exports
1. Problem
After the spike discussions "Spike: Automate Version DB Exports to S3" https://gitlab.com/gitlab-org/telemetry/-/issues/363 the plan is to design the automated jobs.
2. Solution
2.a Approach
A Gitlab scheduled pipeline job will use the Google Cloud SDK command line tools to export the a CSV data using a sanitized SQL statement.
2.b Details
- When you use Cloud SQL to perform an export, whether from the Cloud Console, the gcloud command-line tool, or the API, you are using the pg_dump utility, with the options required to ensure that the resulting export file is valid for import back into Cloud SQL.
gsutil mb -p [PROJECT_NAME] -l [LOCATION_NAME] gs://[BUCKET_NAME]
gcloud sql instances describe [INSTANCE_NAME]
gcloud sql export csv [INSTANCE_NAME]
gs://[BUCKET_NAME]/[FILE_NAME]
--database=[DATABASE_NAME]
--query=[SELECT_QUERY]
# or non standard CSV
\copy [table_name] TO '[csv_file_name].csv' WITH
(FORMAT csv, ESCAPE '[escape_character]', QUOTE '[quote_character]',
DELIMITER '[delimiter_character]', ENCODING 'UTF8', NULL '[null_marker_string]');
- gcloud sql export csv doc https://cloud.google.com/sdk/gcloud/reference/sql/export/csv
- Read https://cloud.google.com/sql/docs/postgres/import-export/exporting
- Example s
gcloud sql export csv instance_name gs://bucket_name/ --query="SELECT 'columnname1' , 'columnname2', 'columnname3' UNION SELECT columnname1, columnname2, columnname3 FROM table_name" --database=database_name
2.c Best practices
See Best practices for importing and exporting data https://cloud.google.com/sql/docs/mysql/import-export
- Use CSV exports to export only what you need.
- Compress data to reduce cost
- Use smaller batches of data to export, to avoid long running tasks, which cannot be stopped
- The connection to Cloud Storage may be timing out because the query running in the export is not producing any data within the first seven minutes since the export is initiated. Try COPY (INSERT_YOUR_QUERY_HERE) TO STDOUT WITH ( FORMAT csv, DELIMITER ',', ENCODING 'UTF8', QUOTE '"', ESCAPE '"' ).
2.d Credentials
To export data to Cloud Storage, the instance's service account must have the storage.objectAdmin Cloud IAM role set in the project. For more information, see Cloud Identity and Access Management for Cloud Storage.
3 Links
- Installing the Cloud SDK Docker image https://cloud.google.com/sdk/docs/downloads-docker
- Pipeline schedules https://gitlab.com/help/ci/pipelines/schedules
- Getting started with GitLab CI/CD https://docs.gitlab.com/ee/ci/quick_start/README.html and https://docs.gitlab.com/ee/ci/
- GitLab CI/CD Pipeline Configuration Reference https://docs.gitlab.com/ee/ci/yaml/README.html#script
- GitLab CI/CD Examples https://docs.gitlab.com/ee/ci/examples/README.html
Edited by Alper Akgun