[CA PoC] [optional] Load PG `events` table to CH

Why are we doing this work

Optional

This issue might be optional since we can achieve similar results with the #414937 (closed). We can skip this issue if we're ok having data only from the last few weeks.


We need to do a one-time sync from the PostgreSQL database to have historical data for the Contribution Analytics feature.

Implementation plan

Depending on the database schema there should be a one-time export from the PostgreSQL database to CSV and import to the ClickHouse database.

How:

  1. Reach out to the infrastructure team and ask for guidance.
  2. Prepare a SQL query that contains all data we need to be present in ClickHouse.
  3. If the exported data cannot be imported into the ClickHouse DB table (events) without some data transformation, load the raw data into a temporary table.
  4. Fill up the events database table on ClickHouse from the temporary table using SELECT + INSERT with the necessary data transformations.

SQL query:

\copy (
SELECT
  events.id AS id,
  (
    CASE
      WHEN project_id IS NOT NULL THEN (SELECT array_to_string(traversal_ids, '/') || '/' FROM namespaces WHERE id = (SELECT project_namespace_id FROM projects WHERE id = events.project_id LIMIT 1) LIMIT 1)
      WHEN group_id IS NOT NULL THEN (SELECT array_to_string(traversal_ids, '/') || '/' FROM namespaces WHERE id = events.group_id)
      ELSE ''
    END      
  ) AS path,
  events.author_id AS author_id,
  events.target_id AS target_id,
  events.target_type AS target_type,
  events.action AS action,
  EXTRACT(epoch FROM created_at) AS created_at,
  EXTRACT(epoch FROM updated_at) AS updated_at
FROM events
ORDER BY id
) TO events_with_path.csv WITH DELIMITER ',' CSV HEADER;

The query creates a csv file with the exact structure that we need for the ClickHouse schema. Once the ClickHouse database is ready (events table is present), we can import the CSV:

clickhouse-client -d gitlab_clickhouse_production --password -q 'INSERT INTO events FORMAT CSVWithNames' < events_with_path.csv
Edited by Adam Hegyi