Drop or update the column `stage` from `ci_finished_builds` in clickhouse

Since we are no longer using the stage column in ci_builds, we need to drop or update (to reference ci_stage) the stage column in the ci_finished_builds table in the clickhouse database:

CREATE TABLE gitlab_clickhouse_development.ci_finished_builds
(
    `id` UInt64 DEFAULT 0,
    `project_id` UInt64 DEFAULT 0,
    `pipeline_id` UInt64 DEFAULT 0,
    `status` LowCardinality(String) DEFAULT '',
    `created_at` DateTime64(6, 'UTC') DEFAULT 0,
    `queued_at` DateTime64(6, 'UTC') DEFAULT 0,
    `finished_at` DateTime64(6, 'UTC') DEFAULT 0,
    `started_at` DateTime64(6, 'UTC') DEFAULT 0,
    `runner_id` UInt64 DEFAULT 0,
    `runner_manager_system_xid` String DEFAULT '',
    `runner_run_untagged` Bool DEFAULT false,
    `runner_type` UInt8 DEFAULT 0,
    `runner_manager_version` LowCardinality(String) DEFAULT '',
    `runner_manager_revision` LowCardinality(String) DEFAULT '',
    `runner_manager_platform` LowCardinality(String) DEFAULT '',
    `runner_manager_architecture` LowCardinality(String) DEFAULT '',
    `duration` Int64 MATERIALIZED if((started_at > 0) AND (finished_at > started_at), age('ms', started_at, finished_at), 0),
    `queueing_duration` Int64 MATERIALIZED if((queued_at > 0) AND (started_at > queued_at), age('ms', queued_at, started_at), 0),
    `root_namespace_id` UInt64 DEFAULT 0,
    `name` String DEFAULT '',
    `stage` String DEFAULT '',
    `date` Date32 MATERIALIZED toStartOfMonth(finished_at),
    `runner_owner_namespace_id` UInt64 DEFAULT 0
)
ENGINE = ReplacingMergeTree
PARTITION BY toYear(finished_at)
ORDER BY (status, runner_type, project_id, finished_at, id)
SETTINGS index_granularity = 8192, use_async_block_ids_cache = true

Context here: https://gitlab.slack.com/archives/C04D13QHUH1/p1733334254413389 (internal)

Implementation plan

Currently, our export process from Postgres to ClickHouse is exporting empty stage values. We could update the export code to fetch the p_ci_stages.name instead, but it feels unnecessary to have a denormalized text column when we now have this information in a normalized for in p_ci_stages. So we should:

  1. Drop the ci_finished_builds.stage column with a ClickHouse migration.
  2. Add ci_finished_builds.stage_id column as a ClickHouse migration (defaulting to 0).
  3. Change the export code to include the stage_id in the CSV.

Support contact: @pedropombeiro

Edited by Pedro Pombeiro