Drop or update the column `stage` from `ci_finished_builds` in clickhouse
Since we are no longer using the stage column in ci_builds, we need to drop or update (to reference ci_stage) the stage column in the ci_finished_builds table in the clickhouse database:
CREATE TABLE gitlab_clickhouse_development.ci_finished_builds
(
`id` UInt64 DEFAULT 0,
`project_id` UInt64 DEFAULT 0,
`pipeline_id` UInt64 DEFAULT 0,
`status` LowCardinality(String) DEFAULT '',
`created_at` DateTime64(6, 'UTC') DEFAULT 0,
`queued_at` DateTime64(6, 'UTC') DEFAULT 0,
`finished_at` DateTime64(6, 'UTC') DEFAULT 0,
`started_at` DateTime64(6, 'UTC') DEFAULT 0,
`runner_id` UInt64 DEFAULT 0,
`runner_manager_system_xid` String DEFAULT '',
`runner_run_untagged` Bool DEFAULT false,
`runner_type` UInt8 DEFAULT 0,
`runner_manager_version` LowCardinality(String) DEFAULT '',
`runner_manager_revision` LowCardinality(String) DEFAULT '',
`runner_manager_platform` LowCardinality(String) DEFAULT '',
`runner_manager_architecture` LowCardinality(String) DEFAULT '',
`duration` Int64 MATERIALIZED if((started_at > 0) AND (finished_at > started_at), age('ms', started_at, finished_at), 0),
`queueing_duration` Int64 MATERIALIZED if((queued_at > 0) AND (started_at > queued_at), age('ms', queued_at, started_at), 0),
`root_namespace_id` UInt64 DEFAULT 0,
`name` String DEFAULT '',
`stage` String DEFAULT '',
`date` Date32 MATERIALIZED toStartOfMonth(finished_at),
`runner_owner_namespace_id` UInt64 DEFAULT 0
)
ENGINE = ReplacingMergeTree
PARTITION BY toYear(finished_at)
ORDER BY (status, runner_type, project_id, finished_at, id)
SETTINGS index_granularity = 8192, use_async_block_ids_cache = true
Context here: https://gitlab.slack.com/archives/C04D13QHUH1/p1733334254413389 (internal)
Implementation plan
Currently, our export process from Postgres to ClickHouse is exporting empty stage values. We could update the export code to fetch the p_ci_stages.name instead, but it feels unnecessary to have a denormalized text column when we now have this information in a normalized for in p_ci_stages. So we should:
- Drop the
ci_finished_builds.stagecolumn with a ClickHouse migration. - Add
ci_finished_builds.stage_idcolumn as a ClickHouse migration (defaulting to 0). - Change the export code to include the
stage_idin the CSV.
Support contact: @pedropombeiro
Edited by Pedro Pombeiro