[CA PoC] Revisit the database table schema and query
Why are we doing this work
To efficiently query the ClickHouse database, we need a database schema that supports the Contribution Analytics-related database queries (as of now, it seems like we'll need one query).
There is already a basic database schema used when running tests (#414588 (closed)). The task here is to finalize (experiment and measure) the CH-specifc DB schema (events
) and DB query.
A rudimentary version of the DB query is available within the CH experiment documentation: https://docs.gitlab.com/ee/development/database/clickhouse/gitlab_activity_data.html#clickhouse-database-queries
What to focus on:
Contribution Analytics works on the group level including subgroups. The table schema should make it easy to query the group hierarchy. Having the project_id
column is probably not enough since then we would have to load thousands of project_ids
from PostgreSQL
and pipe it into the ClickHouse query (project_id IN (1,2,3, ...)
.
Ideas:
- Add
traversal_ids
orfull_path
to the CH table. - ???
As a result, we should have a ready-to-use database schema and DB query that we can use for Contribution Analytics.