Service for counting contributors in a group

Adam Hegyi requested to merge 432067-create-ch-contributions-table into master

What does this MR do and why?

This MR implements a service that returns the number of unique contributors (distinct user IDs) within a given group and its subgroups. The feature relies on the optional ClickHouse database. To compute the contributor count, the materialized view applies the same query rules as the PostgreSQL-based contribution graph query.

Note: data will be populated manually on production (PRD). The table has already been prepared on staging (STG).

Query:

SELECT count(distinct author_id) AS contributor_count
FROM (
  SELECT
    argMax(author_id, contributions.updated_at) AS author_id
  FROM contributions
  WHERE startsWith(path, '9970/')
    AND "contributions"."created_at" >= '2017-01-01'
    AND "contributions"."created_at" <= '2018-12-30'
  GROUP BY id
) contributions

Query plan:

| Expression ((Projection + Before ORDER BY))                                                                   |
|---------------------------------------------------------------------------------------------------------------|
| Aggregating                                                                                                   |
| Expression ((Before GROUP BY + (Projection + Before ORDER BY)))                                               |
| Aggregating                                                                                                   |
| Expression (Before GROUP BY)                                                                                  |
| ReadFromMergeTree (gitlab_clickhouse_main_staging.contributions)                                              |
| Indexes:                                                                                                      |
| MinMax                                                                                                        |
| Keys:                                                                                                         |
| created_at                                                                                                    |
| Condition: and((created_at in (-Inf, 17895]), (created_at in [17167, +Inf)))                                  |
| Parts: 7/28                                                                                                   |
| Granules: 2062/2382                                                                                           |
| Partition                                                                                                     |
| Keys:                                                                                                         |
| toYear(created_at)                                                                                            |
| Condition: and((toYear(created_at) in (-Inf, 2018]), (toYear(created_at) in [2017, +Inf)))                    |
| Parts: 7/7                                                                                                    |
| Granules: 2062/2062                                                                                           |
| PrimaryKey                                                                                                    |
| Keys:                                                                                                         |
| path                                                                                                          |
| created_at                                                                                                    |
| Condition: and((path in ['9970', '9971')), and((created_at in (-Inf, 17895]), (created_at in [17167, +Inf)))) |
| Parts: 5/7                                                                                                    |
| Granules: 8/2062                                                                                              |
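
To make the query's semantics easier to follow, here is a minimal plain-Ruby sketch of the same three steps over in-memory rows: filter by path prefix and date range, keep the most recently updated version of each `id` (the role of `argMax`), then count distinct authors. It is an illustration only, not the service's implementation. The `Row` struct, the sample data, the sub-path format `9970/42/`, and the helper names `contributor_count` and `epoch_day` are all hypothetical. The `epoch_day` helper shows why the plan lists `17167` and `17895`: they are the two date bounds expressed as days since 1970-01-01.

```ruby
require "date"

# Hypothetical in-memory stand-in for rows of the contributions table.
Row = Struct.new(:id, :path, :author_id, :created_at, :updated_at)

# Mirrors the query: filter by path prefix and created_at range, keep the
# latest version of each id (like argMax(author_id, updated_at) ... GROUP BY id),
# then count distinct author ids.
def contributor_count(rows, path_prefix:, from:, to:)
  filtered = rows.select do |row|
    row.path.start_with?(path_prefix) && row.created_at.between?(from, to)
  end

  latest_authors = filtered
    .group_by(&:id)
    .map { |_id, versions| versions.max_by(&:updated_at).author_id }

  latest_authors.uniq.size
end

# Number of days between the Unix epoch and the given date.
def epoch_day(date)
  (date - Date.new(1970, 1, 1)).to_i
end

# Hypothetical sample data (assumed path format: group id followed by "/").
ROWS = [
  Row.new(1, "9970/",    10, Date.new(2017, 3, 1),  Date.new(2017, 3, 1)),
  Row.new(1, "9970/",    10, Date.new(2017, 3, 1),  Date.new(2017, 3, 5)),  # newer version of id 1
  Row.new(2, "9970/42/", 11, Date.new(2018, 6, 10), Date.new(2018, 6, 10)), # subgroup
  Row.new(3, "9970/42/", 10, Date.new(2018, 7, 1),  Date.new(2018, 7, 1)),  # repeat author
  Row.new(4, "9971/",    12, Date.new(2018, 1, 1),  Date.new(2018, 1, 1)),  # different group
  Row.new(5, "9970/",    13, Date.new(2019, 1, 2),  Date.new(2019, 1, 2))   # outside date range
].freeze

puts contributor_count(
  ROWS,
  path_prefix: "9970/",
  from: Date.new(2017, 1, 1),
  to: Date.new(2018, 12, 30)
) # prints 2 (authors 10 and 11)

puts epoch_day(Date.new(2017, 1, 1))   # prints 17167
puts epoch_day(Date.new(2018, 12, 30)) # prints 17895
```

In the sample data, the prefix match covers the group and its subgroup, while the row under `9971/` and the row created in 2019 are excluded, which is the same pruning the `PrimaryKey` section of the plan reports.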

How to validate locally

Enable the feature flags:

Feature.enable(:clickhouse_data_collection)
Feature.enable(:event_sync_worker_for_click_house)
  1. Ensure that your instance has an Ultimate license.
  2. Ensure that ClickHouse is configured: https://docs.gitlab.com/ee/development/database/clickhouse/clickhouse_within_gitlab.html
  3. To prepare the database schema, run: bundle exec rake gitlab:clickhouse:migrate
  4. If your GDK is seeded, you probably already have some events records. Sync them to ClickHouse with: ClickHouse::EventsSyncWorker.new.perform
  5. Run the service in the Rails console, replacing described_class (an RSpec helper) with the service class added in this MR. It should return a count:
described_class.new(
  group: Group.find(1),
  current_user: User.find(2),
  from: Date.new(2020, 1, 1),
  to: Date.new(2023, 11, 29)
).execute

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #432067 (closed)
