Skip to content

ClickHouse data collector

charlie ablett requested to merge 414610-cablett-clickhouse-data-collector into master

What does this MR do and why?

ClickHouse data collector

Introduces feature flag clickhouse_data_collector #420257

Screenshots or screen recordings

No user-facing changes

How to set up and validate locally

Make sure you have GitLab Ultimate license on your local.

  1. Install and run the ClickHouse server as per these directions 👉 !124295 (merged)
  • If you do the one-line install rather than installing via apt, you can remove the password from the yml file because the default user it comes with doesn't have a password. There is a way to set up other users or set a password, but for the purposes of testing I don't think it's necessary.
  1. Run the ClickHouse client, connecting to the gitlab_clickhouse_test database.
  2. Create the events table into the gitlab_clickhouse_test database (run the code as in the db/click_house/main/20230705124511_create_events.sql file in clickhouse client)
  3. Create the contribution_analytics_events table - run db/click_house/main/20230724064832_create_contribution_analytics_events.sql file in this MR in clickhouse client
  4. Create the materialised view - run db/click_house/main/20230724064918_contribution_analytics_events_materialized_view.sql in this MR in clickhouse client

OR as @ahegyi says: just do #414938 (closed) 😅

  1. Find a project with some MR activity (code pushes, etc) or seed some.
  2. Fill out the table:
def format_row(event)
  namespace = event.project.try(:project_namespace) || event.group
  path = namespace.traversal_ids.join('/')

  action = Event.actions[event.action]
  [
    event.id,
    "'#{path}/'",
    event.author_id,
    event.target_id,
    "'#{event.target_type}'",
    action,
    event.created_at.to_f,
    event.updated_at.to_f
  ].join(',')
end


values = []
Event.all.each do |event|
  values << "(#{format_row(event)})"
end

insert_query = <<~SQL
INSERT INTO events
(id, path, author_id, target_id, target_type, action, created_at, updated_at)
VALUES
#{values.join(',')}
SQL

ClickHouse::Client.execute(insert_query, :main)
  1. Tail logs tail -f log/development.log
  2. To get the data:
  • PG: make sure Feature.enabled?(:clickhouse_data_collection) is false
  • CH: make sure Feature.enabled?(:clickhouse_data_collection) is true

Using the GDK group flightjs:

{
  group(fullPath: "flightjs") {
    contributions(from: "2023-06-01", to: "2023-08-01") {
      nodes {
        user {
          id
        }
        totalEvents
        repoPushed
      }
    }
  }
}

Try with the FF off first, note the result, and then with FF on. The resulting data should be the same. If the FF is off, you'll see a MATERIALIZED VIEW query hit the Postgres DB. If the FF is on, it will be missing.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #414610 (closed)

Merge request reports