ClickHouse data collector
What does this MR do and why?
ClickHouse data collector
Introduces feature flag clickhouse_data_collector
#420257 (closed)
Screenshots or screen recordings
No user-facing changes
How to set up and validate locally
Make sure you have GitLab Ultimate license on your local.
- Install and run the ClickHouse server as per these directions
👉 !124295 (merged)
- If you do the one-line install rather than installing via
apt
, you can remove the password from theyml
file because the default user it comes with doesn't have a password. There is a way to set up other users or set a password, but for the purposes of testing I don't think it's necessary.
- Run the ClickHouse client, connecting to the
gitlab_clickhouse_test
database. - Create the
events
table into thegitlab_clickhouse_test
database (run the code as in thedb/click_house/main/20230705124511_create_events.sql
file in clickhouse client) - Create the
contribution_analytics_events
table - rundb/click_house/main/20230724064832_create_contribution_analytics_events.sql
file in this MR in clickhouse client - Create the materialised view - run
db/click_house/main/20230724064918_contribution_analytics_events_materialized_view.sql
in this MR in clickhouse client
OR as @ahegyi says: just do #414938 (closed)
- Find a project with some MR activity (code pushes, etc) or seed some.
- Fill out the table:
def format_row(event)
namespace = event.project.try(:project_namespace) || event.group
path = namespace.traversal_ids.join('/')
action = Event.actions[event.action]
[
event.id,
"'#{path}/'",
event.author_id,
event.target_id,
"'#{event.target_type}'",
action,
event.created_at.to_f,
event.updated_at.to_f
].join(',')
end
values = []
Event.all.each do |event|
values << "(#{format_row(event)})"
end
insert_query = <<~SQL
INSERT INTO events
(id, path, author_id, target_id, target_type, action, created_at, updated_at)
VALUES
#{values.join(',')}
SQL
ClickHouse::Client.execute(insert_query, :main)
- Tail logs
tail -f log/development.log
- To get the data:
- PG: make sure
Feature.enabled?(:clickhouse_data_collection)
isfalse
- CH: make sure
Feature.enabled?(:clickhouse_data_collection)
istrue
Using the GDK group flightjs
:
{
group(fullPath: "flightjs") {
contributions(from: "2023-06-01", to: "2023-08-01") {
nodes {
user {
id
}
totalEvents
repoPushed
}
}
}
}
Try with the FF off first, note the result, and then with FF on. The resulting data should be the same. If the FF is off, you'll see a MATERIALIZED VIEW
query hit the Postgres DB. If the FF is on, it will be missing.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #414610 (closed)
Edited by Pedro Pombeiro