ClickHouse data collector
What does this MR do and why?
ClickHouse data collector
Introduces the feature flag clickhouse_data_collection
#420257 (closed)
Screenshots or screen recordings
No user-facing changes
How to set up and validate locally
Make sure you have a GitLab Ultimate license on your local instance.
- Install and run the ClickHouse server as per these directions: !124295 (merged)
- If you do the one-line install rather than installing via apt, you can remove the password from the yml file, because the default user it comes with doesn't have a password. There is a way to set up other users or set a password, but for the purposes of testing I don't think it's necessary.
- Run the ClickHouse client, connecting to the gitlab_clickhouse_test database.
- Create the events table in the gitlab_clickhouse_test database (run the code in the db/click_house/main/20230705124511_create_events.sql file in the ClickHouse client).
- Create the contribution_analytics_events table: run the db/click_house/main/20230724064832_create_contribution_analytics_events.sql file from this MR in the ClickHouse client.
- Create the materialized view: run db/click_house/main/20230724064918_contribution_analytics_events_materialized_view.sql from this MR in the ClickHouse client.
OR, as @ahegyi says: just do #414938 (closed)
- Find a project with some MR activity (code pushes, etc) or seed some.
- Fill out the table:
# Format one Event as the comma-separated body of a ClickHouse VALUES tuple.
def format_row(event)
  namespace = event.project.try(:project_namespace) || event.group
  path = namespace.traversal_ids.join('/')
  action = Event.actions[event.action]

  [
    event.id,
    "'#{path}/'",
    event.author_id,
    event.target_id,
    "'#{event.target_type}'",
    action,
    event.created_at.to_f,
    event.updated_at.to_f
  ].join(',')
end

values = []
# find_each loads the events in batches rather than all at once.
Event.find_each do |event|
  values << "(#{format_row(event)})"
end

insert_query = <<~SQL
  INSERT INTO events
  (id, path, author_id, target_id, target_type, action, created_at, updated_at)
  VALUES
  #{values.join(',')}
SQL

ClickHouse::Client.execute(insert_query, :main)
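The script above interpolates strings directly and sends every row in one statement, which is fine for a small GDK dataset. For a larger seed, a minimal sketch along these lines might be safer. It is plain Ruby and runs outside Rails: EventRow is a hypothetical stand-in for the Event model, the batch size is arbitrary, and the escaping assumes ClickHouse's backslash-escaped string literals.

```ruby
# Hypothetical stand-in for Event; it carries only the columns the seed script inserts.
EventRow = Struct.new(:id, :path, :author_id, :target_id, :target_type,
                      :action, :created_at, :updated_at, keyword_init: true)

COLUMNS = %w[id path author_id target_id target_type action created_at updated_at].freeze

# Quote a value as a string literal, escaping backslashes and single quotes.
def quote(value)
  escaped = value.to_s.gsub(/[\\']/) { |char| "\\#{char}" }
  "'#{escaped}'"
end

# Build one VALUES tuple in the same column order as the seed script.
def format_row(row)
  values = [
    row.id,
    quote("#{row.path}/"),
    row.author_id,
    row.target_id,
    quote(row.target_type),
    row.action,
    row.created_at.to_f,
    row.updated_at.to_f
  ]
  "(#{values.join(',')})"
end

# Split the rows into batches and return one INSERT statement per batch.
def insert_statements(rows, batch_size: 1000)
  rows.each_slice(batch_size).map do |batch|
    tuples = batch.map { |row| format_row(row) }.join(',')
    "INSERT INTO events (#{COLUMNS.join(', ')}) VALUES #{tuples}"
  end
end
```

In a Rails console each returned statement could then be passed to ClickHouse::Client.execute(statement, :main), as in the script above.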
- Tail logs
tail -f log/development.log
- To get the data:
  - PG: make sure Feature.enabled?(:clickhouse_data_collection) is false
  - CH: make sure Feature.enabled?(:clickhouse_data_collection) is true
Using the GDK group flightjs:
{
  group(fullPath: "flightjs") {
    contributions(from: "2023-06-01", to: "2023-08-01") {
      nodes {
        user {
          id
        }
        totalEvents
        repoPushed
      }
    }
  }
}
Try with the feature flag off first, note the result, and then repeat with the flag on. The resulting data should be the same. With the flag off, you'll see a MATERIALIZED VIEW query hit the Postgres DB in the log. With the flag on, that query will be missing.
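To make the before/after comparison less manual, the check could be scripted roughly as follows. This is only a sketch: the endpoint URL and access token are placeholders for your own GDK setup, the helper names are made up for illustration, and it assumes the standard /api/graphql endpoint with a personal access token sent as a Bearer header.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build the same contributions query as above for a given group and date range.
def contributions_query(full_path:, from:, to:)
  <<~GRAPHQL
    {
      group(fullPath: "#{full_path}") {
        contributions(from: "#{from}", to: "#{to}") {
          nodes {
            user { id }
            totalEvents
            repoPushed
          }
        }
      }
    }
  GRAPHQL
end

# Build (but do not send) the POST request for the GraphQL endpoint.
def build_request(endpoint, token, query)
  request = Net::HTTP::Post.new(URI(endpoint))
  request['Content-Type'] = 'application/json'
  request['Authorization'] = "Bearer #{token}"
  request.body = JSON.generate(query: query)
  request
end

# Send the request and return the contribution nodes (empty if none are returned).
def fetch_nodes(endpoint, token, query)
  uri = URI(endpoint)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(build_request(endpoint, token, query))
  end
  JSON.parse(response.body).dig('data', 'group', 'contributions', 'nodes') || []
end

# Reduce nodes to comparable rows, sorted by user id so ordering does not matter.
def normalize(nodes)
  nodes
    .map { |node| [node.dig('user', 'id'), node['totalEvents'], node['repoPushed']] }
    .sort_by { |row| row.first.to_s }
end

# True when the Postgres-backed and ClickHouse-backed results carry the same data.
def same_contributions?(pg_nodes, ch_nodes)
  normalize(pg_nodes) == normalize(ch_nodes)
end
```

One way to use it: call fetch_nodes with the flag off (Feature.disable(:clickhouse_data_collection) in a Rails console), call it again after Feature.enable(:clickhouse_data_collection), and pass both results to same_contributions?.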
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #414610 (closed)
Activity
changed milestone to %16.3
added backend devops::plan group::optimize priority::2 section::dev type::feature workflow::in dev labels
assigned to @cablett
A deleted user added database database::review pending labels
2 Warnings
- This merge request is quite big (584 lines changed), please consider splitting it into multiple merge requests.
- feature::addition and feature::enhancement merge requests normally have a documentation change. Consider adding a documentation update or confirming the documentation plan with the Technical Writer counterpart.
For more information, see:
- The Handbook page on merge request types.
- The definition of done documentation.
Reviewer roulette
Changes that require review have been detected!
Please refer to the table below for assigning reviewers and maintainers suggested by Danger in the specified category:
| Category | Reviewer | Maintainer |
| backend | Aakriti Gupta (@aakriti.gupta) (UTC+5.5, 6.5 hours behind @cablett) | Madelein van Niekerk (@maddievn) (UTC+2, 10 hours behind @cablett) |
| database | Tianwen Chen (@tianwenchen) (UTC+8, 4 hours behind @cablett) | Jon Jenkins (@jon_jenkins) (UTC-5, 17 hours behind @cablett) |
To spread load more evenly across eligible reviewers, Danger has picked a candidate for each review slot, based on their timezone. Feel free to override these selections if you think someone else would be better-suited or use the GitLab Review Workload Dashboard to find other available reviewers.
To read more on how to use the reviewer roulette, please take a look at the Engineering workflow and code review guidelines. Please consider assigning a reviewer or maintainer who is a domain expert in the area of the merge request.
Once you've decided who will review this merge request, assign them as a reviewer! Danger does not automatically notify them for you.
If needed, you can retry the danger-review job that generated this comment.
Generated by Danger
added 2780 commits
- 5fba6db9...392b3bae - 2776 commits from branch master
- c7654552 - Setup ClickHouse on CI
- 25f6fc43 - Setup ClickHouse on CI
- 374addf7 - Add basic test cases for ClickHouse
- 2fce0485 - ClickHouse data collector for contribution analytics
added 1 commit
- f1c3f366 - ClickHouse data collector for contribution analytics
Allure report
allure-report-publisher generated test report!
e2e-review-qa: test report for 5fba6db9
+------------------------------------------------------------+
|                       suites summary                       |
+-------+--------+--------+---------+-------+-------+--------+
|       | passed | failed | skipped | flaky | total | result |
+-------+--------+--------+---------+-------+-------+--------+
| Plan  | 50     | 0      | 1       | 0     | 51    | ✅     |
+-------+--------+--------+---------+-------+-------+--------+
| Total | 50     | 0      | 1       | 0     | 51    | ✅     |
+-------+--------+--------+---------+-------+-------+--------+
e2e-test-on-gdk: test report for 1987b06d
+------------------------------------------------------------------+
|                          suites summary                          |
+-------------+--------+--------+---------+-------+-------+--------+
|             | passed | failed | skipped | flaky | total | result |
+-------------+--------+--------+---------+-------+-------+--------+
| Govern      | 34     | 0      | 0       | 1     | 34    | ❗     |
| Create      | 38     | 0      | 0       | 4     | 38    | ❗     |
| Plan        | 51     | 0      | 0       | 0     | 51    | ✅     |
| Data Stores | 20     | 0      | 0       | 2     | 20    | ❗     |
| Manage      | 12     | 0      | 1       | 6     | 13    | ❗     |
| Verify      | 8      | 0      | 0       | 0     | 8     | ✅     |
+-------------+--------+--------+---------+-------+-------+--------+
| Total       | 163    | 0      | 1       | 13    | 164   | ❗     |
+-------------+--------+--------+---------+-------+-------+--------+
e2e-package-and-test: test report for 1987b06d
+------------------------------------------------------------+
|                       suites summary                       |
+-------+--------+--------+---------+-------+-------+--------+
|       | passed | failed | skipped | flaky | total | result |
+-------+--------+--------+---------+-------+-------+--------+
| Plan  | 155    | 1      | 6       | 0     | 162   | ❌     |
+-------+--------+--------+---------+-------+-------+--------+
| Total | 155    | 1      | 6       | 0     | 162   | ❌     |
+-------+--------+--------+---------+-------+-------+--------+
added 916 commits
- f1c3f366...eb0f0465 - 912 commits from branch master
- a8d7e2f1 - Setup ClickHouse on CI
- c2568f28 - Setup ClickHouse on CI
- 87bff275 - Add basic test cases for ClickHouse
- 1df9580b - ClickHouse data collector for contribution analytics
@cablett Some end-to-end (E2E) tests have been selected based on the stage label on this MR.
Please start the trigger-omnibus-and-follow-up-e2e job in the qa stage and ensure the tests in the follow-up-e2e:package-and-test-ee pipeline are passing before this MR is merged. (The E2E test pipeline is computationally intensive and we cannot afford running it automatically for all pushes/rebases. Therefore, this job must be triggered manually after significant changes at least once.)
If you would like to run all E2E tests, please apply the pipeline:run-all-e2e label and trigger a new pipeline. This will run all tests in the e2e:package-and-test pipeline.
The E2E test jobs are allowed to fail due to flakiness. For the list of known failures please refer to the latest pipeline triage issue.
Once done, please apply the emoji on this comment.
For any questions or help in reviewing the E2E test results, please reach out on the internal #quality Slack channel.
added 1 commit
- b21dbccc - ClickHouse data collector for contribution analytics
A deleted user added feature flag label
mentioned in commit gitlab-org-sandbox/gitlab-jh-validation@d57e7dfc
added 1 commit
- cdc57251 - ClickHouse data collector for contribution analytics
mentioned in commit gitlab-org-sandbox/gitlab-jh-validation@824dd4f4
- Resolved by Adam Hegyi
- Resolved by charlie ablett
added 1 commit
- efc748d9 - ClickHouse data collector for contribution analytics
mentioned in commit gitlab-org-sandbox/gitlab-jh-validation@9ac39736
added 1 commit
- db307ee1 - ClickHouse data collector for contribution analytics
mentioned in commit gitlab-org-sandbox/gitlab-jh-validation@5c309be0
added 1 commit
- 52760dac - ClickHouse data collector for contribution analytics
mentioned in commit gitlab-org-sandbox/gitlab-jh-validation@8d3b4316
added 1 commit
- f0ba8a97 - ClickHouse data collector for contribution analytics
mentioned in commit gitlab-org-sandbox/gitlab-jh-validation@f405abf6
added 1 commit
- 3a117beb - ClickHouse data collector for contribution analytics
mentioned in commit gitlab-org-sandbox/gitlab-jh-validation@90ee6103
added 1 commit
- 16d1becd - ClickHouse data collector for contribution analytics
A deleted user added frontend label
mentioned in commit gitlab-org-sandbox/gitlab-jh-validation@7668e03f
- Resolved by Jessie Young
- Resolved by charlie ablett
added 1 commit
- c924945d - ClickHouse data collector for contribution analytics
mentioned in commit gitlab-org-sandbox/gitlab-jh-validation@3482e099
added 1 commit
- e8575adb - ClickHouse data collector for contribution analytics
added 1 commit
- 013cdbc7 - ClickHouse data collector for contribution analytics
mentioned in commit gitlab-org-sandbox/gitlab-jh-validation@61f1bbc3
- Resolved by charlie ablett
@ahegyi I've got most of the specs passing except for https://gitlab.com/gitlab-org/gitlab/-/jobs/4775005107#L432 which is somehow claiming the main DB is not configured even though the :click_house annotation is attached.
Anyway, would you please have a quick look to ensure this is going in the correct direction?
- Resolved by charlie ablett
- Resolved by charlie ablett
added 1 commit
- 9ec21de9 - ClickHouse data collector for contribution analytics
- Resolved by charlie ablett
mentioned in commit gitlab-org-sandbox/gitlab-jh-validation@33352581
- Resolved by charlie ablett
- Resolved by charlie ablett
- Resolved by charlie ablett
- Resolved by charlie ablett
Thanks, @cablett! Looks great! I left a few minor comments.
- Resolved by charlie ablett