Add experimental ClickHouse HTTP client
What does this MR do and why?
This MR adds a lightweight gem for interacting with ClickHouse databases via HTTP interface. We're following the gem guidelines here: https://docs.gitlab.com/ee/development/gems.html
The client was created so groupoptimize can move forward with the PoC where we plan to rebuild a feature using ClickHouse.
The client should support executing an aggregation query and return counts. At a later step it should also support bulk inserts. Models and validations are out of the scope.
Why not use an already existing gem?
The original idea was to add an external gem (#414386 (closed)) however, the gem does much more than we need for this PoC and adding an external gem would require careful review of the codebase. This would eventually mean:
- Review the code and assess its quality.
- Forking the gem and making sure we have a running pipeline for the supported ruby versions.
How does it work?
The gem supports multiple databases which can be specified via a YAML file. The Gitlab::ClickHouse::Client::Configuration
object contains the databases and other options so we can use our utilities (Gitlab::HTTP
, Gitlab::Json
).
Without the YAML file the gem will not do anything, configuration is optional.
Query execution flow:
- Invoke the
execute
method using the configured database and by passing a raw SQL string. - The
Client
invokes thehttp_post_proc
callable object which invokes a HTTP request. - The response will be turned into a
Response
object which can tell whether the response is successful or not. - The body of the response contains a raw JSON string, this will be parsed with the configured
json_parser
. - The JSON response includes the returned columns with their respective data types. Using this data, the
Formatter
casts the values if necessary. - The raw data (array of hashes) are returned.
How to test it
- Set up a ClickHouse server and a user with password: https://clickhouse.com/docs/en/install
- Start a console:
clickhouse-client --password
- Create a new database:
create database gitlab_clickhouse_test;
- Connect to the new database:
clickhouse-client -d gitlab_clickhouse_test --password
- Create a new table:
CREATE TABLE issues (
`id` UInt64,
`title` String DEFAULT '',
`description` Nullable(String),
`created_at` DateTime64(6, 'UTC') DEFAULT now(),
`updated_at` DateTime64(6, 'UTC') DEFAULT now()
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY (id)
- Insert some data:
insert into issues (id, title, description) values (1, 'Title 1', 'description');
insert into issues (id, title, description) values (2, 'Title 2', 'description');
insert into issues (id, title, description) values (3, 'Title 3', null);
- Create a config file in your gitlab dir:
config/click_house.yml
development:
main:
database: gitlab_clickhouse_test
url: 'http://localhost:8123'
username: default
password: clickhouse
- Start rails console and execute:
Gitlab::ClickHouse::Client.execute("select * from issues", :main)
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Merge request reports
Activity
assigned to @ahegyi
- A deleted user
added backend label
2 Warnings This merge request is quite big (665 lines changed), please consider splitting it into multiple merge requests. featureaddition and featureenhancement merge requests normally have a documentation change. Consider adding a documentation update or confirming the documentation plan with the Technical Writer counterpart.
For more information, see:
- The Handbook page on merge request types.
- The definition of done documentation.
1 Message CHANGELOG missing: If you want to create a changelog entry for GitLab FOSS, add the
Changelog
trailer to the commit message you want to add to the changelog.If you want to create a changelog entry for GitLab EE, also add the
EE: true
trailer to your commit message.If this merge request doesn't need a CHANGELOG entry, feel free to ignore this message.
Pipeline Changes
This merge request contains changes to the pipeline configuration for the GitLab project.
Please consider the effect of the changes in this merge request on the following:
- Effects on different pipeline types
- Effects on non-canonical projects:
gitlab-foss
security
dev
- personal forks
- Effects on pipeline performance
Please consider communicating these changes to the broader team following the communication guideline for pipeline changes
Reviewer roulette
Changes that require review have been detected!
Please refer to the table below for assigning reviewers and maintainers suggested by Danger in the specified category:
Category Reviewer Maintainer backend Javiera Tapia (
@jtapiab
) (UTC-4, 6 hours behind@ahegyi
)Mehmet Emin Inac (
@minac
) (UTC+2, same timezone as@ahegyi
)maintenanceworkflow / maintenancepipelines for CI, Danger Aboobacker MK (
@tachyons-gitlab
) (UTC+5.5, 3.5 hours ahead of@ahegyi
)David Dieulivol (
@ddieulivol
) (UTC+2, same timezone as@ahegyi
)Engineering Productivity Reviewer review is optional for Engineering Productivity David Dieulivol (
@ddieulivol
) (UTC+2, same timezone as@ahegyi
)To spread load more evenly across eligible reviewers, Danger has picked a candidate for each review slot, based on their timezone. Feel free to override these selections if you think someone else would be better-suited or use the GitLab Review Workload Dashboard to find other available reviewers.
To read more on how to use the reviewer roulette, please take a look at the Engineering workflow and code review guidelines. Please consider assigning a reviewer or maintainer who is a domain expert in the area of the merge request.
Once you've decided who will review this merge request, assign them as a reviewer! Danger does not automatically notify them for you.
Rubygems
This merge request adds, or changes a Rubygems dependency. Please review the Gemfile guidelines.
If needed, you can retry the
danger-review
job that generated this comment.Generated by
Danger- Resolved by Adam Hegyi
I'm going through this guide about gem development: https://docs.gitlab.com/ee/development/gems.html and I ran into an issue which I'm unsure how to fix.
I'm trying to run rubocop in the gem folder:
cd gems/gitlab-click-house-client rubocop
And I'm getting the following error:
uninitialized constant RuboCop::RSpec::FactoryBot include RuboCop::RSpec::FactoryBot::Language ^^^^^^^^^^^^ gitlab-development-kit/gitlab/rubocop/cop/rspec/factory_bot/strategy_in_callback.rb:10:in `<class:StrategyInCallback>' gitlab-development-kit/gitlab/rubocop/cop/rspec/factory_bot/strategy_in_callback.rb:9:in `<module:FactoryBot>' gitlab-development-kit/gitlab/rubocop/cop/rspec/factory_bot/strategy_in_callback.rb:8:in `<module:RSpec>' gitlab-development-kit/gitlab/rubocop/cop/rspec/factory_bot/strategy_in_callback.rb:7:in `<module:Cop>' gitlab-development-kit/gitlab/rubocop/cop/rspec/factory_bot/strategy_in_callback.rb:6:in `<module:RuboCop>' gitlab-development-kit/gitlab/rubocop/cop/rspec/factory_bot/strategy_in_callback.rb:5:in `<top (required)>'
My gem doesn't use factory bot at all so I wonder if we could run it off somehow. I tried disabling
RSpec/FactoryBot
but it didn't help.Have you ran into a similar issue when you worked on the
gitlab-rspec
gem?
- Resolved by 🤖 GitLab Bot 🤖
Proper labels assigned to this merge request. Please ignore me.
@ahegyi - please see the following guidance and update this merge request.1 Error Please add typebug typefeature, or typemaintenance label to this merge request. Edited by 🤖 GitLab Bot 🤖
- Resolved by Adam Hegyi
- Resolved by Adam Hegyi
- Resolved by Adam Hegyi
@ahegyi This is beautiful to see :) Thank you!
Allure report
allure-report-publisher
generated test report!e2e-test-on-gdk:
test report for e29d3227expand test summary
+-----------------------------------------------------------------------+ | suites summary | +------------------+--------+--------+---------+-------+-------+--------+ | | passed | failed | skipped | flaky | total | result | +------------------+--------+--------+---------+-------+-------+--------+ | Govern | 2 | 0 | 0 | 0 | 2 | ✅ | | Create | 8 | 0 | 1 | 5 | 9 | ❗ | | Monitor | 4 | 0 | 0 | 4 | 4 | ❗ | | Plan | 4 | 0 | 0 | 4 | 4 | ❗ | | Data Stores | 2 | 0 | 0 | 0 | 2 | ✅ | | Manage | 1 | 0 | 0 | 0 | 1 | ✅ | | Framework sanity | 0 | 0 | 1 | 0 | 1 | ➖ | +------------------+--------+--------+---------+-------+-------+--------+ | Total | 21 | 0 | 2 | 13 | 23 | ❗ | +------------------+--------+--------+---------+-------+-------+--------+
e2e-review-qa:
test report for e29d3227expand test summary
+-----------------------------------------------------------------------+ | suites summary | +------------------+--------+--------+---------+-------+-------+--------+ | | passed | failed | skipped | flaky | total | result | +------------------+--------+--------+---------+-------+-------+--------+ | Create | 8 | 0 | 1 | 1 | 9 | ❗ | | Monitor | 4 | 0 | 0 | 0 | 4 | ✅ | | Data Stores | 2 | 0 | 0 | 0 | 2 | ✅ | | Framework sanity | 0 | 0 | 1 | 0 | 1 | ➖ | | Plan | 3 | 0 | 1 | 0 | 4 | ✅ | | Manage | 1 | 0 | 0 | 0 | 1 | ✅ | | Govern | 2 | 0 | 0 | 0 | 2 | ✅ | +------------------+--------+--------+---------+-------+-------+--------+ | Total | 20 | 0 | 3 | 1 | 23 | ❗ | +------------------+--------+--------+---------+-------+-------+--------+
e2e-package-and-test:
test report for e29d3227expand test summary
+-----------------------------------------------------------------------+ | suites summary | +------------------+--------+--------+---------+-------+-------+--------+ | | passed | failed | skipped | flaky | total | result | +------------------+--------+--------+---------+-------+-------+--------+ | Data Stores | 117 | 0 | 3 | 0 | 120 | ✅ | | Manage | 142 | 1 | 27 | 5 | 170 | ❌ | | Create | 559 | 0 | 85 | 8 | 644 | ❗ | | Govern | 143 | 6 | 21 | 6 | 170 | ❌ | | Plan | 243 | 0 | 1 | 0 | 244 | ✅ | | Configure | 1 | 0 | 9 | 0 | 10 | ✅ | | Package | 232 | 0 | 9 | 0 | 241 | ✅ | | Verify | 147 | 0 | 15 | 3 | 162 | ❗ | | Monitor | 36 | 0 | 7 | 0 | 43 | ✅ | | Release | 18 | 0 | 0 | 0 | 18 | ✅ | | Secure | 6 | 0 | 27 | 0 | 33 | ✅ | | Fulfillment | 8 | 0 | 75 | 0 | 83 | ✅ | | Analytics | 7 | 0 | 0 | 0 | 7 | ✅ | | Systems | 11 | 0 | 0 | 0 | 11 | ✅ | | Growth | 0 | 0 | 6 | 0 | 6 | ➖ | | Framework sanity | 0 | 0 | 5 | 0 | 5 | ➖ | | GitLab Metrics | 2 | 0 | 1 | 0 | 3 | ✅ | | ModelOps | 0 | 0 | 3 | 0 | 3 | ➖ | +------------------+--------+--------+---------+-------+-------+--------+ | Total | 1672 | 7 | 294 | 22 | 1973 | ❌ | +------------------+--------+--------+---------+-------+-------+--------+
added typemaintenance label
added typefeature label and removed typemaintenance label
added featureaddition label
Setting label groupoptimize based on
@ahegyi
's group.added groupoptimize label
added devopsplan sectiondev labels
removed devopsplan sectiondev labels
added devopsplan sectiondev labels
- Resolved by Adam Hegyi
- Resolved by Vladimir Shushlin
mentioned in issue #416010 (closed)
added 301 commits
-
4fdd517e...831896fe - 300 commits from branch
master
- 4105782f - Add experimental HTTP client for ClickHouse
-
4fdd517e...831896fe - 300 commits from branch
changed milestone to %16.2
- Resolved by Adam Hegyi
Hi @mikolaj_wawrzyniak! Could you do the BE review?
requested review from @mikolaj_wawrzyniak
- Resolved by Adam Hegyi
Hi, @cablett! Could you do the initial BE review?
requested review from @cablett and removed review request for @mikolaj_wawrzyniak
mentioned in merge request !124457 (merged)
- Resolved by Adam Hegyi
- Resolved by Vladimir Shushlin
Have we considered creating
active-record-base-clickhouse
adapter? This way we would be able to use all existing scafolding of Rails to create database models, and all existing tooling around handling of SQL queries.This would be another database configured as part of codebase, that would have its own adapter, and its own migrations (if needed).
I'm worried that we would reinvent all models using different interface.
Edited by Kamil Trzciński
- Resolved by Vladimir Shushlin
- Resolved by Vladimir Shushlin
This seems to run fine for me, thank you for the instructions @ahegyi!
> Gitlab::ClickHouse::Client.execute("select * from default.issues", :main) => [{"id"=>1, "title"=>"Issue 1", "description"=>"This is an issue, look at it go", "created_at"=>Tue, 27 Jun 2023 04:13:35.000000000 UTC +00:00, "updated_at"=>Tue, 27 Jun 2023 04:13:35.000000000 UTC +00:00}, {"id"=>2, "title"=>"Issue 2", "description"=>"A second issue, we're on a roll!", "created_at"=>Tue, 27 Jun 2023 04:14:04.000000000 UTC +00:00, "updated_at"=>Tue, 27 Jun 2023 04:14:04.000000000 UTC +00:00}, {"id"=>3, "title"=>"Issue 3", "description"=>"A third issue, unstoppable!", "created_at"=>Tue, 27 Jun 2023 04:14:21.000000000 UTC +00:00, "updated_at"=>Tue, 27 Jun 2023 04:14:21.000000000 UTC +00:00}]
(I had to add
default.issues
to the select query after it didn't initially work -Gitlab::ClickHouse::Client.execute("select * from system.tables", :main)
which contained (in a LOT of output, I piped through
grep
):"create_table_query"=> "CREATE TABLE default.issues (`id` UInt64, `title` String DEFAULT '', `description` Nullable(String), `created_at` DateTime64(6, 'UTC') DEFAULT now(), `updated_at` DateTime64(6, 'UTC') DEFAULT now()) ENGINE = ReplacingMergeTree(updated_at) ORDER BY id SETTINGS index_granularity = 8192",
Anyway, seems to work for me! Is there a reason why I had to append
default.
?
- Resolved by Vladimir Shushlin