Skip to content
Snippets Groups Projects

Add experimental ClickHouse HTTP client

Merged Adam Hegyi requested to merge ah-clickhouse-client-experiment into master

What does this MR do and why?

This MR adds a lightweight gem for interacting with ClickHouse databases via HTTP interface. We're following the gem guidelines here: https://docs.gitlab.com/ee/development/gems.html

The client was created so groupoptimize can move forward with the PoC where we plan to rebuild a feature using ClickHouse.

The client should support executing an aggregation query and return counts. At a later step it should also support bulk inserts. Models and validations are out of the scope.

Why not use an already existing gem?

The original idea was to add an external gem (#414386 (closed)) however, the gem does much more than we need for this PoC and adding an external gem would require careful review of the codebase. This would eventually mean:

  • Review the code and assess its quality.
  • Forking the gem and making sure we have a running pipeline for the supported ruby versions.

How does it work?

The gem supports multiple databases which can be specified via a YAML file. The Gitlab::ClickHouse::Client::Configuration object contains the databases and other options so we can use our utilities (Gitlab::HTTP, Gitlab::Json).

Without the YAML file the gem will not do anything, configuration is optional.

Query execution flow:

  1. Invoke the execute method using the configured database and by passing a raw SQL string.
  2. The Client invokes the http_post_proc callable object which invokes a HTTP request.
  3. The response will be turned into a Response object which can tell whether the response is successful or not.
  4. The body of the response contains a raw JSON string, this will be parsed with the configured json_parser.
  5. The JSON response includes the returned columns with their respective data types. Using this data, the Formatter casts the values if necessary.
  6. The raw data (array of hashes) are returned.

How to test it

  1. Set up a ClickHouse server and a user with password: https://clickhouse.com/docs/en/install
  2. Start a console: clickhouse-client --password
  3. Create a new database: create database gitlab_clickhouse_test;
  4. Connect to the new database: clickhouse-client -d gitlab_clickhouse_test --password
  5. Create a new table:
CREATE TABLE issues (
  `id` UInt64,
  `title` String DEFAULT '',
  `description` Nullable(String),
  `created_at` DateTime64(6, 'UTC') DEFAULT now(),
  `updated_at` DateTime64(6, 'UTC') DEFAULT now()
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY (id)
  1. Insert some data:
insert into issues (id, title, description) values (1, 'Title 1', 'description');
insert into issues (id, title, description) values (2, 'Title 2', 'description');
insert into issues (id, title, description) values (3, 'Title 3', null);
  1. Create a config file in your gitlab dir: config/click_house.yml
development:
  main:
    database: gitlab_clickhouse_test
    url: 'http://localhost:8123'
    username: default
    password: clickhouse
  1. Start rails console and execute:
Gitlab::ClickHouse::Client.execute("select * from issues", :main)

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Ad
Edited by Adam Hegyi

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • assigned to @ahegyi

  • A deleted user added backend label

    added backend label

  • Contributor
    2 Warnings
    :warning: This merge request is quite big (665 lines changed), please consider splitting it into multiple merge requests.
    :warning:

    featureaddition and featureenhancement merge requests normally have a documentation change. Consider adding a documentation update or confirming the documentation plan with the Technical Writer counterpart.

    For more information, see:

    1 Message
    :book: CHANGELOG missing:

    If you want to create a changelog entry for GitLab FOSS, add the Changelog trailer to the commit message you want to add to the changelog.

    If you want to create a changelog entry for GitLab EE, also add the EE: true trailer to your commit message.

    If this merge request doesn't need a CHANGELOG entry, feel free to ignore this message.

    Pipeline Changes

    This merge request contains changes to the pipeline configuration for the GitLab project.

    Please consider the effect of the changes in this merge request on the following:

    Please consider communicating these changes to the broader team following the communication guideline for pipeline changes

    Reviewer roulette

    Changes that require review have been detected!

    Please refer to the table below for assigning reviewers and maintainers suggested by Danger in the specified category:

    Category Reviewer Maintainer
    backend Javiera Tapia current availability (@jtapiab) (UTC-4, 6 hours behind @ahegyi) Mehmet Emin Inac current availability (@minac) (UTC+2, same timezone as @ahegyi)
    maintenanceworkflow / maintenancepipelines for CI, Danger Aboobacker MK current availability (@tachyons-gitlab) (UTC+5.5, 3.5 hours ahead of @ahegyi) David Dieulivol current availability (@ddieulivol) (UTC+2, same timezone as @ahegyi)
    Engineering Productivity Reviewer review is optional for Engineering Productivity David Dieulivol current availability (@ddieulivol) (UTC+2, same timezone as @ahegyi)

    To spread load more evenly across eligible reviewers, Danger has picked a candidate for each review slot, based on their timezone. Feel free to override these selections if you think someone else would be better-suited or use the GitLab Review Workload Dashboard to find other available reviewers.

    To read more on how to use the reviewer roulette, please take a look at the Engineering workflow and code review guidelines. Please consider assigning a reviewer or maintainer who is a domain expert in the area of the merge request.

    Once you've decided who will review this merge request, assign them as a reviewer! Danger does not automatically notify them for you.

    Rubygems

    This merge request adds, or changes a Rubygems dependency. Please review the Gemfile guidelines.

    If needed, you can retry the :repeat: danger-review job that generated this comment.

    Generated by :no_entry_sign: Danger

    • Author Maintainer
      Resolved by Adam Hegyi

      @ayufan,

      I'm going through this guide about gem development: https://docs.gitlab.com/ee/development/gems.html and I ran into an issue which I'm unsure how to fix.

      I'm trying to run rubocop in the gem folder:

      cd gems/gitlab-click-house-client
      rubocop

      And I'm getting the following error:

      uninitialized constant RuboCop::RSpec::FactoryBot
      
                include RuboCop::RSpec::FactoryBot::Language
                                      ^^^^^^^^^^^^
      gitlab-development-kit/gitlab/rubocop/cop/rspec/factory_bot/strategy_in_callback.rb:10:in `<class:StrategyInCallback>'
      gitlab-development-kit/gitlab/rubocop/cop/rspec/factory_bot/strategy_in_callback.rb:9:in `<module:FactoryBot>'
      gitlab-development-kit/gitlab/rubocop/cop/rspec/factory_bot/strategy_in_callback.rb:8:in `<module:RSpec>'
      gitlab-development-kit/gitlab/rubocop/cop/rspec/factory_bot/strategy_in_callback.rb:7:in `<module:Cop>'
      gitlab-development-kit/gitlab/rubocop/cop/rspec/factory_bot/strategy_in_callback.rb:6:in `<module:RuboCop>'
      gitlab-development-kit/gitlab/rubocop/cop/rspec/factory_bot/strategy_in_callback.rb:5:in `<top (required)>'

      My gem doesn't use factory bot at all so I wonder if we could run it off somehow. I tried disabling RSpec/FactoryBot but it didn't help.

      Have you ran into a similar issue when you worked on the gitlab-rspec gem?

  • Adam Hegyi added 1 commit

    added 1 commit

    • dee89bb5 - ClickHouse Client experiment

    Compare with previous version

  • Contributor

    Allure report

    allure-report-publisher generated test report!

    e2e-test-on-gdk: :exclamation: test report for e29d3227

    expand test summary
    +-----------------------------------------------------------------------+
    |                            suites summary                             |
    +------------------+--------+--------+---------+-------+-------+--------+
    |                  | passed | failed | skipped | flaky | total | result |
    +------------------+--------+--------+---------+-------+-------+--------+
    | Govern           | 2      | 0      | 0       | 0     | 2     | ✅     |
    | Create           | 8      | 0      | 1       | 5     | 9     | ❗     |
    | Monitor          | 4      | 0      | 0       | 4     | 4     | ❗     |
    | Plan             | 4      | 0      | 0       | 4     | 4     | ❗     |
    | Data Stores      | 2      | 0      | 0       | 0     | 2     | ✅     |
    | Manage           | 1      | 0      | 0       | 0     | 1     | ✅     |
    | Framework sanity | 0      | 0      | 1       | 0     | 1     | ➖     |
    +------------------+--------+--------+---------+-------+-------+--------+
    | Total            | 21     | 0      | 2       | 13    | 23    | ❗     |
    +------------------+--------+--------+---------+-------+-------+--------+

    e2e-review-qa: :exclamation: test report for e29d3227

    expand test summary
    +-----------------------------------------------------------------------+
    |                            suites summary                             |
    +------------------+--------+--------+---------+-------+-------+--------+
    |                  | passed | failed | skipped | flaky | total | result |
    +------------------+--------+--------+---------+-------+-------+--------+
    | Create           | 8      | 0      | 1       | 1     | 9     | ❗     |
    | Monitor          | 4      | 0      | 0       | 0     | 4     | ✅     |
    | Data Stores      | 2      | 0      | 0       | 0     | 2     | ✅     |
    | Framework sanity | 0      | 0      | 1       | 0     | 1     | ➖     |
    | Plan             | 3      | 0      | 1       | 0     | 4     | ✅     |
    | Manage           | 1      | 0      | 0       | 0     | 1     | ✅     |
    | Govern           | 2      | 0      | 0       | 0     | 2     | ✅     |
    +------------------+--------+--------+---------+-------+-------+--------+
    | Total            | 20     | 0      | 3       | 1     | 23    | ❗     |
    +------------------+--------+--------+---------+-------+-------+--------+

    e2e-package-and-test: :x: test report for e29d3227

    expand test summary
    +-----------------------------------------------------------------------+
    |                            suites summary                             |
    +------------------+--------+--------+---------+-------+-------+--------+
    |                  | passed | failed | skipped | flaky | total | result |
    +------------------+--------+--------+---------+-------+-------+--------+
    | Data Stores      | 117    | 0      | 3       | 0     | 120   | ✅     |
    | Manage           | 142    | 1      | 27      | 5     | 170   | ❌     |
    | Create           | 559    | 0      | 85      | 8     | 644   | ❗     |
    | Govern           | 143    | 6      | 21      | 6     | 170   | ❌     |
    | Plan             | 243    | 0      | 1       | 0     | 244   | ✅     |
    | Configure        | 1      | 0      | 9       | 0     | 10    | ✅     |
    | Package          | 232    | 0      | 9       | 0     | 241   | ✅     |
    | Verify           | 147    | 0      | 15      | 3     | 162   | ❗     |
    | Monitor          | 36     | 0      | 7       | 0     | 43    | ✅     |
    | Release          | 18     | 0      | 0       | 0     | 18    | ✅     |
    | Secure           | 6      | 0      | 27      | 0     | 33    | ✅     |
    | Fulfillment      | 8      | 0      | 75      | 0     | 83    | ✅     |
    | Analytics        | 7      | 0      | 0       | 0     | 7     | ✅     |
    | Systems          | 11     | 0      | 0       | 0     | 11    | ✅     |
    | Growth           | 0      | 0      | 6       | 0     | 6     | ➖     |
    | Framework sanity | 0      | 0      | 5       | 0     | 5     | ➖     |
    | GitLab Metrics   | 2      | 0      | 1       | 0     | 3     | ✅     |
    | ModelOps         | 0      | 0      | 3       | 0     | 3     | ➖     |
    +------------------+--------+--------+---------+-------+-------+--------+
    | Total            | 1672   | 7      | 294     | 22    | 1973  | ❌     |
    +------------------+--------+--------+---------+-------+-------+--------+
  • Adam Hegyi added 1 commit

    added 1 commit

    • 25760d4b - Add experimental HTTP client for ClickHouse

    Compare with previous version

  • Adam Hegyi added typefeature label and removed typemaintenance label

    added typefeature label and removed typemaintenance label

  • Adam Hegyi added 1 commit

    added 1 commit

    • 6c9fdf17 - Add experimental HTTP client for ClickHouse

    Compare with previous version

  • Setting label groupoptimize based on @ahegyi's group.

  • Adam Hegyi added 1 commit

    added 1 commit

    • ec9a991d - Add experimental HTTP client for ClickHouse

    Compare with previous version

  • Adam Hegyi changed title from ClickHouse Client experiment to Add experimental ClickHouse HTTP client

    changed title from ClickHouse Client experiment to Add experimental ClickHouse HTTP client

  • Adam Hegyi changed the description

    changed the description

  • Adam Hegyi added 1 commit

    added 1 commit

    • dd900094 - Add experimental HTTP client for ClickHouse

    Compare with previous version

  • Adam Hegyi added 1 commit

    added 1 commit

    • ff052850 - Add experimental HTTP client for ClickHouse

    Compare with previous version

  • Adam Hegyi added 1 commit

    added 1 commit

    • 4fdd517e - Add experimental HTTP client for ClickHouse

    Compare with previous version

  • Adam Hegyi
  • Adam Hegyi changed the description

    changed the description

  • mentioned in issue #416010 (closed)

  • Adam Hegyi added 301 commits

    added 301 commits

    Compare with previous version

  • Adam Hegyi added 1 commit

    added 1 commit

    • d8cf9ed1 - Add experimental HTTP client for ClickHouse

    Compare with previous version

  • Adam Hegyi added 1 commit

    added 1 commit

    • 194a4394 - Add experimental HTTP client for ClickHouse

    Compare with previous version

  • Adam Hegyi added 1 commit

    added 1 commit

    • 194a4394 - Add experimental HTTP client for ClickHouse

    Compare with previous version

  • Adam Hegyi added 1 commit

    added 1 commit

    • cbde021b - Add experimental HTTP client for ClickHouse

    Compare with previous version

  • Adam Hegyi changed milestone to %16.2

    changed milestone to %16.2

  • requested review from @mikolaj_wawrzyniak

  • Adam Hegyi changed the description

    changed the description

  • Adam Hegyi requested review from @cablett and removed review request for @mikolaj_wawrzyniak

    requested review from @cablett and removed review request for @mikolaj_wawrzyniak

  • Author Maintainer

    cc @gitlab-org/plan-stage/optimize-group/engineers/backend for visibility.

  • Adam Hegyi mentioned in merge request !124457 (merged)

    mentioned in merge request !124457 (merged)

  • Kamil Trzciński
    • Resolved by Vladimir Shushlin

      @ahegyi

      Have we considered creating active-record-base-clickhouse adapter? This way we would be able to use all existing scafolding of Rails to create database models, and all existing tooling around handling of SQL queries.

      This would be another database configured as part of codebase, that would have its own adapter, and its own migrations (if needed).

      I'm worried that we would reinvent all models using different interface.

      Edited by Kamil Trzciński
    • Resolved by Vladimir Shushlin

      This seems to run fine for me, thank you for the instructions @ahegyi!

      > Gitlab::ClickHouse::Client.execute("select * from default.issues", :main)
      => [{"id"=>1,
        "title"=>"Issue 1",
        "description"=>"This is an issue, look at it go",
        "created_at"=>Tue, 27 Jun 2023 04:13:35.000000000 UTC +00:00,
        "updated_at"=>Tue, 27 Jun 2023 04:13:35.000000000 UTC +00:00},
       {"id"=>2,
        "title"=>"Issue 2",
        "description"=>"A second issue, we're on a roll!",
        "created_at"=>Tue, 27 Jun 2023 04:14:04.000000000 UTC +00:00,
        "updated_at"=>Tue, 27 Jun 2023 04:14:04.000000000 UTC +00:00},
       {"id"=>3,
        "title"=>"Issue 3",
        "description"=>"A third issue, unstoppable!",
        "created_at"=>Tue, 27 Jun 2023 04:14:21.000000000 UTC +00:00,
        "updated_at"=>Tue, 27 Jun 2023 04:14:21.000000000 UTC +00:00}]

      (I had to add default.issues to the select query after it didn't initially work -

      Gitlab::ClickHouse::Client.execute("select * from system.tables", :main)

      which contained (in a LOT of output, I piped through grep):

      "create_table_query"=>
         "CREATE TABLE default.issues (`id` UInt64, `title` String DEFAULT '', `description` Nullable(String), `created_at` DateTime64(6, 'UTC') DEFAULT now(), `updated_at` DateTime64(6, 'UTC') DEFAULT now()) ENGINE = ReplacingMergeTree(updated_at) ORDER BY id SETTINGS index_granularity = 8192",

      Anyway, seems to work for me! Is there a reason why I had to append default.?

  • Adam Hegyi changed the description

    changed the description

  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Please register or sign in to reply
    Loading