Skip to content

Add experimental ClickHouse HTTP client

Adam Hegyi requested to merge ah-clickhouse-client-experiment into master

What does this MR do and why?

This MR adds a lightweight gem for interacting with ClickHouse databases via HTTP interface. We're following the gem guidelines here: https://docs.gitlab.com/ee/development/gems.html

The client was created so groupoptimize can move forward with the PoC where we plan to rebuild a feature using ClickHouse.

The client should support executing an aggregation query and return counts. At a later step it should also support bulk inserts. Models and validations are out of the scope.

Why not use an already existing gem?

The original idea was to add an external gem (#414386 (closed)) however, the gem does much more than we need for this PoC and adding an external gem would require careful review of the codebase. This would eventually mean:

  • Review the code and assess its quality.
  • Forking the gem and making sure we have a running pipeline for the supported ruby versions.

How does it work?

The gem supports multiple databases which can be specified via a YAML file. The Gitlab::ClickHouse::Client::Configuration object contains the databases and other options so we can use our utilities (Gitlab::HTTP, Gitlab::Json).

Without the YAML file the gem will not do anything, configuration is optional.

Query execution flow:

  1. Invoke the execute method using the configured database and by passing a raw SQL string.
  2. The Client invokes the http_post_proc callable object which invokes a HTTP request.
  3. The response will be turned into a Response object which can tell whether the response is successful or not.
  4. The body of the response contains a raw JSON string, this will be parsed with the configured json_parser.
  5. The JSON response includes the returned columns with their respective data types. Using this data, the Formatter casts the values if necessary.
  6. The raw data (array of hashes) are returned.

How to test it

  1. Set up a ClickHouse server and a user with password: https://clickhouse.com/docs/en/install
  2. Start a console: clickhouse-client --password
  3. Create a new database: create database gitlab_clickhouse_test;
  4. Connect to the new database: clickhouse-client -d gitlab_clickhouse_test --password
  5. Create a new table:
CREATE TABLE issues (
  `id` UInt64,
  `title` String DEFAULT '',
  `description` Nullable(String),
  `created_at` DateTime64(6, 'UTC') DEFAULT now(),
  `updated_at` DateTime64(6, 'UTC') DEFAULT now()
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY (id)
  1. Insert some data:
insert into issues (id, title, description) values (1, 'Title 1', 'description');
insert into issues (id, title, description) values (2, 'Title 2', 'description');
insert into issues (id, title, description) values (3, 'Title 3', null);
  1. Create a config file in your gitlab dir: config/click_house.yml
development:
  main:
    database: gitlab_clickhouse_test
    url: 'http://localhost:8123'
    username: default
    password: clickhouse
  1. Start rails console and execute:
Gitlab::ClickHouse::Client.execute("select * from issues", :main)

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Ad
Edited by Adam Hegyi

Merge request reports