Research and understand component costs and maintenance requirements of running a ClickHouse instance with GitLab
Analysis plan
Identified action items to get familiar with ClickHouse usage in GitLab:
- Review ClickHouse usage
  - Current usage
  - Future usage
- Review deployment process
  - Supported options
    - SaaS - only AWS; GCP and Azure are marked as coming soon - https://clickhouse.com/pricing
    - Self-managed
      - VM - https://clickhouse.com/docs/en/install/#self-managed-install, https://packages.clickhouse.com/, https://github.com/ClickHouse/ClickHouse/releases
      - k8s - ClickHouse doesn't provide an official Kubernetes operator; the GitLab team implemented one for internal use: Implement ClickHouse operator, use in Opstrace ... (gitlab-org/opstrace/opstrace#1643 - closed)
  - Supported cloud providers
    - Supports major cloud providers (AWS, GCP, Azure)
    - Object storage: S3 and GCS are supported; Azure support is in progress
  - FIPS compliant since version 22.5
  - Supported for offline environments
  - Spec requirements
    - Supported options
- Review maintenance
  - High availability support and failovers
    - Supported
    - No step-by-step guide at the moment; the team is working on self-managed HA documentation describing how to set the architectures up. HA cluster video guide: https://www.youtube.com/watch?v=vBjCJtw_Ei0
    - No released automation for deploying HA on self-managed at the moment
  - Upgrade/downgrade process - https://clickhouse.com/docs/en/operations/update/
    - Supports zero-downtime upgrades; incremental upgrades are required if the difference between the current version and the target version is more than one year - https://clickhouse.com/docs/en/operations/update#incremental-upgrades
  - Backup/restore process - https://clickhouse.com/docs/en/operations/backup/
    - Configured through an S3 endpoint - https://clickhouse.com/docs/en/operations/backup#configuring-backuprestore-to-use-an-s3-endpoint => if the object storage supports the S3 protocol, it works. Caveat: Azure doesn't have native S3 support.
    - Speed depends on data size, source disk speed, network, and target disk speed.
    - Test data: ~1 TB took about an hour
  - Monitoring - https://clickhouse.com/docs/en/operations/monitoring/
    - Supports Prometheus, Grafana, and Datadog
  - Scaling
    - Scaling up is straightforward - https://clickhouse.com/docs/en/guides/sre/scaling-clusters/
    - Scaling down is more complicated when sharding is used
  - Benchmarks and testing
    - https://benchmark.clickhouse.com/hardware/
    - https://github.com/ClickHouse/ClickHouse/tree/master/tests
    - Internal GitLab integration testing will require a VM to set up ClickHouse
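The backup-speed notes above can be turned into a rough planning number. A minimal sketch: the default of ~1 TB/hour is the figure from the test data above, while the function name and example sizes are illustrative placeholders; real throughput depends on source disk, network, and target disk.

```python
def estimate_backup_hours(data_tb: float, tb_per_hour: float = 1.0) -> float:
    """Rough backup/restore duration estimate.

    tb_per_hour defaults to the ~1 TB/hour observed in the test run;
    actual speed varies with disk and network throughput, so treat
    the result as an order-of-magnitude guess only.
    """
    if data_tb < 0 or tb_per_hour <= 0:
        raise ValueError("data size must be >= 0 and throughput > 0")
    return data_tb / tb_per_hour

print(estimate_backup_hours(5.0))       # hypothetical 5 TB instance
print(estimate_backup_hours(5.0, 2.0))  # same data at twice the throughput
```

This is only linear extrapolation from a single observation; a real estimate would need measurements at several data sizes.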
- Calculate potential cost
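As a starting point for the cost calculation, a hedged sketch of a naive monthly estimate. Every number below (node count, VM price, storage price) is a placeholder assumption, not a vendor quote, and the formula deliberately ignores network egress, backup storage, and managed-service premiums.

```python
def monthly_cost_usd(nodes: int, vm_usd_per_node: float,
                     storage_gb: float, usd_per_gb: float) -> float:
    """Naive monthly cost: compute plus block storage only.

    All prices are hypothetical placeholders; replace them with real
    cloud-provider pricing before drawing conclusions.
    """
    return nodes * vm_usd_per_node + storage_gb * usd_per_gb

# Hypothetical 3-node HA cluster with 2 TB of attached storage
print(monthly_cost_usd(nodes=3, vm_usd_per_node=250.0,
                       storage_gb=2048, usd_per_gb=0.10))
```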
GitLab and ClickHouse related topics:
- Build a test instance and connect it with GitLab
- Version mapping with GitLab
- Correlation between ClickHouse data size and GitLab PG data size
Questions to ClickHouse team
The goal is to understand maintenance and costs that customers will face if they want to configure GitLab with ClickHouse.
Please see this internal document from the discussion with the ClickHouse team for the questions below.
Questions
Deployment
- What is the recommended ClickHouse installation method?
- How mature is the operator with regard to maintenance?
- Is there an official CH chart?
- Is AlmaLinux or Amazon Linux 2 supported?
- Does it support offline installation?
- FIPS compliance?
- Are there limitations with any existing cloud provider and ClickHouse?
- Are there IOPS recommendations for disks for CH?
Maintenance
- Is there documentation for high-availability setup on self-managed?
- Upgrades
  - Does it support zero-downtime upgrades? Related to https://clickhouse.com/docs/en/operations/update/#incremental-upgrades
- Backup and restore
  - Are object storages other than S3 supported?
- Monitoring recommendations for self-managed?
- Can self-managed ClickHouse be scaled up and down? What is the process for adding more capacity to a CH server?
- Is there a recommendation for migrating servers to new VMs or a new cluster?
- Is the above tested on a regular schedule?
Questions to GitLab ClickHouse owner team
- If a customer doesn't want to support this new database for GitLab, will they be limited in their use of GitLab features, or will there be a fallback to PG?
- What specs are currently recommended for CH with GitLab installation?
- Is there a correlation between the GitLab PG database size and the corresponding CH size?
- Can we calculate rough specs using the existing recommendations for PG node machines in GitLab?
- Would ClickHouse upgrade result in downtime for GitLab?
- Will GitLab need to maintain backup and restore for CH database?
- How will GitLab and ClickHouse version mapping work?
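If a PG-to-CH size correlation does exist, one way to use it would be a compression-ratio heuristic. A sketch under a loudly hypothetical assumption: the default ratio below is a made-up placeholder, and the real factor would have to be measured on an actual GitLab dataset before any capacity planning.

```python
def estimate_ch_size_gb(pg_size_gb: float, compression_ratio: float = 5.0) -> float:
    """Guess ClickHouse on-disk size from the PG size of the same data.

    compression_ratio is a hypothetical placeholder (columnar storage
    typically compresses well, but the actual factor depends on the
    schema and data and must be measured, not assumed).
    """
    if pg_size_gb < 0 or compression_ratio <= 0:
        raise ValueError("size must be >= 0 and ratio > 0")
    return pg_size_gb / compression_ratio

print(estimate_ch_size_gb(500.0))  # 500 GB of PG data under the assumed ratio
```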