Define sane, secure defaults for ClickHouse usage and provide security recommendations

As part of the ClickHouse Datastore Working Group, we need to "Define sane and secure defaults. Provide security recommendations and guardrails." (Exit Criteria). This issue will track and organize work for this task.

Recommendations

Users

Files: users.xml and config.xml

| Topic | Security Requirement | Reason |
| --- | --- | --- |
| `user_name/password` | Usernames must not be blank. Passwords must use `password_sha256_hex` and must not be blank. | Plaintext and `password_double_sha1_hex` are insecure. If a username isn't specified, `default` is used with no password. |
| `access_management` | Use the server configuration files `users.xml` and `config.xml` (avoid the SQL-driven workflow). | The SQL-driven workflow implies that at least one user has `access_management`, which can be avoided via config files. These files are also easier to audit and monitor, considering that "You can't manage the same access entity by both configuration methods simultaneously." |
| `user_name/networks` | At least one of `<ip>`, `<host>`, `<host_regexp>` must be set. Do not use `<ip>::/0</ip>` to open access for any network. | |
| `user_name/profile` | Use profiles to set similar properties across multiple users and set limits (from the user interface). | Least privilege principle and limits. |
| `user_name/quota` | Set quotas for users whenever possible. | Limit resource usage over a period of time or track the use of resources. |
| `user_name/databases` | Restrict access to data, and avoid users with full access. | Least privilege principle. |
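
The requirements in the Users table can be combined in a single `users.xml` entry. The following is a minimal sketch, not a reference configuration: the user `analyst`, the profile `analyst_profile`, the quota `analyst_quota`, the `analytics.events` table, the `tenant_id` filter, the `10.0.0.0/8` range, and all limit values are hypothetical and must be adapted. Older ClickHouse releases use `<yandex>` instead of `<clickhouse>` as the root element.

```xml
<clickhouse>
    <profiles>
        <!-- Hypothetical profile: shared limits for read-only users -->
        <analyst_profile>
            <readonly>1</readonly>
            <max_memory_usage>10000000000</max_memory_usage>
            <max_execution_time>60</max_execution_time>
        </analyst_profile>
    </profiles>

    <quotas>
        <!-- Hypothetical quota: caps queries and errors per hour; 0 means "track only" -->
        <analyst_quota>
            <interval>
                <duration>3600</duration>
                <queries>1000</queries>
                <errors>100</errors>
                <result_rows>0</result_rows>
                <read_rows>0</read_rows>
                <execution_time>0</execution_time>
            </interval>
        </analyst_quota>
    </quotas>

    <users>
        <analyst>
            <!-- Placeholder: hex SHA-256 digest of the password, never plaintext.
                 One way to generate it: echo -n "$PASSWORD" | sha256sum | tr -d ' -' -->
            <password_sha256_hex>REPLACE_WITH_SHA256_HEX_DIGEST</password_sha256_hex>

            <!-- Only the listed network may connect; never ::/0 -->
            <networks>
                <ip>10.0.0.0/8</ip>
            </networks>

            <profile>analyst_profile</profile>
            <quota>analyst_quota</quota>

            <!-- Keep SQL-driven access management off for this user -->
            <access_management>0</access_management>

            <!-- Row-level filter: SELECT on analytics.events only returns matching rows -->
            <databases>
                <analytics>
                    <events>
                        <filter>tenant_id = 1</filter>
                    </events>
                </analytics>
            </databases>
        </analyst>
    </users>
</clickhouse>
```

Keeping limits in a profile and a quota rather than on the user itself means a second user with the same needs only has to reference the same two names.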

Network

Files: config.xml

| Topic | Security Requirement | Reason |
| --- | --- | --- |
| `mysql_port` | Disable MySQL access unless strictly necessary: `<!-- <mysql_port>9004</mysql_port> -->` | Avoid exposing unnecessary ports and features. |
| `postgresql_port` | Disable PostgreSQL access unless strictly necessary: `<!-- <postgresql_port>9005</postgresql_port> -->` | Avoid exposing unnecessary ports and features. |
| `http_port`/`https_port` & `tcp_port`/`tcp_port_secure` | Configure SSL/TLS and disable the non-SSL ports: `<!-- <http_port>8123</http_port> -->` and `<!-- <tcp_port>9000</tcp_port> -->`. Enable the secure ports: `<https_port>8443</https_port>` and `<tcp_port_secure>9440</tcp_port_secure>`. | Data in transit must be encrypted as per our Cryptographic Protection Controls and GitLab Crypto Standard on TLS. |
| `interserver_http_host` | Disable `interserver_http_host` in favor of `interserver_https_host` (`<interserver_https_port>9010</interserver_https_port>`) if ClickHouse is configured as a cluster. | Data in transit must be encrypted as per our Cryptographic Protection Controls and GitLab Crypto Standard on TLS. |
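
Enabling the secure ports also requires a server certificate. The following `config.xml` fragment is a sketch of how the Network rows could fit together. The certificate and key paths are assumptions, and the `disableProtocols` list should be checked against the GitLab Crypto Standard rather than taken from this example.

```xml
<clickhouse>
    <!-- Plaintext and optional protocol ports stay commented out -->
    <!-- <http_port>8123</http_port> -->
    <!-- <tcp_port>9000</tcp_port> -->
    <!-- <mysql_port>9004</mysql_port> -->
    <!-- <postgresql_port>9005</postgresql_port> -->
    <!-- <interserver_http_port>9009</interserver_http_port> -->

    <!-- TLS-only listeners -->
    <https_port>8443</https_port>
    <tcp_port_secure>9440</tcp_port_secure>
    <!-- Only needed when ClickHouse runs as a cluster -->
    <interserver_https_port>9010</interserver_https_port>

    <openSSL>
        <server>
            <!-- Assumed paths: point these at the real certificate and key -->
            <certificateFile>/etc/clickhouse-server/server.crt</certificateFile>
            <privateKeyFile>/etc/clickhouse-server/server.key</privateKeyFile>
            <!-- Refuse legacy protocol versions -->
            <disableProtocols>sslv2,sslv3,tlsv1,tlsv1_1</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
        </server>
    </openSSL>
</clickhouse>
```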

Storage

| Topic | Security Requirement | Reason |
| --- | --- | --- |
| Permissions | ClickHouse runs by default with the `clickhouse` user. Running as root is never needed. Use the principle of least privilege for the folders `/etc/clickhouse-server`, `/var/lib/clickhouse`, and `/var/log/clickhouse-server`. These folders must belong to the `clickhouse` user and group, and no other system user must have access to them. | |
| Encryption | Use encrypted storage for logs and data if RED data is processed. On Kubernetes, the StorageClass used may be encrypted. | RED data must be encrypted at rest. |

Logging

| Topic | Security Requirement | Reason |
| --- | --- | --- |
| `logger` | `log` and `errorlog` must be defined and writable by `clickhouse`. | Make sure logs are stored. |
| SIEM | If hosted on gitlab.com, the ClickHouse instance or cluster must report logs to our SIEM (internal link). | |
| Log sensitive data | Query masking rules must be used if sensitive data can be logged. See the example below. | Column-level encryption can be used, and it can leak sensitive data (keys) in logs. |

Example masking rules:

<query_masking_rules>
    <rule>
        <name>hide SSN</name>
        <regexp>(^|\D)\d{3}-\d{2}-\d{4}($|\D)</regexp>
        <replace>000-00-0000</replace>
    </rule>
    <rule>
        <name>hide encrypt/decrypt arguments</name>
        <regexp>((?:aes_)?(?:encrypt|decrypt)(?:_mysql)?)\s*\(\s*(?:'(?:\\'|.)+'|.*?)\s*\)</regexp>
        <replace>\1(???)</replace>
    </rule>
</query_masking_rules>
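
For the `logger` row in the Logging table, a `config.xml` fragment could look like the sketch below. The paths match the `/var/log/clickhouse-server` folder mentioned under Storage. The level, size, and rotation count are assumptions to tune per deployment.

```xml
<clickhouse>
    <logger>
        <!-- Assumed verbosity: pick the level the SIEM pipeline expects -->
        <level>information</level>
        <!-- Both files must be writable by the clickhouse user -->
        <log>/var/log/clickhouse-server/clickhouse-server.log</log>
        <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
        <!-- Rotate at this size and keep this many rotated files -->
        <size>1000M</size>
        <count>10</count>
    </logger>
</clickhouse>
```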

Edited by Philippe Lafoucrière