ClickHouse Cluster Environment Deployment
Cluster Installation
This guide uses ClickHouse 21.12.3.32 and assumes six virtual machines for the cluster.
Single-node installation guide:
https://blog.csdn.net/xushijie89/article/details/122987121
The cluster depends on ZooKeeper; set up a ZooKeeper ensemble beforehand.
Modifying the Configuration Files
(1) Edit the configuration file on every node (create it if it does not exist):
[root@172.16.131.56 /etc/clickhouse-server]$ vim /etc/clickhouse-server/config.d/metrika.xml
<clickhouse>
<!--The root element name differs across ClickHouse versions; adjust for your version-->
<remote_servers>
<!--Cluster name, freely choosable; it is referenced later when creating databases and tables-->
<cluster_3shards_2replicas>
<shard>
<!--internal_replication (default false): when enabled, writes through a Distributed table-->
<!--go to only one replica per shard instead of every replica-->
<internal_replication>true</internal_replication>
<replica>
<host>hadoop1</host>
<port>9001</port>
</replica>
<replica>
<host>hadoop2</host>
<port>9001</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>hadoop3</host>
<port>9001</port>
</replica>
<replica>
<host>hadoop4</host>
<port>9001</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>hadoop5</host>
<port>9001</port>
</replica>
<replica>
<host>hadoop6</host>
<port>9001</port>
</replica>
</shard>
</cluster_3shards_2replicas>
</remote_servers>
<!--ZooKeeper configuration-->
<zookeeper>
<node>
<host>hadoop1</host>
<port>2181</port>
</node>
<node>
<host>hadoop2</host>
<port>2181</port>
</node>
<node>
<host>hadoop3</host>
<port>2181</port>
</node>
</zookeeper>
<!--Global macros: when creating replicated tables, {name} placeholders are replaced with these values (example below)-->
<macros>
<!--Machine of shard 01-->
<shard>01</shard>
<database>database_name</database>
<replica>hadoop1</replica>
<!--On the replica machine of shard 01, use:-->
<!--<shard>01</shard>-->
<!--<database>database_name</database>-->
<!--<replica>hadoop2</replica>-->
</macros>
<!--Compression settings-->
<compression>
<case>
<min_part_size>10000000000</min_part_size>
<min_part_size_ratio>0.01</min_part_size_ratio>
<method>lz4</method> <!--lz4 compresses faster than zstd but takes more disk space-->
</case>
</compression>
</clickhouse>
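Note that the <macros> block must differ on every host: shard is the shard the host belongs to and replica is usually the hostname. For example, on hadoop3 (the first replica of shard 02), the block would look like this; a sketch following the topology above:
<macros>
<shard>02</shard>
<database>database_name</database>
<replica>hadoop3</replica>
</macros>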
(2) Check that the ClickHouse config points to the file above
[root@172.16.131.56 /etc/clickhouse-server]$ vim /etc/clickhouse-server/config.xml
Make sure this line is present:
<include_from>/etc/clickhouse-server/config.d/metrika.xml</include_from>
(3) Start the service
/etc/init.d/clickhouse-server start
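Once started, it is worth confirming the server answers queries; a quick sanity check (add --port 9001 if you changed the TCP port to match the cluster definition above):
[root@172.16.131.56 /etc/clickhouse-server]$ clickhouse-client --query "SELECT version()"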
(4) Connect with the client and create a database
[root@172.16.131.56 /etc/clickhouse-server/config.d]$ clickhouse-client -m
CREATE DATABASE db_name on cluster cluster_3shards_2replicas COMMENT 'test database';
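If metrika.xml is correct on every node, the cluster topology can be verified from any node via the built-in system.clusters table, for example:
select cluster, shard_num, replica_num, host_name, port
from system.clusters
where cluster = 'cluster_3shards_2replicas';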
(5) Create the replicated table (the physical table that actually stores the data)
create table table_name on cluster cluster_3shards_2replicas (
id UInt32,
sku_id String,
total_amount Decimal(16,2),
create_time DateTime
) engine = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/table_name', '{replica}')
partition by toYYYYMMDD(create_time)
primary key (id)
order by (id,sku_id);
on cluster cluster_3shards_2replicas:
Adding this clause runs the DDL statement once on every node of cluster_3shards_2replicas.
Parameters of ReplicatedMergeTree():
zoo_path: the table's path in ZooKeeper
replica_name: the replica's name in ZooKeeper
other_parameters (optional): parameters of the underlying engine used for the replicated version, e.g. version in ReplacingMergeTree.
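Given the macros defined earlier, the {shard}, {database}, and {replica} placeholders expand per host; on hadoop1 the path above becomes /clickhouse/tables/01/database_name/table_name with replica hadoop1. Each node's substitutions can be inspected via the system.macros table:
select macro, substitution from system.macros;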
(6) Create the distributed table (a logical table that stores no data itself)
CREATE TABLE db_name.table_name_all on cluster cluster_3shards_2replicas(
id UInt32,
sku_id String,
total_amount Decimal(16,2),
create_time DateTime
) ENGINE = Distributed(cluster_3shards_2replicas, db_name, table_name, hiveHash(id));
Parameters of Distributed():
cluster: the cluster name defined in the config
database: database name
table: name of the underlying local table
sharding_key (optional): sharding key
policy_name (optional): storage policy used for the temporary files of asynchronous sends
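As a rough sketch of the routing: with three shards of equal weight, a row lands on the shard whose (1-based) number is sharding_key % 3 + 1, counted in config order. The query below previews this for a single id; hiveHash may require a String argument on some versions, hence the cast:
select 201 as id, hiveHash(toString(201)) % 3 + 1 as shard_num;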
(7) Insert data through the distributed table
insert into table_name_all values (201, 'sku_001', 1000.00, '2020-06-01 12:00:00');
Then check, machine by machine, which shard's physical table the row landed on:
select * from table_name;
Finally, check whether the replica of that shard also has the data.
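Replication health can also be checked from SQL instead of logging into every box, via the standard system.replicas table:
select database, table, replica_name, total_replicas, active_replicas
from system.replicas;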