[GaussDB] Primary/Standby Deployment: A Guide to Avoiding Pitfalls
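1. Background
This walkthrough deploys on a Huawei Cloud environment.
2. Set the hostnames
hostnamectl set-hostname gauss003   # run on all three nodes, each with its own name
ip addr
Edit the hosts file: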
vi /etc/hosts
192.168.0.146 gauss001
192.168.0.105 gauss002
192.168.0.238 gauss003
cat /etc/hosts
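2. Check that the following software is installed
python3 --version
openssl version
expect -v   # if it is not installed, run: yum install -y expect
3. Disable swap
Edit /etc/fstab (vi /etc/fstab) and comment out the swap entries:
#/dev/mapper/klas-swap swap swap defaults 0 0
#/dev/mapper/klas-swap none swap sw,comment=cloudconfig 0 0
swapoff -a   # turn swap off immediately
4. Create and mount the data directory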
pvcreate /dev/vdb
vgcreate datavg /dev/vdb
lvcreate -n datalv -L 99G datavg
mkfs.xfs /dev/mapper/datavg-datalv
mkdir /data
mount /dev/mapper/datavg-datalv /data
vi /etc/fstab   # append the following line at the end of the file
/dev/mapper/datavg-datalv /data xfs defaults 0 0
5. Upload the installation packages
tar xvf DBS-MetaDB_Kylin_Centralized_503.1.0.SPC1700.B003.tar.gz
cd DBS-MetaDB_Kylin_Centralized_503.1.0.SPC1700.B003
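From the extracted folder, pick out the following three packages; they are all that this minimal installation needs.
Upload them to: /data/GaussDBInstaller/pkgDir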
DBS-GaussDB-Adaptor_2.23.07.210.1701140029.tar.gz
GaussDB-Kernel_503.1.0.SPC1700.B003_Om_ARM_Centralized.tar.gz
GaussDB-Kernel_503.1.0.SPC1700.B003_Server_ARM_Centralized.tar.gz
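Note: these three packages can also be found inside the umbrella package DBS-GaussDB-Manual_2.23.07.260.796048944315456.tar.gz by unpacking it layer by layer. However, Huawei's software packages are organized in a rather confusing way, so they are hard to locate unless you already know GaussDB well.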
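6. The CPU issue (the key problem)
CPU flags on the Huawei Cloud host:
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
-- CPU flags on our company's virtual machine:
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
GaussDB places requirements on the CPU: it needs a CPU that exposes the longer flag set shown first. We had previously tried to deploy on a virtual machine running Kylin Linux on ARM whose CPU exposed only the short flag set, and the installation failed no matter what we tried. We later confirmed this with a Huawei engineer, and the installation succeeded only after we purchased a cloud environment.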
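To see exactly which flags the failing virtual machine lacked, the two flag lists above can be compared. The following is a minimal sketch, not an official compatibility check: the source does not say which individual flag GaussDB actually requires, so treating the whole Huawei Cloud list as the reference baseline is an assumption. The function names are hypothetical.

```python
from typing import List

# Flag lists copied from the two "Flags:" lines in this section.
CLOUD_FLAGS = ("fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp "
               "cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm")
VM_FLAGS = "fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid"


def missing_flags(host_flags: str, reference_flags: str = CLOUD_FLAGS) -> List[str]:
    """Return the reference flags that host_flags lacks, in reference order."""
    present = set(host_flags.split())
    return [flag for flag in reference_flags.split() if flag not in present]


def read_host_flags(path: str = "/proc/cpuinfo") -> str:
    """Read the first 'Features' (ARM) or 'flags' (x86) line from /proc/cpuinfo."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            key, _, value = line.partition(":")
            if key.strip().lower() in ("features", "flags"):
                return value.strip()
    return ""


# Flags the company VM was missing relative to the Huawei Cloud host.
print(missing_flags(VM_FLAGS))
```

On a candidate host, `missing_flags(read_host_flags())` gives the same comparison against the live CPU before any installation time is spent.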
7. Configure NTP time synchronization
-- Install NTP:
yum install ntp ntpdate -y
yum -y install ntpstat
Start the ntp service.
-- NTP primary: 192.168.0.146
echo "server 127.127.1.0 iburst">>/etc/ntp.conf
systemctl enable ntpd
systemctl restart ntpd
[root@lin-kylinv10-sp1-arm01wangzhongyue20240318 ~]# ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
*LOCAL(0) .LOCL. 5 l 9 64 377 0.000 +0.000 0.000
ntpstat
[root@lin-kylinv10-sp1-arm01wangzhongyue20240318 agent]# ntpstat
unsynchronised
poll interval unknown
-- Other hosts: run the following on every other host. 62/63
-- As root:
echo "server 192.168.0.146">>/etc/ntp.conf
echo "restrict 192.168.0.146 nomodify notrap noquery">>/etc/ntp.conf
ntpdate -u 192.168.0.146
hwclock -w
systemctl enable ntpd
systemctl restart ntpd
systemctl status ntpd
[root@lin-kylinv10-sp1-arm02wangzhongyue20240318 ~]# ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
*192.168.0.146 LOCAL(0) 6 u 814 1024 377 2.416 -0.222 0.581
[root@lin-kylinv10-sp1-arm02wangzhongyue20240318 ~]# ntpstat
synchronised to NTP server (192.168.0.146) at stratum 7
time correct to within 41 ms
polling server every 1024 s
[root@lin-kylinv10-sp1-arm03wangzhongyue20240318 ~]# ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
*192.168.0.146 LOCAL(0) 6 u 824 1024 377 0.797 -0.144 0.515
[root@lin-kylinv10-sp1-arm03wangzhongyue20240318 ~]# ntpstat
synchronised to NTP server (192.168.0.146) at stratum 7
time correct to within 40 ms
polling server every 1024 s
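8. Add resource limit parameters
echo "* soft nofile 1000000" >>/etc/security/limits.conf
echo "* hard nofile 1000000" >>/etc/security/limits.conf
-- This sets the open files limit to 1000000.
-- To make it permanent, also add the ulimit command to /etc/profile.
ulimit -n 1000000   # takes effect for the current session only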
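Whether the new limit is actually in effect for a session can be checked from Python, which the installer already requires. This is a small sketch using the standard `resource` module; the helper name `nofile_ok` is hypothetical, and the threshold simply mirrors the 1000000 value set above.

```python
import resource

REQUIRED_NOFILE = 1000000  # the value written to limits.conf in this step


def nofile_ok(soft: int, hard: int, required: int = REQUIRED_NOFILE) -> bool:
    """True when both the soft and hard limits meet the required value."""
    def enough(limit: int) -> bool:
        # RLIM_INFINITY means "unlimited", which always satisfies the requirement.
        return limit == resource.RLIM_INFINITY or limit >= required
    return enough(soft) and enough(hard)


# Limits of the current process, inherited from the login session.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"nofile soft={soft} hard={hard} ok={nofile_ok(soft, hard)}")
```

Run it in a fresh login session (for example after `su - omm`) to confirm that limits.conf was picked up, not just the session-level `ulimit`.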
9. Change the NIC MTU
ip link set dev eth0 mtu 8192
10. Upload the GaussDBInstaller tool package
-- Any version works.
tar xvf GaussDBInstaller_1.0.6.8.tar.gz
cd GaussDBInstaller_1.0.6.8
11. Edit the cluster configuration file
cat install_cluster.conf
[COMMON]
os_user = omm
os_user_group = ${os_user}
os_user_home = /home/${os_user}
os_user_passwd = Att@2022 --- the OS user omm is created automatically during installation
root_passwd = Root#123 --- password of the OS root user
ssh_port = 22
node_ip_list = 192.168.0.146,192.168.0.105,192.168.0.238
[OMAGENT]
gauss_home = /data/cluster
om_agent_port = 30170
mgr_net =
data_net =
virtual_net =
log_dir = ${gauss_home}/logs/gaussdb
cn_dir = ${gauss_home}/data/cn
gtm_dir = ${gauss_home}/data/gtm
cm_dir = ${gauss_home}/data/cm
tmp_dir = ${gauss_home}/temp
data_dir = ${gauss_home}/data/dn
tool_dir = ${gauss_home}/tools
etcd_dir = ${gauss_home}/data/etcd
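The `${...}` references in this file make it easy to lose track of where each directory ends up. As a rough preview, Python's `configparser` with `ExtendedInterpolation` happens to use the same `${name}` syntax, so it can expand a trimmed copy of the file. How the installer itself resolves these references is not documented here, so this is an assumption-based sketch; `expand_conf` is a hypothetical helper, and the sample omits the passwords and the inline notes.

```python
import configparser

# Trimmed copy of install_cluster.conf (passwords and inline notes left out).
SAMPLE_CONF = """
[COMMON]
os_user = omm
os_user_group = ${os_user}
os_user_home = /home/${os_user}

[OMAGENT]
gauss_home = /data/cluster
log_dir = ${gauss_home}/logs/gaussdb
data_dir = ${gauss_home}/data/dn
etcd_dir = ${gauss_home}/data/etcd
"""


def expand_conf(text: str) -> dict:
    """Parse INI text and return {section: {key: value}} with ${...} expanded."""
    parser = configparser.ConfigParser(
        interpolation=configparser.ExtendedInterpolation()
    )
    parser.read_string(text)
    # Reading through the section proxy applies the interpolation.
    return {section: dict(parser[section]) for section in parser.sections()}


for section, values in expand_conf(SAMPLE_CONF).items():
    for key, value in values.items():
        print(f"[{section}] {key} = {value}")
```

With gauss_home set to /data/cluster, every data, log, and tool directory lands on the /data volume mounted in step 4.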
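12. Edit the cluster installation template
-- Use the following JSON template. Its support is limited, so test it before relying on it.
3_nodes_centralized.json
-- Supported: one primary and two standbys. Not supported: one primary, one standby, and one log node.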
mv 3_nodes_centralized.json install_cluster.json
cat install_cluster.json
{
  "rdsAdminUser": "rdsAdmin",
  "rdsAdminPasswd": "Root#123",
  "rdsMetricUser": "rdsMetric",
  "rdsMetricPasswd": "Root#123",
  "rdsReplUser": "rdsRepl",
  "rdsReplPasswd": "Root#123",
  "rdsBackupUser": "rdsBackup",
  "rdsBackupPasswd": "Root#123",
  "dbPort": "30100",
  "dbUser": "root",
  "dbUserPasswd": "Root#123",
  "clusterMode": "ha",
  "encoding": "utf8",
  "params": {
    "enable_thread_pool": "on",
    "enable_bbox_dump": "on",
    "bbox_dump_path": "/home/core"
  },
  "cnParams": {},
  "dnParams": {},
  "cmParams": {},
  "clusterConf": {
    "clusterName": "Gauss_Att",
    "shardingNum": 1,
    "replicaNum": 3,
    "solution": "hws",
    "consistencyProtocol": "paxos",
    "cm": [
      {
        "rack": "gauss001",
        "az": "AZ1",
        "ip": "192.168.0.146",
        "dataIp": "192.168.0.146",
        "virtualIp": "192.168.0.146"
      },
      {
        "rack": "gauss002",
        "az": "AZ2",
        "ip": "192.168.0.105",
        "dataIp": "192.168.0.105",
        "virtualIp": "192.168.0.105"
      },
      {
        "rack": "gauss003",
        "az": "AZ3",
        "ip": "192.168.0.238",
        "dataIp": "192.168.0.238",
        "virtualIp": "192.168.0.238"
      }
    ],
    "shards": [
      [
        {
          "rack": "gauss001",
          "az": "AZ1",
          "ip": "192.168.0.146",
          "dataIp": "192.168.0.146",
          "virtualIp": "192.168.0.146"
        },
        {
          "rack": "gauss002",
          "az": "AZ2",
          "ip": "192.168.0.105",
          "dataIp": "192.168.0.105",
          "virtualIp": "192.168.0.105"
        },
        {
          "rack": "gauss003",
          "az": "AZ3",
          "ip": "192.168.0.238",
          "dataIp": "192.168.0.238",
          "virtualIp": "192.168.0.238",
          "loggerRole": "on"
        }
      ]
    ],
    "etcd": {
      "nodes": [
        {
          "rack": "gauss001",
          "az": "AZ1",
          "ip": "192.168.0.146",
          "dataIp": "192.168.0.146",
          "virtualIp": "192.168.0.146"
        },
        {
          "rack": "gauss002",
          "az": "AZ2",
          "ip": "192.168.0.105",
          "dataIp": "192.168.0.105",
          "virtualIp": "192.168.0.105"
        },
        {
          "rack": "gauss003",
          "az": "AZ3",
          "ip": "192.168.0.238",
          "dataIp": "192.168.0.238",
          "virtualIp": "192.168.0.238"
        }
      ]
    }
  }
}
One thing to watch out for here: each rack name must be identical to the hostname of that node. If they differ, the installation fails.
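Because a rack/hostname mismatch only shows up as a failed installation, it can be worth checking the template beforehand. The sketch below walks the cm, shards, and etcd node lists of a clusterConf object and reports any node whose rack differs from the hostname mapped to its IP. The helper names are hypothetical, the IP-to-hostname mapping is the one from the /etc/hosts step, and the example input is a shortened, made-up clusterConf with one deliberate mistake.

```python
# IP-to-hostname mapping taken from the /etc/hosts entries in this guide.
HOSTS = {
    "192.168.0.146": "gauss001",
    "192.168.0.105": "gauss002",
    "192.168.0.238": "gauss003",
}


def iter_nodes(cluster_conf):
    """Yield (section, node) for every node entry under cm, shards and etcd."""
    for node in cluster_conf.get("cm", []):
        yield "cm", node
    for shard in cluster_conf.get("shards", []):
        for node in shard:
            yield "shards", node
    for node in cluster_conf.get("etcd", {}).get("nodes", []):
        yield "etcd", node


def rack_mismatches(cluster_conf, hosts=HOSTS):
    """Return (section, ip, rack, expected) for nodes whose rack is not the hostname."""
    return [
        (section, node["ip"], node["rack"], hosts.get(node["ip"]))
        for section, node in iter_nodes(cluster_conf)
        if node["rack"] != hosts.get(node["ip"])
    ]


# Hypothetical, shortened clusterConf with one deliberately wrong rack name.
example = {
    "cm": [{"rack": "gauss001", "ip": "192.168.0.146"}],
    "shards": [[{"rack": "gauss002", "ip": "192.168.0.105"}]],
    "etcd": {"nodes": [{"rack": "rack3", "ip": "192.168.0.238"}]},
}
print(rack_mismatches(example))
```

For the real file, load it with `json.load` and pass its `"clusterConf"` value; an empty list means every rack matches its hostname.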
13. Install the cluster
python3 gaussdb_install.py --action main
The installation fails with:
[2024-03-27 12:14:42][root][ERROR]:InstallCluster in local host 192.168.0.146 execute failed,
Error: {"retcode": 1, "detailmsg": "Fail to install,
Error: [FAILURE] 192.168.0.146:\n
[GAUSS-50219] : Failed to obtain RTNETLINK answers: Operation not permitted 1.
There are illegal characters.\n[FAILURE] 192.168.0.105:\n
[GAUSS-50219] : Failed to obtain RTNETLINK answers: Operation not permitted 1.
There are illegal characters.\n[FAILURE] 192.168.0.238:\n[GAUSS-50219] :
Failed to obtain RTNETLINK answers: Operation not permitted 1.
There are illegal characters."}
su - omm
Switching to the omm user reports the same error:
RTNETLINK answers: Operation not permitted
[root@gaussdb01 ~]# su - omm
Last login: Wed Mar 27 12:25:43 CST 2024 on pts/0
RTNETLINK answers: Operation not permitted
-- Fix:
chmod u+x /usr/sbin/ip
chmod u+s /sbin/ip
[root@gaussdb01 ~]# su - omm
Last login: Wed Mar 27 12:36:08 CST 2024 from 192.168.0.146 on pts/4
-- Re-run the cluster installation.
[root@gaussdb01 GaussDBInstaller]# python3 gaussdb_install.py --action main
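14. Check the packages after a successful installation
-- During installation the packages are unpacked automatically into the following layout.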
[omm@gaussdb01 pkgDir]$ ll
total 636496
-rwxr-xr-x 1 omm omm 33097515 Mar 27 10:56 GaussDB-Kernel_503.1.0.SPC1700.B003_Om_ARM_Centralized.tar.gz
-rwxr-xr-x 1 omm omm 394526 Mar 27 10:55 GaussDB-Kernel_503.1.0.SPC1700_Kylin_64bit_ADAPTOR.tar.gz
-rw-r--r-- 1 omm omm 10068548 Sep 22 2023 GaussDB-Kernel_503.1.0.SPC1700_Kylin_64bit_Agent.tar.gz
-rw-r--r-- 1 omm omm 23622524 Sep 22 2023 GaussDB-Kernel_503.1.0.SPC1700_Kylin_64bit_Om.tar.gz
drwx------ 4 omm omm 81 Mar 27 16:46 omagent
drwx------ 10 omm omm 4096 Mar 27 16:48 server
-rwxr-xr-x 1 omm omm 584572429 Mar 27 11:55 server.tar.gz
[omm@gaussdb01 pkgDir]$ pwd
/data/GaussDBInstaller/pkgDir
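During deployment, the packages we uploaded are turned into the layout shown above; in other words, they are unpacked automatically. Note that the packages must be uploaded to the /data/GaussDBInstaller/pkgDir directory.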
15. Check the cluster status after a successful deployment
[omm@gaussdb01 ~]$ cm_ctl query -Cv
[ CMServer State ]
node instance state
---------------------------------
1 192.168.0.146 1 Primary
2 192.168.0.105 2 Standby
3 192.168.0.238 3 Standby
[ ETCD State ]
node instance state
---------------------------------------
1 192.168.0.146 7001 StateLeader
2 192.168.0.105 7002 StateFollower
3 192.168.0.238 7003 StateFollower
[ Cluster State ]
cluster_state : Normal
redistributing : No
balanced : Yes
current_az : AZ_ALL
[ Datanode State ]
node instance state | node instance state | node instance state
---------------------------------------------------------------------------------------------------------------------------------------
1 192.168.0.146 6001 P Primary Normal | 2 192.168.0.105 6002 S Standby Normal | 3 192.168.0.238 6003 S Standby Normal
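For repeated checks, the "key : value" lines of the Cluster State section can be pulled out of the `cm_ctl query -Cv` output with a few lines of Python. This is only a sketch based on the output shown above: the helper names are hypothetical, and treating `cluster_state : Normal` together with `balanced : Yes` as "healthy" is an assumption, not a rule taken from the GaussDB documentation.

```python
import re

# The Cluster State section as printed by cm_ctl query -Cv above.
SAMPLE_OUTPUT = """[ Cluster State ]
cluster_state : Normal
redistributing : No
balanced : Yes
current_az : AZ_ALL
"""


def cluster_summary(output: str) -> dict:
    """Collect single-word 'key : value' lines into a dict; other lines are ignored."""
    summary = {}
    for line in output.splitlines():
        match = re.fullmatch(r"\s*(\w+)\s*:\s*(\S+)\s*", line)
        if match:
            summary[match.group(1)] = match.group(2)
    return summary


def looks_healthy(summary: dict) -> bool:
    """Assumed health rule: cluster_state is Normal and balanced is Yes."""
    return summary.get("cluster_state") == "Normal" and summary.get("balanced") == "Yes"


summary = cluster_summary(SAMPLE_OUTPUT)
print(summary, looks_healthy(summary))
```

The node tables in the same output contain no colon-separated pairs, so they are skipped by the pattern.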
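16. Summary of pitfalls
(1) CPU: on ARM, a CPU with the short flag set is not supported: Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
A CPU with the longer flag set is supported: Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
For questions about which flags your CPU exposes, consult your operating system and hardware engineers.
(2) In install_cluster.json, each rack name must be the same as the hostname; otherwise the installation fails.
(3) The one-primary, two-standby architecture deploys fairly easily; the one-primary, one-standby, one-log-node architecture does not.
(4) Huawei ships many packages and they are hard to tell apart, so a minimal installation depends on finding the right ones. For a minimal installation, look for:
DBS-MetaDB_Kylin_Centralized_503.1.0.SPC1700.B003.tar.gz
Packages whose names start with DBS-MetaDB_Kylin_Centralized_***.tar.gz contain every package the minimal installation needs.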