1. Background

This guide walks through deploying GaussDB in a Huawei Cloud environment.

2. Set the hostname
 hostnamectl set-hostname gauss003  (run on all three nodes, each with its own hostname)
 ip addr
 Edit the hosts file:
 vi /etc/hosts
 192.168.0.146    gauss001
 192.168.0.105    gauss002
 192.168.0.238    gauss003
 cat /etc/hosts

2. Check that the following software is installed

python3  --version
 openssl version
 expect -v   (if it is not installed, run: yum install -y expect)
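
The same check can be scripted so that it is repeatable on all three nodes. The following is a minimal sketch, assuming the three tools are looked up on PATH under the names used above; the helper name `missing_tools` is made up for illustration.

```python
import shutil

# Tool names taken from the commands above.
REQUIRED_TOOLS = ("python3", "openssl", "expect")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
if missing:
    print("missing:", ", ".join(missing))
else:
    print("all required tools found")
```

The `which` parameter exists only so that the lookup can be replaced in a test; on a real node the default is enough.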

3. Disable swap

Edit /etc/fstab (vi /etc/fstab) and comment out the swap entries:
 #/dev/mapper/klas-swap   swap                    swap    defaults        0 0
 #/dev/mapper/klas-swap  none    swap    sw,comment=cloudconfig  0       0
 swapoff -a  # turns swap off immediately
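
Commenting out the swap entries by hand is easy to get wrong when repeated on three nodes. The sketch below shows one way to do it on the text of an fstab file. It assumes the standard fstab layout, where the third field is the filesystem type; the function name and the `klas-root` sample line are illustrative and not from the original environment.

```python
def comment_out_swap(fstab_text):
    """Return fstab_text with every active swap entry commented out."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_active = bool(fields) and not fields[0].startswith("#")
        # In fstab the third field is the filesystem type.
        if is_active and len(fields) >= 3 and fields[2] == "swap":
            line = "#" + line
        out.append(line)
    return "\n".join(out) + ("\n" if out else "")


SAMPLE = (
    "/dev/mapper/klas-root / xfs defaults 0 0\n"  # hypothetical root entry
    "/dev/mapper/klas-swap swap swap defaults 0 0\n"
)
print(comment_out_swap(SAMPLE), end="")
```

Lines that are already commented are left alone, so the function can be applied more than once without changing the result.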

4. Create the data directory and mount it

pvcreate /dev/vdb
vgcreate datavg /dev/vdb 
lvcreate -n datalv -L 99G datavg 
mkfs.xfs /dev/mapper/datavg-datalv 

mkdir /data
mount /dev/mapper/datavg-datalv /data
vi /etc/fstab   # append the following line at the end of the file
/dev/mapper/datavg-datalv /data xfs defaults 0 0

5. Upload the installation package

tar xvf DBS-MetaDB_Kylin_Centralized_503.1.0.SPC1700.B003.tar.gz 
cd DBS-MetaDB_Kylin_Centralized_503.1.0.SPC1700.B003

In the extracted folder, find the following three packages; they are all that this minimal installation needs.
Upload them to: /data/GaussDBInstaller/pkgDir
DBS-GaussDB-Adaptor_2.23.07.210.1701140029.tar.gz
GaussDB-Kernel_503.1.0.SPC1700.B003_Om_ARM_Centralized.tar.gz
GaussDB-Kernel_503.1.0.SPC1700.B003_Server_ARM_Centralized.tar.gz

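Note: these three packages can also be found inside the umbrella package DBS-GaussDB-Manual_2.23.07.260.796048944315456.tar.gz
by unpacking it layer by layer. However, Huawei's package organization is rather confusing, and the packages are
hard to locate if you are not familiar with GaussDB.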
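
Because the package names are long and similar, a quick check that all three are present in pkgDir can save a failed run. This is a sketch only: the wildcard patterns are assumptions derived from the three file names listed above, and `missing_packages` is a hypothetical helper.

```python
import fnmatch

# Patterns inferred from the three package names above (assumption).
REQUIRED_PATTERNS = (
    "DBS-GaussDB-Adaptor_*.tar.gz",
    "GaussDB-Kernel_*_Om_*.tar.gz",
    "GaussDB-Kernel_*_Server_*.tar.gz",
)


def missing_packages(filenames, patterns=REQUIRED_PATTERNS):
    """Return the patterns that no file name in filenames matches."""
    return [p for p in patterns if not fnmatch.filter(filenames, p)]


uploaded = [
    "DBS-GaussDB-Adaptor_2.23.07.210.1701140029.tar.gz",
    "GaussDB-Kernel_503.1.0.SPC1700.B003_Om_ARM_Centralized.tar.gz",
    "GaussDB-Kernel_503.1.0.SPC1700.B003_Server_ARM_Centralized.tar.gz",
]
print(missing_packages(uploaded))
```

On a real node, `os.listdir("/data/GaussDBInstaller/pkgDir")` can be passed in place of the hard-coded list.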

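6. CPU requirements (the key issue)

CPU flags on Huawei Cloud:
Flags:fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm

-- CPU flags on our company's virtual machine:
Flags:fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid

GaussDB places requirements on the CPU: it needs a CPU that exposes the longer flag set shown first. We had previously tried to deploy on a Kylin Linux ARM virtual machine whose CPU reported only the short flag set, and the installation failed no matter what we tried. After confirming the cause with a Huawei engineer, we purchased a cloud environment and the installation succeeded.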
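
To see exactly where the two machines differ, the two flag lists can be compared directly. The sketch below only computes the difference between the lists quoted above; which of these flags GaussDB actually depends on is not stated in this article, so the result is a list of candidates, not a requirement list.

```python
CLOUD_FLAGS = (
    "fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp "
    "cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm"
)
VM_FLAGS = "fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid"


def missing_flags(reference, candidate):
    """Flags present in reference but absent from candidate, in reference order."""
    have = set(candidate.split())
    return [flag for flag in reference.split() if flag not in have]


print(missing_flags(CLOUD_FLAGS, VM_FLAGS))
# ['atomics', 'fphp', 'asimdhp', 'asimdrdm', 'jscvt', 'fcma', 'dcpop', 'asimddp', 'asimdfhm']
```

On a live host, the flag string can be taken from the `Flags:` line of `lscpu` and passed as `candidate`.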

7. Configure NTP time synchronization

-- Install NTP:
yum install ntp ntpdate -y
yum -y install ntpstat

Start the ntp service.

-- Primary NTP server: 192.168.0.146
echo "server 127.127.1.0 iburst">>/etc/ntp.conf 
systemctl enable ntpd 
systemctl restart ntpd 

[root@lin-kylinv10-sp1-arm01wangzhongyue20240318 ~]# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*LOCAL(0)        .LOCL.           5 l    9   64  377    0.000   +0.000   0.000

ntpstat 
[root@lin-kylinv10-sp1-arm01wangzhongyue20240318 agent]# ntpstat 
unsynchronised
poll interval unknown

-- Other hosts: run on every remaining node (here 192.168.0.105 and 192.168.0.238).
-- As root:
echo "server 192.168.0.146">>/etc/ntp.conf 
echo "restrict 192.168.0.146 nomodify notrap noquery">>/etc/ntp.conf 
ntpdate -u 192.168.0.146
hwclock -w 
systemctl enable ntpd 
systemctl restart ntpd 
systemctl status ntpd

[root@lin-kylinv10-sp1-arm02wangzhongyue20240318 ~]# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.168.0.146    LOCAL(0)         6 u  814 1024  377    2.416   -0.222   0.581

[root@lin-kylinv10-sp1-arm02wangzhongyue20240318 ~]# ntpstat 
synchronised to NTP server (192.168.0.146) at stratum 7
   time correct to within 41 ms
   polling server every 1024 s

[root@lin-kylinv10-sp1-arm03wangzhongyue20240318 ~]# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.168.0.146    LOCAL(0)         6 u  824 1024  377    0.797   -0.144   0.515
[root@lin-kylinv10-sp1-arm03wangzhongyue20240318 ~]# ntpstat
synchronised to NTP server (192.168.0.146) at stratum 7
   time correct to within 40 ms
   polling server every 1024 s
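
Checking synchronization by eye on every node gets tedious, so the `ntpstat` output can be parsed instead. This is a minimal sketch that relies only on the output format shown in the transcripts above; the function name and the dictionary keys are chosen for illustration.

```python
import re


def parse_ntpstat(output):
    """Extract sync state, server and stratum from ntpstat output text."""
    match = re.search(
        r"synchronised to NTP server \(([^)]+)\) at stratum (\d+)", output
    )
    if match:
        return {
            "synchronised": True,
            "server": match.group(1),
            "stratum": int(match.group(2)),
        }
    return {"synchronised": False, "server": None, "stratum": None}


SAMPLE = """synchronised to NTP server (192.168.0.146) at stratum 7
   time correct to within 41 ms
   polling server every 1024 s
"""
print(parse_ntpstat(SAMPLE))
```

The text can be captured on a node with `subprocess.run(["ntpstat"], capture_output=True, text=True).stdout`.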

8. Add resource limits

echo "* soft nofile 1000000" >>/etc/security/limits.conf
echo "* hard nofile 1000000" >>/etc/security/limits.conf

-- Raise the open-file limit to 1000000.
-- Add the ulimit line to /etc/profile to make it permanent.
ulimit -n 1000000   # takes effect for the current session only
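
Appending to limits.conf rather than overwriting it keeps whatever entries the file already holds, and doing it idempotently avoids duplicate lines when the step is re-run. A sketch of that idea on the file's text follows; `ensure_lines` is a hypothetical helper, not part of any GaussDB tooling.

```python
NOFILE_LINES = ("* soft nofile 1000000", "* hard nofile 1000000")


def ensure_lines(text, wanted=NOFILE_LINES):
    """Append each wanted line that is not already present in text."""
    lines = text.splitlines()
    existing = {line.strip() for line in lines}
    for line in wanted:
        if line.strip() not in existing:
            lines.append(line)
            existing.add(line.strip())
    return "\n".join(lines) + "\n"


print(ensure_lines("# existing comment\n"), end="")
```

To apply it, read /etc/security/limits.conf, pass the content through `ensure_lines`, and write the result back as root.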

9. Set the NIC MTU

ip link set dev eth0 mtu 8192
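
`ip link set` changes only the running configuration, so it is worth confirming the value before starting the installer. The sketch below reads the MTU from sysfs; the interface name `eth0` comes from the command above, while the `sysfs_root` parameter is added here only to make the function testable.

```python
import os


def read_mtu(dev, sysfs_root="/sys/class/net"):
    """Return the current MTU of dev as an int, read from sysfs."""
    with open(os.path.join(sysfs_root, dev, "mtu")) as handle:
        return int(handle.read().strip())


def mtu_ok(dev, expected=8192, sysfs_root="/sys/class/net"):
    """True if dev currently has the expected MTU."""
    return read_mtu(dev, sysfs_root) == expected
```

On a node, `mtu_ok("eth0")` should return True once the command above has been applied.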

10. Upload the DBInstaller installation tool package

-- Any version will do.
tar xvf GaussDBInstaller_1.0.6.8.tar.gz 
cd GaussDBInstaller_1.0.6.8

11. Edit the cluster configuration file

 cat install_cluster.conf 
[COMMON]
os_user = omm 
os_user_group = ${os_user}
os_user_home = /home/${os_user}
os_user_passwd = Att@2022     --- the installer creates the OS user omm automatically
root_passwd = Root#123    -- password of the OS root user
ssh_port = 22
node_ip_list = 192.168.0.146,192.168.0.105,192.168.0.238

[OMAGENT]
gauss_home = /data/cluster
om_agent_port = 30170
mgr_net = 
data_net = 
virtual_net = 
log_dir     = ${gauss_home}/logs/gaussdb
cn_dir     = ${gauss_home}/data/cn
gtm_dir     = ${gauss_home}/data/gtm
cm_dir      = ${gauss_home}/data/cm
tmp_dir     = ${gauss_home}/temp
data_dir    = ${gauss_home}/data/dn
tool_dir    = ${gauss_home}/tools
etcd_dir    = ${gauss_home}/data/etcd
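
The `${...}` references in this file expand to other values, for example `log_dir` becomes a path under `gauss_home`. The sketch below shows how such references resolve using Python's `configparser` with `ExtendedInterpolation`. Whether the installer itself parses the file this way is an assumption; the sample is a trimmed copy of the file above without the inline annotations.

```python
import configparser

SAMPLE = """\
[COMMON]
os_user = omm
os_user_home = /home/${os_user}

[OMAGENT]
gauss_home = /data/cluster
log_dir = ${gauss_home}/logs/gaussdb
data_dir = ${gauss_home}/data/dn
"""


def load_conf(text):
    """Parse INI text, expanding ${option} references within each section."""
    parser = configparser.ConfigParser(
        interpolation=configparser.ExtendedInterpolation()
    )
    parser.read_string(text)
    return parser


conf = load_conf(SAMPLE)
print(conf["COMMON"]["os_user_home"])  # /home/omm
print(conf["OMAGENT"]["log_dir"])      # /data/cluster/logs/gaussdb
```

Printing the expanded paths before installing makes it easy to confirm that they all land under the mounted /data volume.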

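12. Modify the cluster installation template

-- Use this JSON template. Template support is limited, so test it before relying on it.
3_nodes_centralized.json
-- Supported: one primary and two standbys. Not supported: one primary, one standby, and one log node.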

mv 3_nodes_centralized.json   install_cluster.json 

cat install_cluster.json 
{
  "rdsAdminUser": "rdsAdmin",
  "rdsAdminPasswd": "Root#123",
  "rdsMetricUser": "rdsMetric",
  "rdsMetricPasswd": "Root#123",
  "rdsReplUser": "rdsRepl",
  "rdsReplPasswd": "Root#123",
  "rdsBackupUser": "rdsBackup",
  "rdsBackupPasswd": "Root#123",
  "dbPort": "30100",
  "dbUser": "root",
  "dbUserPasswd": "Root#123",
  "clusterMode": "ha",
  "encoding": "utf8",
  "params": {
    "enable_thread_pool": "on",
    "enable_bbox_dump": "on",
    "bbox_dump_path": "/home/core"
  },
  "cnParams": {},
  "dnParams": {},
  "cmParams": {},
  "clusterConf": {
    "clusterName": "Gauss_Att",
    "shardingNum": 1,
    "replicaNum": 3,
    "solution": "hws",
    "consistencyProtocol": "paxos",
    "cm": [
      {
        "rack": "gauss001",
        "az": "AZ1",
        "ip": "192.168.0.146",
        "dataIp": "192.168.0.146",
        "virtualIp": "192.168.0.146"
      },
      {
        "rack": "gauss002",
        "az": "AZ2",
        "ip": "192.168.0.105",
        "dataIp": "192.168.0.105",
        "virtualIp": "192.168.0.105"
      },
      {
        "rack": "gauss003",
        "az": "AZ3",
        "ip": "192.168.0.238",
        "dataIp": "192.168.0.238",
        "virtualIp": "192.168.0.238"
      }
    ],
    "shards": [
      [
        {
          "rack": "gauss001",
          "az": "AZ1",
          "ip": "192.168.0.146",
          "dataIp": "192.168.0.146",
          "virtualIp": "192.168.0.146"
        },
        {
          "rack": "gauss002",
          "az": "AZ2",
          "ip": "192.168.0.105",
          "dataIp": "192.168.0.105",
          "virtualIp": "192.168.0.105"
        },
        {
          "rack": "gauss003",
          "az": "AZ3",
          "ip": "192.168.0.238",
          "dataIp": "192.168.0.238",
          "virtualIp": "192.168.0.238",
          "loggerRole": "on"
        }
      ]
    ],
    "etcd": {
      "nodes": [
        {
          "rack": "gauss001",
          "az": "AZ1",
          "ip": "192.168.0.146",
          "dataIp": "192.168.0.146",
          "virtualIp": "192.168.0.146"
        },
        {
          "rack": "gauss002",
          "az": "AZ2",
          "ip": "192.168.0.105",
          "dataIp": "192.168.0.105",
          "virtualIp": "192.168.0.105"
        },
        {
          "rack": "gauss003",
          "az": "AZ3",
          "ip": "192.168.0.238",
          "dataIp": "192.168.0.238",
          "virtualIp": "192.168.0.238"
        }
      ]
    }
  }
}

One thing to watch out for: each rack name must be identical to the node's hostname; if they differ, the installation fails.
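
Since a rack and hostname mismatch only shows up as a failed installation, it can be checked beforehand. The following sketch compares every `rack` value in the JSON with the name that /etc/hosts assigns to the same IP. It assumes that the names in /etc/hosts are the real hostnames, as configured earlier; all function names are illustrative.

```python
import json


def parse_hosts(hosts_text):
    """Map IP -> first hostname from /etc/hosts-style text."""
    mapping = {}
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def iter_nodes(cluster_conf):
    """Yield every node entry under cm, shards and etcd.nodes."""
    yield from cluster_conf.get("cm", [])
    for shard in cluster_conf.get("shards", []):
        yield from shard
    yield from cluster_conf.get("etcd", {}).get("nodes", [])


def rack_mismatches(install_json_text, hosts_text):
    """Return (ip, rack, expected_hostname) for every node whose rack differs."""
    hosts = parse_hosts(hosts_text)
    conf = json.loads(install_json_text)["clusterConf"]
    bad = []
    for node in iter_nodes(conf):
        expected = hosts.get(node["ip"])
        if node["rack"] != expected:
            bad.append((node["ip"], node["rack"], expected))
    return bad
```

Calling `rack_mismatches(open("install_cluster.json").read(), open("/etc/hosts").read())` should return an empty list before the installer is started.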

13. Install the cluster

python3 gaussdb_install.py --action main

The installation failed with:
[2024-03-27 12:14:42][root][ERROR]:InstallCluster in local host 192.168.0.146 execute failed, 
Error: {"retcode": 1, "detailmsg": "Fail to install, 
Error: [FAILURE] 192.168.0.146:\n
[GAUSS-50219] : Failed to obtain RTNETLINK answers: Operation not permitted 1. 
There are illegal characters.\n[FAILURE] 192.168.0.105:\n
[GAUSS-50219] : Failed to obtain RTNETLINK answers: Operation not permitted 1. 
There are illegal characters.\n[FAILURE] 192.168.0.238:\n[GAUSS-50219] : 
Failed to obtain RTNETLINK answers: Operation not permitted 1. 
There are illegal characters."}


su - omm
Switching to the omm user reports the same error:
RTNETLINK answers: Operation not permitted 

[root@gaussdb01 ~]# su - omm
Last login: Wed Mar 27 12:25:43 CST 2024 on pts/0
RTNETLINK answers: Operation not permitted
-- Fix:
chmod u+x /usr/sbin/ip
chmod u+s /sbin/ip

[root@gaussdb01 ~]# su - omm
Last login: Wed Mar 27 12:36:08 CST 2024 from 192.168.0.146 on pts/4

-- Re-run the cluster installation.
[root@gaussdb01 GaussDBInstaller]# python3 gaussdb_install.py --action main
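
The fix above sets the setuid bit on the `ip` binary, which lets it run with its owner's privileges when the omm user invokes it. Before re-running the installer on the other nodes, the bit can be verified with a few lines. This is a sketch; the default path is the one used in the chmod command, and the helper names are made up.

```python
import os
import stat


def has_setuid(mode):
    """True if the setuid bit is set in a file mode."""
    return bool(mode & stat.S_ISUID)


def ip_binary_is_setuid(path="/usr/sbin/ip"):
    """True if the binary at path has the setuid bit set."""
    return has_setuid(os.stat(path).st_mode)
```

Running `ip_binary_is_setuid()` on each node shows which nodes still need the chmod.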

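14. Check the packages after a successful installation

-- During installation the packages are unpacked automatically into the following layout.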
[omm@gaussdb01 pkgDir]$ ll
total 636496
-rwxr-xr-x  1 omm omm  33097515 Mar 27 10:56 GaussDB-Kernel_503.1.0.SPC1700.B003_Om_ARM_Centralized.tar.gz
-rwxr-xr-x  1 omm omm    394526 Mar 27 10:55 GaussDB-Kernel_503.1.0.SPC1700_Kylin_64bit_ADAPTOR.tar.gz
-rw-r--r--  1 omm omm  10068548 Sep 22  2023 GaussDB-Kernel_503.1.0.SPC1700_Kylin_64bit_Agent.tar.gz
-rw-r--r--  1 omm omm  23622524 Sep 22  2023 GaussDB-Kernel_503.1.0.SPC1700_Kylin_64bit_Om.tar.gz
drwx------  4 omm omm        81 Mar 27 16:46 omagent
drwx------ 10 omm omm      4096 Mar 27 16:48 server
-rwxr-xr-x  1 omm omm 584572429 Mar 27 11:55 server.tar.gz
[omm@gaussdb01 pkgDir]$ pwd
/data/GaussDBInstaller/pkgDir

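During deployment the uploaded packages are transformed into the layout above; in other words, they are unpacked automatically. Note that the packages must be uploaded to the /data/GaussDBInstaller/pkgDir directory.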

15. Check the cluster status after a successful deployment

[omm@gaussdb01 ~]$ cm_ctl query -Cv
[  CMServer State   ]

node             instance state
---------------------------------
1  192.168.0.146 1        Primary
2  192.168.0.105 2        Standby
3  192.168.0.238 3        Standby

[    ETCD State     ]

node             instance state
---------------------------------------
1  192.168.0.146 7001     StateLeader
2  192.168.0.105 7002     StateFollower
3  192.168.0.238 7003     StateFollower

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL

[  Datanode State   ]

node             instance state            | node             instance state            | node             instance state
---------------------------------------------------------------------------------------------------------------------------------------
1  192.168.0.146 6001     P Primary Normal | 2  192.168.0.105 6002     S Standby Normal | 3  192.168.0.238 6003     S Standby Normal
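
For routine checks, the `[ Cluster State ]` block of this output can be reduced to a single healthy or unhealthy answer. The sketch below parses only the `key : value` lines in the format shown above. Treating `cluster_state = Normal` together with `balanced = Yes` as healthy is an assumption made for this example.

```python
import re


def cluster_summary(cm_ctl_output):
    """Collect the 'key : value' pairs printed by cm_ctl query -Cv."""
    summary = {}
    for line in cm_ctl_output.splitlines():
        match = re.match(r"^(\w+)\s*:\s*(\S+)\s*$", line)
        if match:
            summary[match.group(1)] = match.group(2)
    return summary


def is_healthy(summary):
    """Assumed health rule: state Normal and balanced Yes."""
    return (
        summary.get("cluster_state") == "Normal"
        and summary.get("balanced") == "Yes"
    )


SAMPLE = """[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL
"""
print(is_healthy(cluster_summary(SAMPLE)))  # True
```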

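16. Pitfalls summary

(1) CPU: on ARM, a CPU with the short flag set is not supported: Flags:fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid

A CPU with the longer flag set is supported: Flags:fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm

For questions about the short versus long CPU flag sets, consult your operating system and hardware engineers.

(2) In install_cluster.json, each rack name must match the node's hostname; otherwise the installation fails.

(3) The one-primary, two-standby architecture deploys fairly easily; the one-primary, one-standby, one-log-node architecture does not.

(4) Huawei ships many packages that are hard to tell apart, so a minimal installation depends on finding the right ones. For a minimal installation, look for:

DBS-MetaDB_Kylin_Centralized_503.1.0.SPC1700.B003.tar.gz

Packages whose names start with DBS-MetaDB_Kylin_Centralized_***.tar.gz contain everything that a minimal installation needs.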
