云计算基础架构(一) 数据库
官网: https://github.com/yoshinorim/mha4mysql-manager/wiki前提:1) 所有节点之间, 相互之间免密认证2) 所有节点yum源配置正常3) gw服务器有相关的安装包4) 系统时间要正确5) /etc/hosts要统一(本实验通过dns解析已经实现)环境拓扑:node01 10.15.200.101 Primary_MasterServerID=1
官网: https://github.com/yoshinorim/mha4mysql-manager/wiki
前提:
1) 所有节点之间, 相互之间免密认证
2) 所有节点yum源配置正常
3) gw服务器有相关的安装包
4) 系统时间要正确
5) /etc/hosts要统一(本实验通过dns解析已经实现)
环境拓扑:
node01 10.15.200.101 Primary_Master ServerID=101 主(读写)
node02 10.15.200.102 Candidate_Master ServerID=102 从(读)
node03 10.15.200.103 slave01 ServerID=103 从(读)
node05 10.15.200.105 mha ServerID=105 高可用监控
node01为主服务器 node02为从服务器 node03为从服务器
当主服务器:node01故障时 node02由从提升为主 node03将主服务器重新指向node02(原为node01)
# 1. 所有节点(node01 node02 node03 node05) 清空之前实验的相关数据 (删除之前实验环境数据 清空yum旧缓存 更新yum新缓存
# 或是 直接还原快照
yum remove Percona-XtraDB-Cluster* httpd* php* Percona-Server* mariadb* -y
yum remove mysql-community-server -y
rm -fr /var/log/mysqld.log /var/lib/mysql/*
yum clean all && yum makecache
# 2. 在node01 node02 node03 安装mysql-community-5.7
yum -y install mysql-community-server
# 数据库配置文件已经放在gw (如果没有 请自行上传)
node01: wget http://10.15.200.8/files/mha/node01.my.cnf -O /etc/my.cnf
node02: wget http://10.15.200.8/files/mha/node02.my.cnf -O /etc/my.cnf
node03: wget http://10.15.200.8/files/mha/node03.my.cnf -O /etc/my.cnf
或者是这样的一条命令:
wget http://10.15.200.8/files/mha/`hostname -s`.my.cnf -O /etc/my.cnf
# 3. 在所有节点 安装MHA (node01 node02 node03 node05)
yum install mha4mysql-node-0.58 -y
主节点: node01 安装数据库 启动服务 配置用户名和密码
[root@node01 ~]# rm -fr /var/lib/mysql/* && mysqld --initialize && chown -R mysql:mysql /var/lib/mysql
[root@node01 ~]# systemctl start mysqld
[root@node01 ~]# grep 'temporary password' /var/log/mysqld.log | awk '{print $11}'
Fc-8L0tu?rMh
# 修改root密码 创建用户复制用户repl及mha管理用户mha
[root@node01 ~]# mysql -uroot -p'Fc-8L0tu?rMh'
ALTER USER 'root'@'localhost' IDENTIFIED BY '!@#qweASD69';
grant replication slave on *.* to repl@'10.15.200.%' identified by '!@#qweASD69';
grant all on *.* to mha@'10.15.200.%' identified by '!@#qweASD69';
flush privileges;
因为要做主从实验,所以各个服务器的server-id不能相同,我们需要在这里进行查看,确保没有问题
node02: 同理类似
[root@node02 ~]# rm -fr /var/lib/mysql/* && mysqld --initialize --user=mysql
[root@node02 ~]# systemctl start mysqld
[root@node02 ~]# grep 'temporary password' /var/log/mysqld.log | awk '{print $11}'
G:aFM:4tkn4y
[root@node02 ~]# mysql -uroot -p'G:aFM:4tkn4y'
ALTER USER 'root'@'localhost' IDENTIFIED BY '!@#qweASD69';
grant replication slave on *.* to repl@'10.15.200.%' identified by '!@#qweASD69';
grant all on *.* to mha@'10.15.200.%' identified by '!@#qweASD69';
flush privileges;
node02上配置从服务器:
CHANGE MASTER TO
MASTER_HOST='10.15.200.101',
MASTER_PORT=3306,
MASTER_AUTO_POSITION=1,
MASTER_USER='repl',
MASTER_PASSWORD='!@#qweASD69';
mysql> show slave status\G;
Slave_IO_Running: No
Slave_SQL_Running: No
mysql> start slave;
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.15.200.101
Master_User: repl
Master_Port: 3306
重点是这里,得有
node03: 同理类似
[root@node03 ~]# rm -fr /var/lib/mysql/* && mysqld --initialize --user=mysql
[root@node03 ~]# systemctl start mysqld
[root@node03 ~]# grep 'temporary password' /var/log/mysqld.log | awk '{print $11}'
>uLMJdwI/1Tl
[root@node03 ~]# mysql -uroot -p'>uLMJdwI/1Tl'
ALTER USER 'root'@'localhost' IDENTIFIED BY '!@#qweASD69';
grant replication slave on *.* to repl@'10.15.200.%' identified by '!@#qweASD69';
grant all on *.* to mha@'10.15.200.%' identified by '!@#qweASD69';
flush privileges;
node03上配置从服务器:
CHANGE MASTER TO
MASTER_HOST='10.15.200.101',
MASTER_PORT=3306,
MASTER_AUTO_POSITION=1,
MASTER_USER='repl',
MASTER_PASSWORD='!@#qweASD69';
mysql> show slave status\G;
Slave_IO_Running: No
Slave_SQL_Running: No
mysql> start slave;
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.15.200.101
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
# ping虚拟ip 无法ping通
~$ ping 10.15.200.118
PING 10.15.200.118 (10.15.200.118): 56 data bytes
Request timeout for icmp_seq 0
node01: 先在主库master上绑定VIP(只需手工绑定一次,后续脚本会自动切换)
[root@node01 ~]# ifconfig ens33:1 10.15.200.118/24 # 注意如果是ens32做对应的修改
[root@node01 ~]# ip addr
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
inet 10.15.200.118/24 brd 10.15.200.255 scope global secondary ens33:1
valid_lft forever preferred_lft forever
~$ ping 10.15.200.118 -c2
PING 10.15.200.118 (10.15.200.118): 56 data bytes
64 bytes from 10.15.200.118: icmp_seq=0 ttl=64 time=1.431 ms
# node05安装管理节点
yum install mha4mysql-node-0.58 mha4mysql-manager-0.58 -y
mkdir -p /etc/mha/scripts
wget http://10.15.200.8/files/mha/master_ip_failover -O /etc/mha/scripts/master_ip_failover
wget http://10.15.200.8/files/mha/master_ip_online_change -O /etc/mha/scripts/master_ip_online_change
wget http://10.15.200.8/files/mha/send_report -O /etc/mha/scripts/send_report
chmod 744 /etc/mha/scripts/*
wget http://10.15.200.8/files/mha/app.cnf -O /etc/mha/app.cnf
wget http://10.15.200.8/files/mha/masterha_default.cnf -O /etc/masterha_default.cnf
node05:
/etc/mha/scripts/master_ip_failover # 修改网址的名字 ens32 ens34 ens33
/etc/mha/scripts/master_ip_online_change # 修改网址的名字 ens32 ens34 ens33
node05赋予脚本执行权限:
[root@node05 ~]# chmod 744 /etc/mha/scripts/*
# 用 masterha_check_ssh 命令检查 ssh 互信是否成功
[root@node05 ~]# masterha_check_ssh --conf=/etc/mha/app.cnf
Tue Mar 2 23:08:01 2021 - [info] All SSH connection tests passed successfully.
# 使用 masterha_check_repl 命令检查 mysql 主从是否正常
[root@node05 ~]# masterha_check_repl --conf=/etc/mha/app.cnf
MySQL Replication Health is OK.
启动MHA: 如下命令 会卡住不动
[root@node05 ~]# masterha_manager --conf=/etc/mha/app.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app/manager.log 2>&1
通过日志检查MHA是否启动成功:
[root@node05 ~]# tailf /var/log/mha/app/manager.log
Checking the Status of the script.. OK
Tue Mar 2 23:15:30 2021 - [info] OK.
Tue Mar 2 23:15:30 2021 - [warning] shutdown_script is not defined.
Tue Mar 2 23:15:30 2021 - [info] Set master ping interval 1 seconds.
Tue Mar 2 23:15:30 2021 - [info] Set secondary check script: /usr/bin/masterha_secondary_check -s node01 -s node02 -s node03
Tue Mar 2 23:15:30 2021 - [info] Starting ping health check on node01(10.15.200.101:3306)..
Tue Mar 2 23:15:30 2021 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
# 最后一行出现如下字样表明启动成功
Tue Mar 2 23:15:30 2021 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Mon Sep 20 23:43:27 2021 - [info] Got exit code 1 (Not master dead).
检查MHA集群状态:
[root@node05 ~]# masterha_check_status --conf=/etc/mha/app.cnf
app (pid:3412) is running(0:PING_OK), master:node01
将node01的mysql服务停止:
[root@node01 ~]# systemctl stop mysqld
node05动态的查看日志:
[root@node05 scripts]# tailf /var/log/mha/app/manager.log
IN SCRIPT TEST====/sbin/ifconfig ens33:1 down==/sbin/ifconfig ens33:1 10.15.200.118/24===
Enabling the VIP - 10.15.200.118/24 on the new master - node02
Warning: Permanently added 'node02,10.15.200.102' (ECDSA) to the list of known hosts.
Tue Mar 2 23:21:56 2021 - [info] OK.
Tue Mar 2 23:21:56 2021 - [info] ** Finished master recovery successfully.
Tue Mar 2 23:21:56 2021 - [info] * Phase 3: Master Recovery Phase completed.
Tue Mar 2 23:21:56 2021 - [info]
Tue Mar 2 23:21:56 2021 - [info] * Phase 4: Slaves Recovery Phase..
Tue Mar 2 23:21:56 2021 - [info]
Tue Mar 2 23:21:56 2021 - [info]
Tue Mar 2 23:21:56 2021 - [info] * Phase 4.1: Starting Slaves in parallel..
Tue Mar 2 23:21:56 2021 - [info]
Tue Mar 2 23:21:56 2021 - [info] -- Slave recovery on host node03(10.15.200.103:3306) started, pid: 3884. Check tmp log /var/log/mha/app/node03_3306_20210302232154.log if it takes time..
Tue Mar 2 23:21:58 2021 - [info]
Tue Mar 2 23:21:58 2021 - [info] Log messages from node03 ...
Tue Mar 2 23:21:58 2021 - [info]
Tue Mar 2 23:21:56 2021 - [info] Resetting slave node03(10.15.200.103:3306) and starting replication from the new master node02(10.15.200.102:3306)..
Tue Mar 2 23:21:56 2021 - [info] Executed CHANGE MASTER.
Tue Mar 2 23:21:57 2021 - [info] Slave started.
Tue Mar 2 23:21:57 2021 - [info] gtid_wait(bfb0229c-7b5b-11eb-9393-000c29edf7ca:1-6,
eec2d10b-7b5e-11eb-afcc-000c2930884c:1-3) completed on node03(10.15.200.103:3306). Executed 0 events.
Tue Mar 2 23:21:58 2021 - [info] End of log messages from node03.
Tue Mar 2 23:21:58 2021 - [info] -- Slave on host node03(10.15.200.103:3306) started.
Tue Mar 2 23:21:58 2021 - [info] All new slave servers recovered successfully.
Tue Mar 2 23:21:58 2021 - [info]
Tue Mar 2 23:21:58 2021 - [info] * Phase 5: New master cleanup phase..
Tue Mar 2 23:21:58 2021 - [info]
Tue Mar 2 23:21:58 2021 - [info] Resetting slave info on the new master..
Tue Mar 2 23:21:58 2021 - [info] node02: Resetting slave info succeeded.
Tue Mar 2 23:21:58 2021 - [info] Master failover to node02(10.15.200.102:3306) completed successfully.
Tue Mar 2 23:21:58 2021 - [info] Deleted server1 entry from /etc/mha/app.cnf .
Tue Mar 2 23:21:58 2021 - [info]
----- Failover Report -----
app: MySQL Master failover node01(10.15.200.101:3306) to node02(10.15.200.102:3306) succeeded
Master node01(10.15.200.101:3306) is down!
Check MHA Manager logs at node05.example.cn:/var/log/mha/app/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on node01(10.15.200.101:3306)
Selected node02(10.15.200.102:3306) as a new master.
node02(10.15.200.102:3306): OK: Applying all logs succeeded.
node02(10.15.200.102:3306): OK: Activated master IP address.
node03(10.15.200.103:3306): OK: Slave started, replicating from node02(10.15.200.102:3306)
node02(10.15.200.102:3306): Resetting slave info succeeded.
Master failover to node02(10.15.200.102:3306) completed successfully.
# 虚拟IP已经漂移到node02 node02由从的角色 提升为主的角色
[root@node02 mysql]# ip addr
inet 10.15.200.118/24 brd 10.15.200.255 scope global secondary ens33:1
valid_lft forever preferred_lft forever
# 此时发现 node03的主服务器由node01变为node02
[root@node03 ~]# mysql -uroot -p'!@#qweASD' -e 'show slave status\G;'
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.15.200.102
此时再次查看集群的状态 为失效的状态:
[root@node05 ~]# masterha_check_status --conf=/etc/mha/app.cnf
app is stopped(2:NOT_RUNNING).
[root@node05 ~]# cat /etc/mha/app.cnf # 此文件中的[server1]段内容已经被删除
注意事项:
1) master端同样要开启两个重要的选项,server-id和log-bin
并且server-id在全局架构中并且唯一,不能被其它主机使用
slave端要开启relay-log
https://github.com/yoshinorim/mha4mysql-manager/wiki
ERROR 1819 (HY000): Your password does not satisfy the current policy requirements
ERROR 1820 (HY000): You must reset your password using ALTER USER statement before executing this statement
更多推荐
所有评论(0)