注1:无论是在db端还是在监控端如果有对配置文件进行修改操作都需要重启代理进程和监控进程。
注2:MMM启动顺序:先启动monitor,再启动 agent
检查集群状态:
[root@monitor1 ~]# mmm_control show master1(192.168.31.83) master/ONLINE. Roles: writer(192.168.31.2) master2(192.168.31.141) master/ONLINE. Roles: reader(192.168.31.5) slave1(192.168.31.250) slave/ONLINE. Roles: reader(192.168.31.4) slave2(192.168.31.225) slave/ONLINE. Roles: reader(192.168.31.3)
如果服务器状态不是ONLINE,可以用如下命令将服务器上线,例如:
#mmm_controlset_online主机名
例如:[root@monitor1 ~]#mmm_controlset_onlinemaster1
从上面的显示可以看到,写请求的VIP在master1上,所有从节点也都把master1当做主节点。
查看是否启用vip
[root@master1 ~]# ipaddr show dev eno16777736 eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP>mtu 1500 qdiscpfifo_fast state UP qlen 1000 link/ether 00:0c:29:6d:2f:82 brdff:ff:ff:ff:ff:ff inet 192.168.31.83/24 brd 192.168.31.255 scope global eno16777736 valid_lft forever preferred_lft forever inet 192.168.31.2/32 scope global eno16777736 valid_lft forever preferred_lft forever inet6 fe80::20c:29ff:fe6d:2f82/64 scope link valid_lft forever preferred_lft forever [root@master2 ~]# ipaddr show dev eno16777736 eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP>mtu 1500 qdiscpfifo_fast state UP qlen 1000 link/ether 00:0c:29:75:1a:9c brdff:ff:ff:ff:ff:ff inet 192.168.31.141/24 brd 192.168.31.255 scope global dynamic eno16777736 valid_lft 35850sec preferred_lft 35850sec inet 192.168.31.5/32 scope global eno16777736 valid_lft forever preferred_lft forever inet6 fe80::20c:29ff:fe75:1a9c/64 scope link valid_lft forever preferred_lft forever [root@slave1 ~]# ipaddr show dev eno16777736 eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP>mtu 1500 qdiscpfifo_fast state UP qlen 1000 link/ether 00:0c:29:02:21:19 brdff:ff:ff:ff:ff:ff inet 192.168.31.250/24 brd 192.168.31.255 scope global dynamic eno16777736 valid_lft 35719sec preferred_lft 35719sec inet 192.168.31.4/32 scope global eno16777736 valid_lft forever preferred_lft forever inet6 fe80::20c:29ff:fe02:2119/64 scope link valid_lft forever preferred_lft forever [root@slave2 ~]# ipaddr show dev eno16777736 eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP>mtu 1500 qdiscpfifo_fast state UP qlen 1000 link/ether 00:0c:29:e2:c7:fa brdff:ff:ff:ff:ff:ff inet 192.168.31.225/24 brd 192.168.31.255 scope global dynamic eno16777736 valid_lft 35930sec preferred_lft 35930sec inet 192.168.31.3/32 scope global eno16777736 valid_lft forever preferred_lft forever inet6 fe80::20c:29ff:fee2:c7fa/64 scope link valid_lft forever preferred_lft forever 在master2,slave1,slave2主机上查看主mysql的指向 mysql> show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.31.83 Master_User: rep Master_Port: 3306 Connect_Retry: 60
MMM高可用性测试:
服务器读写采有VIP地址进行读写,出现故障时VIP会漂移到其它节点,由其它节点提供服务。
首先查看整个集群的状态,可以看到整个集群状态正常
[root@monitor1 ~]# mmm_control show master1(192.168.31.83) master/ONLINE. Roles: writer(192.168.31.2) master2(192.168.31.141) master/ONLINE. Roles: reader(192.168.31.5) slave1(192.168.31.250) slave/ONLINE. Roles: reader(192.168.31.4) slave2(192.168.31.225) slave/ONLINE. Roles: reader(192.168.31.3)
模拟master1宕机,手动停止mysql服务,观察monitor日志,master1的日志如下:
[root@monitor1 ~]# tail -f /var/log/mysql-mmm/mmm_mond.log 2017/01/09 22:02:55 WARN Check 'rep_threads' on 'master1' is in unknown state! Message: UNKNOWN: Connect error (host = 192.168.31.83:3306, user = mmm_monitor)! Can't connect to MySQL server on '192.168.31.83' (111) 2017/01/09 22:02:55 WARN Check 'rep_backlog' on 'master1' is in unknown state! Message: UNKNOWN: Connect error (host = 192.168.31.83:3306, user = mmm_monitor)! Can't connect to MySQL server on '192.168.31.83' (111) 2017/01/09 22:03:05 ERROR Check 'mysql' on 'master1' has failed for 10 seconds! Message: ERROR: Connect error (host = 192.168.31.83:3306, user = mmm_monitor)! Can't connect to MySQL server on '192.168.31.83' (111) 2017/01/09 22:03:07 FATAL State of host 'master1' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK) 2017/01/09 22:03:07 INFO Removing all roles from host 'master1': 2017/01/09 22:03:07 INFO Removed role 'writer(192.168.31.2)' from host 'master1' 2017/01/09 22:03:07 INFO Orphaned role 'writer(192.168.31.2)' has been assigned to 'master2'
查看群集的最新状态
[root@monitor1 ~]# mmm_control show master1(192.168.31.83) master/HARD_OFFLINE. Roles: master2(192.168.31.141) master/ONLINE. Roles: reader(192.168.31.5), writer(192.168.31.2) slave1(192.168.31.250) slave/ONLINE. Roles: reader(192.168.31.4) slave2(192.168.31.225) slave/ONLINE. Roles: reader(192.168.31.3)
从显示结果可以看出master1的状态有ONLINE转换为HARD_OFFLINE,写VIP转移到了master2主机上。
检查所有的db服务器群集状态
[root@monitor1 ~]# mmm_control checks all master1 ping [last change: 2017/01/09 21:31:47] OK master1 mysql [last change: 2017/01/09 22:03:07] ERROR: Connect error (host = 192.168.31.83:3306, user = mmm_monitor)! Can't connect to MySQL server on '192.168.31.83' (111) master1 rep_threads [last change: 2017/01/09 21:31:47] OK master1 rep_backlog [last change: 2017/01/09 21:31:47] OK: Backlog is null slave1 ping [last change: 2017/01/09 21:31:47] OK slave1mysql [last change: 2017/01/09 21:31:47] OK slave1 rep_threads [last change: 2017/01/09 21:31:47] OK slave1 rep_backlog [last change: 2017/01/09 21:31:47] OK: Backlog is null master2 ping [last change: 2017/01/09 21:31:47] OK master2 mysql [last change: 2017/01/09 21:57:32] OK master2 rep_threads [last change: 2017/01/09 21:31:47] OK master2 rep_backlog [last change: 2017/01/09 21:31:47] OK: Backlog is null slave2 ping [last change: 2017/01/09 21:31:47] OK slave2mysql [last change: 2017/01/09 21:31:47] OK slave2 rep_threads [last change: 2017/01/09 21:31:47] OK slave2 rep_backlog [last change: 2017/01/09 21:31:47] OK: Backlog is null
从上面可以看到master1能ping通,说明只是服务死掉了。
查看master2主机的ip地址:
[root@master2 ~]# ipaddr show dev eno16777736 eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP>mtu 1500 qdiscpfifo_fast state UP qlen 1000 link/ether 00:0c:29:75:1a:9c brdff:ff:ff:ff:ff:ff inet 192.168.31.141/24 brd 192.168.31.255 scope global dynamic eno16777736 valid_lft 35519sec preferred_lft 35519sec inet 192.168.31.5/32 scope global eno16777736 valid_lft forever preferred_lft forever inet 192.168.31.2/32 scope global eno16777736 valid_lft forever preferred_lft forever inet6 fe80::20c:29ff:fe75:1a9c/64 scope link valid_lft forever preferred_lft forever
slave1主机:
mysql> show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.31.141 Master_User: rep Master_Port: 3306
slave2主机:
mysql> show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.31.141 Master_User: rep Master_Port: 3306
启动master1主机的mysql服务,观察monitor日志,master1的日志如下:
[root@monitor1 ~]# tail -f /var/log/mysql-mmm/mmm_mond.log 2017/01/09 22:16:56 INFO Check 'mysql' on 'master1' is ok! 2017/01/09 22:16:56 INFO Check 'rep_backlog' on 'master1' is ok! 2017/01/09 22:16:56 INFO Check 'rep_threads' on 'master1' is ok! 2017/01/09 22:16:59 FATAL State of host 'master1' changed from HARD_OFFLINE to AWAITING_RECOVERY
从上面可以看到master1的状态由hard_offline改变为awaiting_recovery状态
用如下命令将服务器上线:
[root@monitor1 ~]#mmm_controlset_onlinemaster1
查看群集最新状态
[root@monitor1 ~]# mmm_control show master1(192.168.31.83) master/ONLINE. Roles: master2(192.168.31.141) master/ONLINE. Roles: reader(192.168.31.5), writer(192.168.31.2) slave1(192.168.31.250) slave/ONLINE. Roles: reader(192.168.31.4) slave2(192.168.31.225) slave/ONLINE. Roles: reader(192.168.31.3)
可以看到主库启动不会接管主,只到现有的主再次宕机。
总结
(1)master2备选主节点宕机不影响集群的状态,就是移除了master2备选节点的读状态。
(2)master1主节点宕机,由master2备选主节点接管写角色,slave1,slave2指向新master2主库进行复制,slave1,slave2会自动change master到master2.
(3)如果master1主库宕机,master2复制应用又落后于master1时就变成了主可写状态,这时的数据主无法保证一致性。
如果master2,slave1,slave2延迟于master1主,这个时master1宕机,slave1,slave2将会等待数据追上db1后,再重新指向新的主node2进行复制操作,这时的数据也无法保证同步的一致性。
(4)如果采用MMM高可用架构,主,主备选节点机器配置一样,而且开启半同步进一步提高安全性或采用MariaDB/mysql5.7进行多线程从复制,提高复制的性能。
附:
1、日志文件:
日志文件往往是分析错误的关键,所以要善于利用日志文件进行问题分析。
db端:/var/log/mysql-mmm/mmm_agentd.log
监控端:/var/log/mysql-mmm/mmm_mond.log
2、命令文件:
mmm_agentd:db代理进程的启动文件
mmm_mond:监控进程的启动文件
mmm_backup:备份文件
mmm_restore:还原文件
mmm_control:监控操作命令文件
db服务器端只有mmm_agentd程序,其它的都是在monitor服务器端。
3、mmm_control用法
mmm_control程序可以用于监控群集状态、切换writer、设置online\offline操作等。
Valid commands are:
help - show this message #帮助信息
ping - ping monitor #ping当前的群集是否正常
show - show status #群集在线状态检查
checks [<host>(北联网教程,专业提供视频软件下载)
……