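Resharding is an important tool for managing a Redis cluster, commonly used when scaling the cluster's nodes out or in. Suppose the cluster fails for some reason in the middle of a reshard and the process stops abruptly. It is an unsettling moment, so let's walk through how to recover. In this example, one Redis node goes down during the reshard (simulated here by stopping that node's container).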

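Check node status

Run a cluster check to inspect the nodes. The output below was captured by invoking reshard again, which performs the same cluster check before doing anything else: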

$ redis-cli --cluster reshard 192.168.1.196:6379 -a xxxxx
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
Could not connect to Redis at 192.168.1.196:6679: Connection refused
>>> Performing Cluster Check (using node 192.168.1.196:6379)
M: d1851a905c4daf870dc9ca7c28c3ebe29585af07 192.168.1.196:6379
   slots:[0-3276] (3277 slots) master
M: bfc47617d26387e800c1ff1b714d25175caaaa60 192.168.1.196:6579
   slots:[6554-9982] (3429 slots) master
M: 74c8b8101bc810e82d98926c9d0f60b8f1b5d163 192.168.1.196:6479
   slots:[3277-6553] (3277 slots) master
M: d93bf1672e1d83180bb156740592129e957f7289 192.168.1.196:6779
   slots:[13107-16383] (3277 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
[WARNING] Node 192.168.1.196:6579 has slots in importing state 9983.
[WARNING] The following slots are open: 9983.
>>> Check slots coverage...
[ERR] Not all 16384 slots are covered by nodes.

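Two problems are visible:

1. Slot 9983 is open.

2. Slot coverage is incomplete, because the unreachable node 192.168.1.196:6679 is missing from the list.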
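When the check output is long, the two findings above can be pulled out mechanically. The following is a minimal sketch, assuming the check output is piped in on stdin and that the message wording matches the output shown in this article; `summarize_check` is a hypothetical helper name, not part of redis-cli.

```shell
# Summarize the output of a redis-cli cluster check, read from stdin.
# Prints the open slots (if any) and whether slot coverage is complete.
# Assumption: message wording matches the output shown in this article.
summarize_check() {
  awk '
    /The following slots are open:/ {
      slots = $0
      sub(/.*open: */, "", slots)   # keep only the slot list
      sub(/\.$/, "", slots)         # drop the trailing period
    }
    /Not all 16384 slots are covered/ { uncovered = 1 }
    END {
      print "open_slots=" (slots == "" ? "none" : slots)
      print "coverage=" (uncovered ? "incomplete" : "complete")
    }
  '
}
```

For example, the output of `redis-cli --cluster check 192.168.1.196:6379 -a xxxxxx` could be piped into `summarize_check` to get a two-line summary.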

First, make sure all nodes are up

Because I stopped one master node by hand, all that is needed here is to start that node again.
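After restarting the node it can take a moment before it accepts connections, so it may help to wait before running the next check. Below is a small retry sketch; `wait_until` is a hypothetical helper, and which command to poll with is an assumption left to the reader.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND exits with status 0, otherwise 1.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (not executed here): poll the restarted node with PING.
# wait_until 30 1 redis-cli -h 192.168.1.196 -p 6679 -a xxxxxx ping
```

This relies on the polled command exiting with a non-zero status while the node is unreachable, as redis-cli does when the connection is refused.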

Once it is running, run the cluster check command:

$ redis-cli --cluster check 192.168.1.196:6379 -a xxxxxx
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
192.168.1.196:6379 (d1851a90...) -> 19083 keys | 3277 slots | 0 slaves.
192.168.1.196:6579 (bfc47617...) -> 19732 keys | 3429 slots | 0 slaves.
192.168.1.196:6479 (74c8b810...) -> 19184 keys | 3277 slots | 0 slaves.
192.168.1.196:6679 (61022de0...) -> 18095 keys | 3124 slots | 0 slaves.
192.168.1.196:6779 (d93bf167...) -> 19080 keys | 3277 slots | 0 slaves.
[OK] 95174 keys in 5 masters.
5.81 keys per slot on average.
>>> Performing Cluster Check (using node 192.168.1.196:6379)
M: d1851a905c4daf870dc9ca7c28c3ebe29585af07 192.168.1.196:6379
   slots:[0-3276] (3277 slots) master
M: bfc47617d26387e800c1ff1b714d25175caaaa60 192.168.1.196:6579
   slots:[6554-9982] (3429 slots) master
M: 74c8b8101bc810e82d98926c9d0f60b8f1b5d163 192.168.1.196:6479
   slots:[3277-6553] (3277 slots) master
M: 61022de0e08f419c571c892188ac1e689d9528e6 192.168.1.196:6679
   slots:[9983-13106] (3124 slots) master
M: d93bf1672e1d83180bb156740592129e957f7289 192.168.1.196:6779
   slots:[13107-16383] (3277 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
[WARNING] Node 192.168.1.196:6579 has slots in importing state 9983.
[WARNING] Node 192.168.1.196:6679 has slots in migrating state 9983.
[WARNING] The following slots are open: 9983.
>>> Check slots coverage...
[OK] All 16384 slots covered.

The slot coverage problem is resolved, but slot 9983 is still open.
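The two warnings show where the interrupted migration stands: slot 9983 is marked migrating on 192.168.1.196:6679 (the source) and importing on 192.168.1.196:6579 (the target). As a rough sketch, the same information can be extracted from the check output on stdin; `open_slot_states` is a hypothetical helper and assumes the warning format shown above.

```shell
# Print "node state slot" for each open-slot warning in check output (stdin).
# Assumption: warnings look like
#   [WARNING] Node HOST:PORT has slots in importing state 9983.
open_slot_states() {
  awk '
    /has slots in (importing|migrating) state/ {
      slot = $NF
      sub(/\.$/, "", slot)    # drop the trailing period
      print $3, $7, slot      # node address, state, slot number
    }
  '
}
```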

Next, repair the cluster with the cluster fix command

$ redis-cli --cluster fix 192.168.1.196:6379 -a xxxxxx
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
192.168.1.196:6379 (d1851a90...) -> 19083 keys | 3277 slots | 0 slaves.
192.168.1.196:6579 (bfc47617...) -> 19732 keys | 3429 slots | 0 slaves.
192.168.1.196:6479 (74c8b810...) -> 19184 keys | 3277 slots | 0 slaves.
192.168.1.196:6679 (61022de0...) -> 18095 keys | 3124 slots | 0 slaves.
192.168.1.196:6779 (d93bf167...) -> 19080 keys | 3277 slots | 0 slaves.
[OK] 95174 keys in 5 masters.
5.81 keys per slot on average.
>>> Performing Cluster Check (using node 192.168.1.196:6379)
M: d1851a905c4daf870dc9ca7c28c3ebe29585af07 192.168.1.196:6379
   slots:[0-3276] (3277 slots) master
M: bfc47617d26387e800c1ff1b714d25175caaaa60 192.168.1.196:6579
   slots:[6554-9982] (3429 slots) master
M: 74c8b8101bc810e82d98926c9d0f60b8f1b5d163 192.168.1.196:6479
   slots:[3277-6553] (3277 slots) master
M: 61022de0e08f419c571c892188ac1e689d9528e6 192.168.1.196:6679
   slots:[9983-13106] (3124 slots) master
M: d93bf1672e1d83180bb156740592129e957f7289 192.168.1.196:6779
   slots:[13107-16383] (3277 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
[WARNING] Node 192.168.1.196:6579 has slots in importing state 9983.
[WARNING] Node 192.168.1.196:6679 has slots in migrating state 9983.
[WARNING] The following slots are open: 9983.
>>> Fixing open slot 9983
*** Found keys about slot 9983 in non-owner node 192.168.1.196:6579!
Set as migrating in: 192.168.1.196:6679
Set as importing in: 192.168.1.196:6579
>>> Nobody claims ownership, selecting an owner...
*** Configuring 192.168.1.196:6579 as the slot owner
>>> Case 2: Moving all the 9983 slot keys to its owner 192.168.1.196:6579
Moving slot 9983 from 192.168.1.196:6679 to 192.168.1.196:6579:
>>> Setting 9983 as STABLE in 192.168.1.196:6679
>>> Check slots coverage...
[OK] All 16384 slots covered.

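Command parameters:

Parameter             Description
--cluster             Cluster management mode
fix                   The fix subcommand
192.168.1.196:6379    IP address and port of any node in the cluster
-a xxxxxx             Password for that node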

This command first runs a check and then automatically repairs the open slots.
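Every command in this article prints a warning about passing the password with -a. One possible way around it, sketched below, is to supply the password through the REDISCLI_AUTH environment variable. This assumes a redis-cli version that honors that variable; `cluster_cmd`, `REDIS_PASSWORD`, and `DRY_RUN` are hypothetical names introduced only for this illustration.

```shell
# Run a redis-cli cluster subcommand without the password on the command line.
# Assumptions: redis-cli honors REDISCLI_AUTH; REDIS_PASSWORD and DRY_RUN are
# names made up for this sketch. With DRY_RUN=1 the command is only printed.
cluster_cmd() {
  subcommand=$1
  node=$2
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "redis-cli --cluster $subcommand $node"
  else
    # The password travels in the environment instead of the argument list.
    REDISCLI_AUTH=${REDIS_PASSWORD:?set REDIS_PASSWORD first} \
      redis-cli --cluster "$subcommand" "$node"
  fi
}
```

With `REDIS_PASSWORD` exported, `cluster_cmd fix 192.168.1.196:6379` would then stand in for the fix command shown above.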

Verify

$ redis-cli --cluster check 192.168.1.196:6379 -a xxxxxx
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
192.168.1.196:6379 (d1851a90...) -> 19083 keys | 3277 slots | 0 slaves.
192.168.1.196:6579 (bfc47617...) -> 19732 keys | 3430 slots | 0 slaves.
192.168.1.196:6479 (74c8b810...) -> 19184 keys | 3277 slots | 0 slaves.
192.168.1.196:6679 (61022de0...) -> 18095 keys | 3123 slots | 0 slaves.
192.168.1.196:6779 (d93bf167...) -> 19080 keys | 3277 slots | 0 slaves.
[OK] 95174 keys in 5 masters.
5.81 keys per slot on average.
>>> Performing Cluster Check (using node 192.168.1.196:6379)
M: d1851a905c4daf870dc9ca7c28c3ebe29585af07 192.168.1.196:6379
   slots:[0-3276] (3277 slots) master
M: bfc47617d26387e800c1ff1b714d25175caaaa60 192.168.1.196:6579
   slots:[6554-9983] (3430 slots) master
M: 74c8b8101bc810e82d98926c9d0f60b8f1b5d163 192.168.1.196:6479
   slots:[3277-6553] (3277 slots) master
M: 61022de0e08f419c571c892188ac1e689d9528e6 192.168.1.196:6679
   slots:[9984-13106] (3123 slots) master
M: d93bf1672e1d83180bb156740592129e957f7289 192.168.1.196:6779
   slots:[13107-16383] (3277 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
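Compared with the earlier check, 192.168.1.196:6579 now owns 3430 slots (one more) and 192.168.1.196:6679 owns 3123 (one fewer), which matches slot 9983 having moved. As an extra sanity check, the per-master slot counts should add up to 16384. A minimal sketch that sums them from the summary lines on stdin follows; `total_slots` is a hypothetical helper and assumes the summary format shown above.

```shell
# Sum the per-master slot counts from the summary lines of check output (stdin).
# Assumption: summary lines look like
#   HOST:PORT (id...) -> 19083 keys | 3277 slots | 0 slaves.
total_slots() {
  awk -F'|' '
    $2 ~ / slots / {
      split($2, parts, " ")   # $2 looks like " 3277 slots "
      total += parts[1]
    }
    END { print total + 0 }
  '
}
```

Fed the five summary lines above, this adds 3277 + 3430 + 3277 + 3123 + 3277, which is 16384.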

Done!
