生产环境出现实例宕机的情况,在redis的日志中会输出宕机时的相应信息,如下面的源代码,下面记录一下分析日志的详细过程。

=== REDIS BUG REPORT START: Cut & paste starting from here ===
10430:M 13 Jul 15:01:55.933 # Redis 3.2.3 crashed by signal: 11
10430:M 13 Jul 15:01:55.933 # Crashed running the instuction at: 0x438ea0
10430:M 13 Jul 15:01:55.933 # Accessing address: (nil)
10430:M 13 Jul 15:01:55.933 # Failed assertion: <no assertion failed> (<no file>:0)

------ STACK TRACE ------
EIP:
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](scanCallback+0xb0)[0x438ea0]

Backtrace:
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](logStackTrace+0x29)[0x45d3d9]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](sigsegvHandler+0xaa)[0x45d8ca]
/lib64/libpthread.so.0(+0xf630)[0x7f82f8e7d630]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](scanCallback+0xb0)[0x438ea0]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](dictScan+0x198)[0x423a28]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](scanGenericCommand+0x47e)[0x43a8de]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](scanCommand+0x2e)[0x43ab2e]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](call+0x85)[0x426a85]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](processCommand+0x367)[0x429b27]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](processInputBuffer+0x105)[0x4364c5]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](aeProcessEvents+0x218)[0x4211f8]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](aeMain+0x2b)[0x4214ab]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](main+0x410)[0x41e4a0]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f82f8ac2555]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster][0x41e715]

------ INFO OUTPUT ------
# Server
redis_version:3.2.3
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:7dd5c880ff29d49d
redis_mode:cluster
os:Linux 3.10.0-957.el7.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.8.5
process_id:10430
run_id:2bf9299191fcd9be9be89d2bb41f83d349df78a5
tcp_port:7000
uptime_in_seconds:41733606
uptime_in_days:483
hz:10
lru_clock:13528547
executable:/opt/cachecloud/redis/src/redis-server
config_file:/opt/cachecloud/conf/redis-cluster-7000.conf

# Clients
connected_clients:403
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory
used_memory:1541593192
used_memory_human:1.44G
used_memory_rss:1568382976
used_memory_rss_human:1.46G
used_memory_peak:1551791368
used_memory_peak_human:1.45G
total_system_memory:134898122752
total_system_memory_human:125.63G
used_memory_lua:43008
used_memory_lua_human:42.00K
maxmemory:2147483648
maxmemory_human:2.00G
maxmemory_policy:volatile-lru
mem_fragmentation_ratio:1.02
mem_allocator:jemalloc-4.0.3

……省略部分……

# Cluster
cluster_enabled:1

# Keyspace
db0:keys=2194979,expires=37202,avg_ttl=1132024957
hash_init_value: 1616026910

------ CLIENT LIST OUTPUT ------
id=89349749 addr=192.168.1.5:21080 fd=155 name= age=314 idle=14 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping
id=89346248 addr=192.168.1.55:10196 fd=621 name= age=3356 idle=15 flags=r db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping
id=89346249 addr=192.168.1.55:10203 fd=622 name= age=3356 idle=15 flags=r db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping
id=89350085 addr=192.168.1.4:65190 fd=86 name= age=10 idle=2 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=hget
…………省略…………

------ CURRENT CLIENT INFO ------
id=89350080 addr=10.106.49.158:41384 fd=169 name= age=15 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32770 obl=0 oll=0 omem=0 events=r cmd=scan
argv[0]: 'SCAN'
argv[1]: '706181'
argv[2]: 'match'
argv[3]: '*'
argv[4]: 'count'
argv[5]: '600'

------ REGISTERS ------
10430:M 13 Jul 15:01:55.935 # 
RAX:0000000000000000 RBX:00047f82e7340e68
RCX:00007f8295938fa8 RDX:00000000000000c6
RDI:00007fff273c2660 RSI:00047f82e7340e68
RBP:00007f82a45061e0 RSP:00007fff273c2580
R8 :0000000000000002 R9 :0000000000000020
R10:0000000000000018 R11:0000000000000022
R12:0000000000189685 R13:00047f82e7340e68
R14:00007f829767b1c0 R15:00000000003fffff
RIP:0000000000438ea0 EFL:0000000000010246
CSGSFS:0000000000000033
10430:M 13 Jul 15:01:55.935 # (00007fff273c258f) -> 00007f829767b1c0
10430:M 13 Jul 15:01:55.935 # (00007fff273c258e) -> 00000000000015fe
10430:M 13 Jul 15:01:55.935 # (00007fff273c258d) -> 00007f82f20181e0
10430:M 13 Jul 15:01:55.935 # (00007fff273c258c) -> 00007f829593ac23
10430:M 13 Jul 15:01:55.935 # (00007fff273c258b) -> 00007fff273c2660
10430:M 13 Jul 15:01:55.935 # (00007fff273c258a) -> 00007fff273c2650
10430:M 13 Jul 15:01:55.935 # (00007fff273c2589) -> 0000000000000258
10430:M 13 Jul 15:01:55.935 # (00007fff273c2588) -> 0005e3aa5c4b3c4b
10430:M 13 Jul 15:01:55.935 # (00007fff273c2587) -> ffffffffffc00000
10430:M 13 Jul 15:01:55.935 # (00007fff273c2586) -> 0000000000000000
10430:M 13 Jul 15:01:55.935 # (00007fff273c2585) -> 0000000000423a28
10430:M 13 Jul 15:01:55.935 # (00007fff273c2584) -> 0000000000189685
10430:M 13 Jul 15:01:55.935 # (00007fff273c2583) -> 00007fff273c2660
10430:M 13 Jul 15:01:55.935 # (00007fff273c2582) -> 0000000000438df0
10430:M 13 Jul 15:01:55.935 # (00007fff273c2581) -> 000000000041e926
10430:M 13 Jul 15:01:55.935 # (00007fff273c2580) -> 00007f8295904080

------ FAST MEMORY TEST ------
10430:M 13 Jul 15:01:55.935 # Bio thread for job type #0 terminated
10430:M 13 Jul 15:01:55.936 # Bio thread for job type #1 terminated
*** Preparing to test memory region 722000 (94208 bytes)
*** Preparing to test memory region 13d2000 (135168 bytes)
*** Preparing to test memory region 7f8291000000 (1608515584 bytes)
*** Preparing to test memory region 7f82f0fff000 (8388608 bytes)
*** Preparing to test memory region 7f82f1800000 (10485760 bytes)
*** Preparing to test memory region 7f82f8800000 (2097152 bytes)
*** Preparing to test memory region 7f82f8e69000 (20480 bytes)
*** Preparing to test memory region 7f82f9086000 (16384 bytes)
*** Preparing to test memory region 7f82f97a5000 (16384 bytes)
*** Preparing to test memory region 7f82f97ae000 (12288 bytes)
*** Preparing to test memory region 7f82f97b3000 (4096 bytes)
.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.
=== REDIS BUG REPORT END. Make sure to include from START to END. ===

       Please report the crash by opening an issue on github:

           http://github.com/antirez/redis/issues

  Suspect RAM error? Use redis-server --test-memory to verify it.

宕机日志分以下几部分:

  • Crashed running the instuction at: 0x438ea0:宕机代码地址
  • STACK TRACE:宕机调用链栈信息
  • INFO OUTPUT: 宕机时INFO信息输出
  • CLIENT LIST OUTPUT: 宕机时客户端连接列表输出
  • CURRENT CLIENT INFO:宕机引发客户端信息输出,包含连接信息及触发时调用的命令。
  • REGISTERS:宕机时寄存器信息输出
  • FAST MEMORY TEST: 快速内存检测信息

1.STACK TRACE

------ STACK TRACE ------
EIP:
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](scanCallback+0xb0)[0x438ea0]

Backtrace:
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](logStackTrace+0x29)[0x45d3d9]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](sigsegvHandler+0xaa)[0x45d8ca]
/lib64/libpthread.so.0(+0xf630)[0x7f82f8e7d630]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](scanCallback+0xb0)[0x438ea0]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](dictScan+0x198)[0x423a28]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](scanGenericCommand+0x47e)[0x43a8de]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](scanCommand+0x2e)[0x43ab2e]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](call+0x85)[0x426a85]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](processCommand+0x367)[0x429b27]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](processInputBuffer+0x105)[0x4364c5]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](aeProcessEvents+0x218)[0x4211f8]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](aeMain+0x2b)[0x4214ab]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster](main+0x410)[0x41e4a0]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f82f8ac2555]
/opt/cachecloud/redis/src/redis-server 0.0.0.0:7000 [cluster][0x41e715]

这部分的信息输出的是宕机发生时的调用链栈信息,可以通过 addr2line命令,获取到宕机代码的文件名和行号。

logs]$ addr2line -e /app/cachecloud/redis/src/redis-server 0x423a28
/app/redis-3.2.3/src/dict.c:871

如上,可以看到,转换后的文件名和行号为 dict.c的第871行。
然后可以到原文件中查看相应行数的代码。

本次分析主要用到的是Crashed running the instuction atSTACK TRACEINFO OUTPUTCLIENT LIST OUTPUTCURRENT CLIENT INFO这些信息综合分析宕机原因。其他的两个REGISTERS不太熟悉,FAST MEMORY TEST快速内存检测信息都正常,所以没用上。

Logo

为开发者提供学习成长、分享交流、生态实践、资源工具等服务,帮助开发者快速成长。

更多推荐