本文转载自微信公众号「相遇Linux」,作者JeffXie 。转载本文请联系相遇Linux公众号。
最近碰到客户反馈一个问题,系统hang了,ssh登录不上,但是可以ping通,通过串口登录进去之后,敲有些命令会卡住,查看cpu负载内存都很正常,手动触发kdump之后看不出死锁/softlockup/hardlockup等异常状态.而且重要的是已经影响到客户几十亿的业务了。
我能拿到的只有客户提供的vmcore,为了生活而奔波的人儿赶紧分析起来:
系统(SUSE 11SP1)
#crash ./vmcore ./vmlinux-2.6.32.59-0.19-default
./vmlinux-2.6.32.59-0.19-default.gz
crash>foreach IN bt > inbt
任意找到一个IN(可中断睡眠)的进程 12813 查看堆栈:
- crash> bt 12813
- PID:12813 TASK: ffff880262eda140 CPU: 8 COMMAND: "sshd"
- #0 [ffff880f98e1ba58] schedule atffffffff8139d2a4
- #1 [ffff880f98e1bb10] schedule_timeout atffffffff8139d935
- #2 [ffff880f98e1bba0] unix_wait_for_peer atffffffff81376e27
- #3 [ffff880f98e1bc00] unix_dgram_sendmsg atffffffff813785c2
- #4 [ffff880f98e1bcb0] sock_sendmsg atffffffff812e8ddc
- #5 [ffff880f98e1be50] sys_sendto atffffffff812e9318
- #6 [ffff880f98e1bf80] system_call_fastpath atffffffff81002f7b
汇编unix_dgram_sendmsg()
- crash>dis unix_dgram_sendmsg
- ...
- 2410xffffffff813785b7 <unix_dgram_sendmsg+1143>: mov %r13,%rsi
- 2420xffffffff813785ba <unix_dgram_sendmsg+1146>: mov %rbp,%rdi (key1)
- rdi代表unix_wait_for_peer函数的第一个参数
- 2430xffffffff813785bd <unix_dgram_sendmsg+1149>: callq 0xffffffff81376da0 <unix_wait_for_peer>
进程12813 unix_wait_for_peer() 函数的堆栈数据:
- crash>rd -SS ffff880f98e1bba0 -e ffff880f98e1bc00
- ffff880f98e1bba0: unix_wait_for_peer+135 0000000000000001
- ffff880f98e1bbb0: [ffff880262eda140:task_struct]autoremove_wake_function
- ffff880f98e1bbc0: ffff88021ea8bbc0 ffff880238315bc0
- ffff880f98e1bbd0: memcpy_fromiovec+87 [ffff880238514980:UNIX]
- ffff880f98e1bbe0: [ffff880238514c30:UNIX] [ffff880238514980:UNIX]
- ffff880f98e1bbf0: [ffff880ff708b630:UNIX] 7fffffffffffffff
汇编unix_wait_for_peer()
- crash>dis unix_wait_for_peer
- 0xffffffff81376da0<unix_wait_for_peer>: sub $0x58,%rsp
- 0xffffffff81376da4<unix_wait_for_peer+4>: mov $0x1,%edx
- 0xffffffff81376da9<unix_wait_for_peer+9>: mov %r13,0x50(%rsp)
- 0xffffffff81376dae<unix_wait_for_peer+14>: lea 0x2b8(%rdi),%r13
- 0xffffffff81376db5<unix_wait_for_peer+21>: mov %rbx,0x38(%rsp)
- 0xffffffff81376dba<unix_wait_for_peer+26>: mov %gs:0xa580,%rax
- 0xffffffff81376dc3<unix_wait_for_peer+35>: mov %rax,0x8(%rsp)
- 0xffffffff81376dc8<unix_wait_for_peer+40>: lea 0x18(%rsp),%rax
- 0xffffffff81376dcd<unix_wait_for_peer+45>: mov %rbp,0x40(%rsp) (key2)
rbp压入堆栈(0x58-0x40)处,正好是unix_wait_for_peer堆栈中ffff880f98e1bbe8处,即( [ffff880238514980:UNIX])
- 0xffffffff81376dd2<unix_wait_for_peer+50>: mov %rdi,%rbx
- 0xffffffff81376dd5<unix_wait_for_peer+53>: mov %rsi,%rbp
- 0xffffffff81376dd8<unix_wait_for_peer+56>: mov %r13,%rdi
查看unix_wait_for_peer()函数源码:
- 1005static long unix_wait_for_peer(struct sock *other, long timeo) (key3)
- 1006{
- 1007 struct unix_sock *u = unix_sk(other);
- 1008 int sched;
- 1009 DEFINE_WAIT(wait);
- 1011 prepare_to_wait_exclusive(&u->peer_wait, &wait,TASK_INTERRUPTIBLE);
- ..
- 1019 if (sched)
- 1020 timeo =schedule_timeout(timeo);
- 1021
- 1022 finish_wait(&u->peer_wait,&wait);
- 1023 return timeo;
- 1024}
根据s(key1)(key2)(key3)可以知道12813进程在等待unix_sock(ffff880238514980)
- crash>struct unix_sock ffff880238514980 | grep peer_wait -A 10
- peer_wait = {
- lock = {
- raw_lock = {
- slock = 3369322707
- }
- },
- task_list = {
- next = 0xffff880173427bc0,
- prev = 0xffff8801877a7bc0
- }
- }
查询出这个unix_sock是系统中的/dev/log文件(由syslog-ng创建,系统中大量需要记录log的进程需要通过这个unix_sock与syslog-ng通信,可以参考mansyslog-ng 和/etc/syslog-ng/syslog-ng.conf 配置文件)
- crash>struct unix_sock ffff880238514980
- ...
- dentry= 0xffff880e7634b9c0,
- ...
- crash>struct dentry 0xffff880e7634b9c0
- ...
- name = 0xffff880e7634ba60 "log"
- ...
遍历这个unix_sock的所有等待队列,
说明有645个进程正在等待这个unix_sock(ffff880238514980/dev/log)
- crash>list __wait_queue.task_list -s__wait_queue.private -H 0xffff880173427bc0 | grep -i private | wc -l
- 645
所有等待unix_sock的进程重定向到一个wait_unix_sock_process文件
- crash>list __wait_queue.task_list -s__wait_queue.private -H 0xffff880173427bc0 | grep -i private > wait_unix_sock_process
查询所有的sshd进程
- crash>ps | grep sshd
- 7097 1 6 ffff880244cae5c0 IN 0.0 54088 1392 sshd
- 12813 7097 8 ffff880262eda140 IN 0.0 60148 2900 sshd
- 19055 7097 2 ffff880ff74c4200 IN 0.0 60148 2900 sshd
- 21642 7097 1 ffff880262fb4340 IN 0.0 60148 2904 sshd
- 21958 7097 1 ffff880ffab501c0 IN 0.0 60148 2900 sshd
- 22459 7097 6 ffff880ff1ed4100 IN 0.0 121052 10272 sshd
- 22476 22459 0 ffff88022d11a500 IN 0.0 121352 4552 sshd
- 23334 7097 2 ffff880174b04640 IN 0.0 60148 2900 sshd
sshd进程大部分都在wait_unix_sock_process文件中,所以ssh登录之后会卡住
查看syslog-ng进程堆栈
- crash>bt 15232
- PID:15232 TASK: ffff8801aba90680 CPU: 6 COMMAND: "syslog-ng"
- #0 [ffff88021ea8ba58] schedule atffffffff8139d2a4
- #1 [ffff88021ea8bb10] schedule_timeout atffffffff8139d935
- #2 [ffff88021ea8bba0] unix_wait_for_peer atffffffff81376e27
- #3 [ffff88021ea8bc00] unix_dgram_sendmsg atffffffff813785c2
- #4 [ffff88021ea8bcb0] sock_sendmsg atffffffff812e8ddc
- #5 [ffff88021ea8be50] sys_sendto atffffffff812e9318
- #6 [ffff88021ea8bf80] system_call_fastpath atffffffff81002f7b
查看 unix_wait_for_peer()堆栈数据:
- crash>rd -SS ffff88021ea8bba0 -e ffff88021ea8bc00
- ffff88021ea8bba0: unix_wait_for_peer+135 0000000000000001
- ffff88021ea8bbb0: [ffff8801aba90680:task_struct]autoremove_wake_function
- ffff88021ea8bbc0: ffff880fce8dfbc0 ffff880f98e1bbc0
- ffff88021ea8bbd0: memcpy_fromiovec+87 [ffff880238514980:UNIX]
- ffff88021ea8bbe0: [ffff880238514c30:UNIX][ffff880238514980:UNIX] (key4)
- ffff88021ea8bbf0: [ffff88104c1c3930:UNIX] 7fffffffffffffff
由(key4)可知syslog-ng也是在等待ffff880238514980 unix_sock
所以系统中大量的进程,包括sshd cron等其他的645个进程都在等待unix_sock(/dev/log)(/dev/log由syslog-ng创建),syslog-ng卡住在等这个unix_sock之后,系统中大量的进程都会卡住,包括sshd cron进程等. 但是syslog-ng为什么卡住从vmcore根本看不出来,很可能进程在用户层锁住了.
通过查看客户系统中最近安装的软件发现,他们最近安装了两个定制软件,卸载之后系统恢复正常。
- # /bin/rpm -qa --last
- CARKpam-11.01-1.19 Tue Jun 9 22:49:22 2020
- CARKaim-11.01-1.5 Tue Jun 9 22:38:48 2020