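How it works
When the Linux kernel crashes, kdump boots a reserved capture kernel, which saves the memory of the crashed kernel to a dump file, vmcore. Analyzing the vmcore can reveal why the kernel crashed.
crash is a widely used tool for analyzing kernel crash dumps. To debug a dump with crash you need the crash utility itself plus the kernel-debuginfo packages, which provide a vmlinux with debug symbols that exactly matches the crashed kernel.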
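On CentOS/RHEL 7, kdump normally writes each dump into a timestamped subdirectory of the `path` configured in /etc/kdump.conf (/var/crash by default). The sketch below assumes that layout and simply picks the most recently modified vmcore under a directory. The function name latest_vmcore is hypothetical, and it relies on GNU find's -printf.

```shell
#!/bin/sh
# latest_vmcore [DIR]: print the most recently modified file named "vmcore"
# under DIR (default /var/crash). Hypothetical helper; needs GNU find.
latest_vmcore() {
  dir=${1:-/var/crash}
  # "%T@ %p" prints modification time (epoch seconds) and path;
  # sort by time and keep only the newest entry's path.
  find "$dir" -type f -name vmcore -printf '%T@ %p\n' 2>/dev/null \
    | sort -n | tail -n 1 | cut -d' ' -f2-
}

# Usage on the host that crashed, if it booted the same kernel again:
#   crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux "$(latest_vmcore)"
```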
Installing the required software
1. Check the kernel version
- [root@qd01-stop-free015 ~]# uname -r
- 3.10.0-1160.15.2.el7.x86_64
2. Install kdump (provided by kexec-tools) and crash
- yum install crash kexec-tools -y
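3. Install kernel-debuginfo
Download the packages from http://debuginfo.centos.org/7/x86_64/. Their version must exactly match the crashed kernel (the uname -r output above).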
- rpm -ivh kernel-debuginfo-3.10.0-1160.15.2.el7.x86_64.rpm kernel-debuginfo-common-x86_64-3.10.0-1160.15.2.el7.x86_64.rpm
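Two things commonly go wrong at this stage. The debuginfo packages may not exactly match the crashed kernel, or no memory may have been reserved for the capture kernel. The sketch below derives the two package file names used above from a kernel release string and extracts the crashkernel= option from a kernel command line. The function names are hypothetical.

```shell
#!/bin/sh
# debuginfo_rpms [RELEASE]: print the two debuginfo RPM file names for a
# kernel release string (defaults to the running kernel, `uname -r`).
debuginfo_rpms() {
  release=${1:-$(uname -r)}   # e.g. 3.10.0-1160.15.2.el7.x86_64
  arch=${release##*.}         # text after the last dot, e.g. x86_64
  echo "kernel-debuginfo-${release}.rpm"
  echo "kernel-debuginfo-common-${arch}-${release}.rpm"
}

# crashkernel_param [CMDLINE]: print the crashkernel= option, if present.
# Without a crashkernel reservation, kdump cannot capture a vmcore.
crashkernel_param() {
  printf '%s\n' "${1:-$(cat /proc/cmdline)}" | tr ' ' '\n' \
    | grep '^crashkernel=' || true
}

# Usage on the host:
#   debuginfo_rpms              # names to fetch from debuginfo.centos.org
#   crashkernel_param           # e.g. crashkernel=auto
#   systemctl is-active kdump   # should print "active"
```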
Analyzing the crash report
1. Load the vmcore with crash (the debug vmlinux first, then the dump file)
- [root@qd01-stop-free015 kdump]# crash /usr/lib/debug/lib/modules/3.10.0-1160.15.2.el7.x86_64/vmlinux vmcore
- crash 7.2.3-11.el7_9.1
- Copyright (C) 2002-2017 Red Hat, Inc.
- Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
- Copyright (C) 1999-2006 Hewlett-Packard Co
- Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
- Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
- Copyright (C) 2005, 2011 NEC Corporation
- Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
- Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
- This program is free software, covered by the GNU General Public License,
- and you are welcome to change it and/or distribute copies of it under
- certain conditions. Enter "help copying" to see the conditions.
- This program has absolutely no warranty. Enter "help warranty" for details.
- GNU gdb (GDB) 7.6
- Copyright (C) 2013 Free Software Foundation, Inc.
- License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
- This is free software: you are free to change and redistribute it.
- There is NO WARRANTY, to the extent permitted by law. Type "show copying"
- and "show warranty" for details.
- This GDB was configured as "x86_64-unknown-linux-gnu"...
- WARNING: kernel relocated [274MB]: patching 87300 gdb minimal_symbol values
- KERNEL: /usr/lib/debug/lib/modules/3.10.0-1160.15.2.el7.x86_64/vmlinux
- DUMPFILE: vmcore [PARTIAL DUMP]
- CPUS: 8
- DATE: Thu Mar 4 10:12:38 2021
- UPTIME: 00:05:04
- LOAD AVERAGE: 5.28, 3.20, 1.38
- TASKS: 256
- NODENAME: zf-dbslave001
- RELEASE: 3.10.0-1160.15.2.el7.x86_64
- VERSION: #1 SMP Wed Feb 3 15:06:38 UTC 2021
- MACHINE: x86_64 (2500 Mhz)
- MEMORY: 63 GB
- PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000074"
- PID: 1362
- COMMAND: "AliYunDun"
- TASK: ffff90f972365280 [THREAD_INFO: ffff90f9767a4000]
- CPU: 5
- STATE: TASK_RUNNING (PANIC)
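The fields mean:
- KERNEL: the kernel image used for the analysis (the debug vmlinux), which must match the kernel that crashed
- DUMPFILE: the kernel dump file; [PARTIAL DUMP] means some pages were filtered out when the dump was saved
- CPUS: number of CPUs on the machine
- DATE: time of the crash
- TASKS: number of tasks in memory at the time of the crash
- NODENAME: host name of the crashed system
- RELEASE and VERSION: kernel release and build version
- MACHINE: CPU architecture and clock speed
- MEMORY: physical memory of the crashed host
- PID, COMMAND, TASK, CPU, STATE: the task that was running on the CPU that panicked
- PANIC: the crash type; common types include:
- SysRq (System Request): a crash triggered through the magic SysRq key, usually for testing. Running echo c > /proc/sysrq-trigger as root triggers a crash.
- oops: roughly the kernel-level equivalent of a segmentation fault. When an application accesses invalid memory it receives SIGSEGV (SIGILL for an illegal instruction), whose default action is to dump core; the application can also catch the signal and handle it itself. When the kernel itself makes such an error, it prints an oops message.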
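Before relying on kdump in production, it is worth confirming on a test machine that a SysRq-triggered crash actually produces a vmcore. Below is a guarded sketch. The function name and the confirmation flag are made up for illustration, and running it for real panics the kernel immediately.

```shell
#!/bin/sh
# trigger_test_crash: deliberately crash the running kernel via SysRq "c"
# to test kdump's capture path. Requires root. The confirmation argument
# is a hypothetical safety guard, not part of any real tool.
trigger_test_crash() {
  if [ "$1" != "--yes-panic-now" ]; then
    echo "refusing: pass --yes-panic-now to crash this machine" >&2
    return 1
  fi
  sync                           # flush file system buffers first
  echo c > /proc/sysrq-trigger   # the system crashes here
}
```

After the reboot, a new vmcore should appear under the kdump path, and its PANIC line in crash should mention SysRq.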
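From the header printed by crash, the system crashed because of PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000074", which is a kernel oops. The oops happened while AliYunDun (PID 1362) was running on CPU 5. The kernel then panicked, kdump saved the vmcore, and the machine rebooted.
PS: I don't get Alibaba Cloud's logic here: the server gets hacked and all it does is keep rebooting?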
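When several vmcores need a first look, crash can also be driven non-interactively. The -s flag suppresses the startup banner, -i runs the commands in a file, and the sys command prints the same header shown above. The sketch below works under those assumptions. crash_batch and header_field are hypothetical names.

```shell
#!/bin/sh
# crash_batch VMLINUX VMCORE CMD...: run crash commands non-interactively.
# -s skips the banner; -i executes the commands listed in the given file.
crash_batch() {
  vmlinux=$1; vmcore=$2; shift 2
  cmdfile=$(mktemp)
  printf '%s\n' "$@" q > "$cmdfile"   # one command per line; "q" exits crash
  crash -s -i "$cmdfile" "$vmlinux" "$vmcore"
  rc=$?
  rm -f "$cmdfile"
  return "$rc"
}

# header_field NAME: read "sys" output on stdin, print the value of one field.
header_field() {
  sed -n "s/^ *$1: *//p"
}

# Usage:
#   crash_batch /usr/lib/debug/lib/modules/3.10.0-1160.15.2.el7.x86_64/vmlinux \
#     vmcore sys | header_field PANIC
```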
2. The bt command shows the kernel stack backtrace of the task that was running when the system crashed.
- crash> bt
- PID: 1362 TASK: ffff90f972365280 CPU: 5 COMMAND: "AliYunDun"
- #0 [ffff90f9767a77a0] machine_kexec at ffffffff922662c4
- #1 [ffff90f9767a7800] __crash_kexec at ffffffff923227a2
- #2 [ffff90f9767a78d0] crash_kexec at ffffffff92322890
- #3 [ffff90f9767a78e8] oops_end at ffffffff9298c798
- #4 [ffff90f9767a7910] no_context at ffffffff92275d14
- #5 [ffff90f9767a7960] __bad_area_nosemaphore at ffffffff92275fe2
- #6 [ffff90f9767a79b0] bad_area_nosemaphore at ffffffff92276104
- #7 [ffff90f9767a79c0] __do_page_fault at ffffffff9298f750
- #8 [ffff90f9767a7a30] trace_do_page_fault at ffffffff9298fa26
- #9 [ffff90f9767a7a70] do_async_page_fault at ffffffff9298efa2
- #10 [ffff90f9767a7a90] async_page_fault at ffffffff9298b7a8
- #11 [ffff90f9767a7b98] kmem_cache_alloc_trace at ffffffff92428a0c
- #12 [ffff90f9767a7c98] mntput at ffffffff92471d94
- #13 [ffff90f9767a7d88] kvm_sched_clock_read at ffffffff9226d3be
- #14 [ffff90f9767a7ec8] putname at ffffffff9245fd3d
- #15 [ffff90f9767a7f50] system_call_fastpath at ffffffff92994f92
- RIP: 00007f84fd928315 RSP: 00007f84fb011af8 RFLAGS: 00000206
- RAX: 000000000000004e RBX: 000000000244e010 RCX: ffffffffffffffff
- RDX: 0000000000008000 RSI: 000000000244e010 RDI: 0000000000000012
- RBP: 000000000244e010 R8: 0000000000000020 R9: 0000000000008030
- R10: 0000000000000076 R11: 0000000000000246 R12: ffffffffffffff30
- R13: 0000000000000000 R14: 000000000244dfe0 R15: 000000000000052a
- ORIG_RAX: 000000000000004e CS: 0033 SS: 002b
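The register dump at the bottom of bt is the user-space state saved when the task entered the kernel, and ORIG_RAX holds the system call number. The sketch below turns that hex value into a system call name. It assumes the kernel-headers file /usr/include/asm/unistd_64.h is installed, and the function names are hypothetical.

```shell
#!/bin/sh
# syscall_nr HEX: convert a register value such as 000000000000004e to decimal.
syscall_nr() {
  printf '%d\n' "0x${1#0x}"
}

# syscall_name HEX [HEADER]: look the number up among the __NR_* defines
# of the x86_64 syscall header and print the name without the prefix.
syscall_name() {
  nr=$(syscall_nr "$1")
  awk -v nr="$nr" '$1 == "#define" && $2 ~ /^__NR_/ && $3 == nr {
    sub(/^__NR_/, "", $2); print $2 }' "${2:-/usr/include/asm/unistd_64.h}"
}

# Usage: syscall_name 000000000000004e   # ORIG_RAX from the bt output above
```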
3. The log command prints the kernel message buffer, which may contain clues about the crash. The output is long, so only part of it is shown here.
- crash> log
- [ 0.000000] Initializing cgroup subsys cpuset
- [ 0.000000] Initializing cgroup subsys cpu
- [ 0.000000] Initializing cgroup subsys cpuacct
- [ 0.000000] Linux version 3.10.0-1160.15.2.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Wed Feb 3 15:06:38 UTC 2021
- [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-1160.15.2.el7.x86_64 root=UUID=1114fe9e-2309-4580-b183-d778e6d97397 ro crashkernel=auto rhgb quiet LANG=en_US.UTF-8 idle=halt biosdevname=0 net.ifnames=0 console=tty0 console=ttyS0,115200n8 noibrs
- [ 0.000000] e820: BIOS-provided physical RAM map:
- [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
- [ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
- [ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
- [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000013ffffff] usable
- [ 0.000000] BIOS-e820: [mem 0x0000000014000000-0x000000001511ffff] reserved
- [ 0.000000] BIOS-e820: [mem 0x0000000015120000-0x00000000bffcdfff] usable
- [ 0.000000] BIOS-e820: [mem 0x00000000bffce000-0x00000000bfffffff] reserved
- [ 0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
- [ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
- [ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x0000000fffffffff] usable
- [ 0.000000] BIOS-e820: [mem 0x0000001000000000-0x000000103fffffff] reserved
- [ 0.000000] NX (Execute Disable) protection: active
- [ 0.000000] SMBIOS 2.8 present.
- [ 0.000000] DMI: Alibaba Cloud Alibaba Cloud ECS, BIOS e623647 04/01/2014
- [ 0.000000] Hypervisor detected: KVM
- [ 0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
- [ 0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
- [ 0.000000] e820: last_pfn = 0x1000000 max_arch_pfn = 0x400000000
- [ 0.000000] MTRR default type: write-back
- [ 0.000000] MTRR fixed ranges enabled:
- [ 0.000000] 00000-9FFFF write-back
- [ 0.000000] A0000-BFFFF uncachable
- [ 0.000000] C0000-FFFFF write-protect
- [ 0.000000] MTRR variable ranges enabled:
- [ 0.000000] 0 base 0000C0000000 mask 3FFFC0000000 uncachable
- [ 0.000000] 1 disabled
- [ 0.000000] 2 disabled
- [ 0.000000] 3 disabled
- [ 0.000000] 4 disabled
- [ 0.000000] 5 disabled
- [ 0.000000] 6 disabled
- [ 0.000000] 7 disabled
- [ 0.000000] PAT configuration [0-7]: WB WC UC- UC WB WP UC- UC
- [ 0.000000] e820: last_pfn = 0xbffce max_arch_pfn = 0x400000000
- [ 0.000000] found SMP MP-table at [mem 0x000f5a00-0x000f5a0f] mapped at [ffffffffff200a00]
- [ 0.000000] Base memory trampoline at [ffff90f800099000] 99000 size 24576
- [ 0.000000] Using GB pages for direct mapping
- [ 0.000000] BRK [0x70e74000, 0x70e74fff] PGTABLE
- [ 0.000000] BRK [0x70e75000, 0x70e75fff] PGTABLE
- [ 0.000000] BRK [0x70e76000, 0x70e76fff] PGTABLE
- [ 0.000000] BRK [0x70e77000, 0x70e77fff] PGTABLE
- [ 0.000000] BRK [0x70e78000, 0x70e78fff] PGTABLE
- [ 0.000000] RAMDISK: [mem 0x3625c000-0x37125fff]
- [ 0.000000] Early table checksum verification disabled
- [ 0.000000] ACPI: RSDP 00000000000f59b0 00014 (v00 BOCHS )
- [ 0.000000] ACPI: RSDT 00000000bffe2185 00034 (v01 BOCHS BXPCRSDT 00000001 BXPC 00000001)
- [ 0.000000] ACPI: FACP 00000000bffe093e 00074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)
- [ 0.000000] ACPI: DSDT 00000000bffdfd80 00BBE (v01 BOCHS BXPCDSDT 00000001 BXPC 00000001)
- [ 0.000000] ACPI: FACS 00000000bffdfd40 00040
- [ 0.000000] ACPI: SSDT 00000000bffe09b2 015FB (v01 BOCHS BXPCSSDT 00000001 BXPC 00000001)
- [ 0.000000] ACPI: APIC 00000000bffe1fad 000B0 (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
- [ 0.000000] ACPI: SRAT 00000000bffe205d 00128 (v01 BOCHS BXPCSRAT 00000001 BXPC 00000001)
- [ 4.722250] Adding 33554428k swap on /data/swapfile. Priority:-2 extents:24 across:35823612k FS
- [ 5.841211] input: QEMU QEMU USB Tablet as /devices/pci0000:00/0000:00:01.2/usb1/1-1/1-1:1.0/input/input5
- [ 5.841325] hid-generic 0003:0627:0001.0001: input,hidraw0: USB HID v0.01 Pointer [QEMU QEMU USB Tablet] on usb-0000:00:01.2-1/input0
- [ 13.615575] mzoneinfo: loading out-of-tree module taints kernel.
- [ 13.615611] mzoneinfo: module verification failed: signature and/or required key missing - tainting kernel
- [ 305.100071] BUG: unable to handle kernel NULL pointer dereference at 0000000000000074
- [ 305.101048] IP: [<ffffffffc02d74c0>] 0xffffffffc02d74c0
- [ 305.101653] PGD 800000010d7ed067 PUD 176f9c067 PMD 0
- [ 305.102276] Oops: 0000 [#1] SMP
- [ 305.102675] Modules linked in: tcp_diag inet_diag cirrus ttm nfit drm_kms_helper libnvdimm syscopyarea ppdev sysfillrect intel_powerclamp sysimgblt fb_sys_fops drm iosf_mbi parport_pc crc32_pclmul virtio_balloon parport ghash_clmulni_intel aesni_intel lrw gf128mul drm_panel_orientation_quirks glue_helper pcspkr i2c_piix4 joydev ablk_helper cryptd ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_net virtio_console net_failover virtio_blk failover ata_piix libata crct10dif_pclmul crct10dif_common crc32c_intel virtio_pci virtio_ring floppy serio_raw virtio
- [ 305.109021] CPU: 5 PID: 1362 Comm: AliYunDun Kdump: loaded Tainted: G OE ------------ 3.10.0-1160.15.2.el7.x86_64 #1
- [ 305.110306] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS e623647 04/01/2014
- [ 305.111150] task: ffff90f972365280 ti: ffff90f9767a4000 task.ti: ffff90f9767a4000
- [ 305.111977] RIP: 0010:[<ffffffffc02d74c0>] [<ffffffffc02d74c0>] 0xffffffffc02d74c0
- [ 305.112843] RSP: 0018:ffff90f9767a7b48 EFLAGS: 00010283
- [ 305.113437] RAX: fffffffffffffbd0 RBX: 0000000000000240 RCX: 00000000000007cd
- [ 305.114228] RDX: 0000000000000000 RSI: ffff90f972365280 RDI: 00000000ffffffff
- [ 305.115014] RBP: ffff90f9767a7b88 R08: 0000000040000000 R09: 0000000000000400
- [ 305.115804] R10: 0000000000000000 R11: ffffd9d105c1ea00 R12: 0000000000000240
- [ 305.116586] R13: 0000000000000258 R14: 0000000000000018 R15: ffff90f9707aa000
- [ 305.117377] FS: 00007f84fb012700(0000) GS:ffff9107ffd40000(0000) knlGS:0000000000000000
- [ 305.118276] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
- [ 305.118921] CR2: 0000000000000074 CR3: 000000017839e000 CR4: 00000000003606e0
- [ 305.119710] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
- [ 305.120502] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
- [ 305.121291] Call Trace:
- [ 305.121581] [<ffffffff92428a0c>] ? kmem_cache_alloc_trace+0x3c/0x200
- [ 305.122304] [<ffffffff9242832e>] ? __kmalloc+0x2e/0x230
- [ 305.122898] [<ffffffff92471d94>] ? mntput+0x24/0x40
- [ 305.123458] [<ffffffff9226d3be>] ? kvm_sched_clock_read+0x1e/0x30
- [ 305.124162] [<ffffffff9245fd3d>] ? putname+0x3d/0x60
- [ 305.124733] [<ffffffff92994f92>] ? system_call_fastpath+0x25/0x2a
- [ 305.125417] Code: 65 48 8b 34 25 c0 0e 01 00 48 8b 96 30 04 00 00 48 8d 82 d0 fb ff ff 48 39 c6 74 2c 3b 7a 74 74 2a b9 d0 07 00 00 eb 0d 0f 1f 00 <3b> 7a 74 74 1b 83 e9 01 74 13 48 8b 90 30 04 00 00 48 8d 82 d0
- [ 305.128647] RIP [<ffffffffc02d74c0>] 0xffffffffc02d74c0
- [ 305.129263] RSP <ffff90f9767a7b48>
- [ 305.129660] CR2: 0000000000000074
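Most of the log is boot noise. The lines that matter first are the BUG/Oops headline, the taint messages, RIP, CR2 and the Code: line, where the byte shown in <> is the first byte of the faulting instruction. The sketch below, with hypothetical function names, extracts them from saved log output. Applied to the Code: line above, it yields 3b 7a 74, which decodes to cmp 0x74(%rdx),%edi. Since RDX is 0 in the register dump, that read hits address 0x74, exactly the CR2 value. The kernel source tree also ships scripts/decodecode for decoding whole Code: lines.

```shell
#!/bin/sh
# oops_summary: from "log" output on stdin, keep the lines usually worth
# reading first (headline, taint messages, faulting RIP, CR2, code bytes).
oops_summary() {
  grep -E 'BUG:|Oops:|taint|Tainted:|RIP|CR2|Code:'
}

# fault_bytes: from a "Code:" line on stdin, print the bytes starting at the
# faulting instruction, which the kernel marks as <xx>.
fault_bytes() {
  sed -n 's/.*<\([0-9a-f][0-9a-f]\)>/\1/p'
}
```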
4. The ps command shows the state of each process; a leading > marks the task that was running on each CPU when the crash occurred.
- crash> ps
- PID PPID CPU TASK ST %MEM VSZ RSS COMM
- 0 0 0 ffffffff92e18480 RU 0.0 0 0 [swapper/0]
- > 0 0 1 ffff90f964f74200 RU 0.0 0 0 [swapper/1]
- 0 0 2 ffff90f964f75280 RU 0.0 0 0 [swapper/2]
- > 0 0 3 ffff90f964f76300 RU 0.0 0 0 [swapper/3]
- 0 0 4 ffff90f965760000 RU 0.0 0 0 [swapper/4]
- 0 0 5 ffff90f965761080 RU 0.0 0 0 [swapper/5]
- 0 0 6 ffff90f965762100 RU 0.0 0 0 [swapper/6]
- > 0 0 7 ffff90f965763180 RU 0.0 0 0 [swapper/7]
- 1 0 3 ffff90f964b60000 IN 0.0 43460 3816 systemd
- (part of the output omitted here)
- 1045 1 6 ffff90f91038c200 IN 0.0 110208 880 agetty
- 1144 1 3 ffff90f966f6a100 IN 0.0 32544 4304 AliYunDunUpdate
- 1145 1 3 ffff90f90b3a3180 IN 0.0 32544 4304 AliYunDunUpdate
- 1146 1 1 ffff90f90b3a2100 IN 0.0 32544 4304 AliYunDunUpdate
- 1161 1 7 ffff90f978bcb180 IN 0.0 32544 4304 AliYunDunUpdate
- 1165 1 1 ffff90f910389080 IN 0.0 802872 11300 aliyun-service
- 1166 1 3 ffff90f978bcc200 IN 0.0 802872 11300 aliyun-service
- 1170 1 5 ffff90f978bc8000 IN 0.0 802872 11300 aliyun-service
- 1180 1 3 ffff90f90c3f2100 IN 0.0 802872 11300 aliyun-service
- 1188 1 5 ffff90f91038d280 IN 0.0 4936 2556 matchpathcond
- 1191 1 5 ffff90f91038b180 IN 0.0 328 208 postcated
- 1193 1 7 ffff90f977398000 IN 0.0 3304 184 telinited
- 1194 1193 5 ffff90f910388000 IN 0.0 3436 1244 telinited
- 1206 1 0 ffff90f966f68000 IN 0.0 5088 1676 devlinked
- 1209 1 1 ffff90f970b40000 IN 0.0 172 40 logrotated
- 1313 1 1 ffff90f90f7d4200 IN 0.0 574284 17500 gmain
- 1314 1 7 ffff90f90f7d2100 IN 0.0 574284 17500 tuned
- 1322 1 5 ffff90f9783b8000 IN 0.0 139536 22220 AliYunDun
- 1323 1 1 ffff90f9533eb180 IN 0.0 139536 22220 AliYunDun
- 1324 1 5 ffff90f9533ed280 IN 0.0 139536 22220 AliYunDun
- 1345 1 3 ffff90f91279d280 IN 0.0 574284 17500 tuned
- 1346 1 5 ffff90f91279e300 IN 0.0 574284 17500 tuned
- 1347 1 5 ffff90f90eb84200 IN 0.0 718240 7536 rs:main Q:Reg
- 1349 1 1 ffff90f91279b180 IN 0.0 139536 22220 AliYunDun
- 1350 1 1 ffff90f91279c200 IN 0.0 139536 22220 AliYunDun
- 1351 1 1 ffff90f90b3a5280 IN 0.0 139536 22220 AliYunDun
- 1352 1 4 ffff90f90b3a1080 IN 0.0 139536 22220 AliYunDun
- 1353 1 5 ffff90f90b3a6300 IN 0.0 139536 22220 AliYunDun
- 1354 1 5 ffff90f90b3a4200 IN 0.0 139536 22220 AliYunDun
- 1355 1 1 ffff90f90b3a0000 IN 0.0 139536 22220 AliYunDun
- 1357 1 7 ffff90f90b780000 IN 0.0 139536 22220 AliYunDun
- 1358 1 5 ffff90f90b781080 IN 0.0 139536 22220 AliYunDun
- 1359 1 3 ffff90f972361080 IN 0.0 139536 22220 AliYunDun
- 1360 1 3 ffff90f972364200 IN 0.0 139536 22220 AliYunDun
- 1361 1 7 ffff90f972366300 IN 0.0 139536 22220 AliYunDun
- > 1362 1 5 ffff90f972365280 RU 0.0 139536 22220 AliYunDun
- 1363 1 5 ffff90f97b76d280 IN 0.0 139536 22220 AliYunDun
- 1401 1 3 ffff90f97638d280 IN 0.0 139536 22220 AliYunDun
- 1402 1 1 ffff90f97638e300 IN 0.0 139536 22220 AliYunDun
- 1403 1 7 ffff90f97638b180 IN 0.0 139536 22220 AliYunDun
- 1404 1 7 ffff90f97b76b180 IN 0.0 139536 22220 AliYunDun
- 1405 1 5 ffff90f97b76c200 IN 0.0 139536 22220 AliYunDun
- 1406 1 5 ffff90f97b76e300 IN 0.0 139536 22220 AliYunDun
- 1483 1 5 ffff90f970b45280 IN 0.0 112936 4344 sshd
- 1570 1483 7 ffff90f90e386300 IN 0.0 157640 6308 sshd
- 2036 1 1 ffff90f975791080 IN 0.0 802872 11300 aliyun-service
- 2060 1570 1 ffff90f90c3f4200 IN 0.0 157640 2508 sshd
- 2066 2060 1 ffff90f90cf8d280 IN 0.0 115548 2084 bash
- 2963 1 5 ffff90f9767d3180 IN 0.0 328 264 postcated
- 2973 1 2 ffff90f9767cb180 IN 0.0 5084 1672 devlinked
- 2977 1 7 ffff90f9767d1080 IN 0.0 172 44 logrotated
- 3923 2066 7 ffff90f9783be300 IN 0.0 241360 4640 sudo
- 3924 3923 5 ffff90f975be0000 IN 0.0 191872 2360 su
- 3925 3924 1 ffff90f90eb86300 IN 0.0 115680 2160 bash
- 4507 1 1 ffff90f90c3f0000 IN 0.0 17816 2096 assist_daemon
- 4508 1 7 ffff90f90c3f3180 IN 0.0 17816 2096 Timer thread
- 4509 1 1 ffff90f90c3f6300 IN 0.0 17816 2096 assist_daemon
- 4510 1 1 ffff90f90c3f1080 IN 0.0 17816 2096 Timer thread
- 5820 1 7 ffff90f90eb83180 IN 0.0 328 208 postcated
- 5824 1 4 ffff90f9767cc200 IN 0.0 5084 1672 devlinked
- 5828 1 3 ffff90f975b83180 IN 0.0 172 40 logrotated
- 9989 1 5 ffff90f90df95280 IN 0.0 328 204 postcated
- 9993 1 6 ffff90f9767b4200 IN 0.0 5088 1676 devlinked
- 9997 1 3 ffff90f967b7e300 IN 0.0 172 40 logrotated
- 15502 1 2 ffff90f966f6b180 IN 0.0 328 208 postcated
- 15528 1 4 ffff90f9533ee300 IN 0.0 5084 1668 devlinked
- 15532 1 1 ffff90f9533e8000 IN 0.0 172 40 logrotated
- 22388 1 3 ffff90f90f7c5280 IN 0.0 328 208 postcated
- 22392 1 4 ffff90f975be3180 IN 0.0 5088 1676 devlinked
- 22396 1 5 ffff90f977399080 IN 0.0 172 40 logrotated
- 30647 1 5 ffff90f9767b6300 IN 0.0 328 208 postcated
- 30651 1 6 ffff90f975b81080 IN 0.0 5092 1676 devlinked
- 30655 1 5 ffff90f975b85280 IN 0.0 172 40 logrotated
- 30779 1 3 ffff90f9757b8000 IN 0.0 2442608 3784 mountinfo
- 30780 1 2 ffff90f975b86300 IN 0.0 2442608 3784 mountinfo
- 30781 1 4 ffff90f975b82100 IN 0.0 2442608 3784 mountinfo
- 30783 1 7 ffff90f975b84200 IN 0.0 2442608 3784 mountinfo
- 30784 1 1 ffff90f90ebc1080 IN 0.0 2442608 3784 mountinfo
- 30785 1 1 ffff90f8bb941080 IN 0.0 2442608 3784 mountinfo
- > 31745 1 0 ffff90f90f7d3180 RU 0.0 2442608 3784 mountinfo
- > 31746 1 2 ffff90f90f7d6300 RU 0.0 2442608 3784 mountinfo
- > 31747 1 4 ffff90f90f7d0000 RU 0.0 2442608 3784 mountinfo
- > 31748 1 6 ffff90f97b76a100 RU 0.0 2442608 3784 mountinfo
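With hundreds of tasks, filtering on the > marker is quicker than reading the whole list. crash can pipe its output through shell commands (for example ps | grep '^>'). A saved copy of the output can be summarized with a sketch like the one below. The function name is hypothetical.

```shell
#!/bin/sh
# active_tasks: read crash "ps" output on stdin and count, per command name,
# the tasks marked ">" (running on a CPU when the dump was taken).
# Fields on such lines: > PID PPID CPU TASK ST %MEM VSZ RSS COMM...
active_tasks() {
  awk '$1 == ">" { c = $10; for (i = 11; i <= NF; i++) c = c " " $i; print c }' \
    | sort | uniq -c | sort -rn
}
```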
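From this output, mountinfo is clearly an abnormal process and the most likely culprit behind the reboots. Its tasks 31745 to 31748 were running on CPUs 0, 2, 4 and 6 when the system crashed, each with a VSZ of 2442608 KB. Processes such as postcated, telinited, devlinked, logrotated and matchpathcond, whose names appear to imitate system utilities, look suspicious as well. The oops itself occurred in AliYunDun's context at RIP 0xffffffffc02d74c0, an address in the module area that the kernel could not resolve to a symbol. The log also shows an unsigned out-of-tree module, mzoneinfo, tainting the kernel even though it is missing from the "Modules linked in" list. Together this suggests, though it does not prove, a hidden malicious kernel module rather than a bug in AliYunDun.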
5. Go back to the bt output from step 2 (running bt again prints the same stack).
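bt lists frames from the most recent (#0, machine_kexec on the panic path) to the oldest. So "#15 [ffff90f9767a7f50] system_call_fastpath at ffffffff92994f92" is where the task entered the kernel through a system call, not the last call before the crash. Frames #11 to #14 correspond to entries marked ? in the log's Call Trace, which are unreliable (possibly stale) return addresses found on the stack. The faulting instruction itself (RIP 0xffffffffc02d74c0 in the log) does not appear in bt at all. ORIG_RAX 0x4e is system call 78, getdents on x86_64. Use the dis command to disassemble the address of frame #15: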
6. Disassemble with dis
- crash> dis -l ffffffff92994f92
- /usr/src/debug/kernel-3.10.0-1160.15.2.el7/linux-3.10.0-1160.15.2.el7.x86_64/arch/x86/kernel/entry_64.S: 511
- 0xffffffff92994f92 <system_call_fastpath+37>: mov %rax,0x50(%rsp)
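7. Check the source
dis -l maps the address to line 511 of entry_64.S; the corresponding source is shown below. Line 511 (movq %rax,RAX(%rsp)) stores the return value of the system call handler. Its address is the return address pushed by the indirect call through sys_call_table on line 505 or 507. The crash therefore happened inside the handler that this call dispatched to, not on line 511 itself.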
- 492 system_call_fastpath:
- 493 #if __SYSCALL_MASK == ~0
- 494 cmpq $__NR_syscall_max+1,%rax
- 495 #else
- 496 andl $__SYSCALL_MASK,%eax
- 497 cmpl $__NR_syscall_max+1,%eax
- 498 #endif
- 499 jae badsys
- 500 ARRAY_INDEX_NOSPEC_SYSCALL clobber_reg=%rcx
- 501 movq %r10,%rcx
- 502
- 503 #ifdef CONFIG_RETPOLINE
- 504 movq sys_call_table(, %rax, 8), %rax
- 505 call __x86_indirect_thunk_rax
- 506 #else
- 507 call *sys_call_table(, %rax, 8) # XXX: rip relative
- 508 #endif
- 509
- 510 UNWIND_END_OF_STACK
- 511 movq %rax,RAX(%rsp)
- 512 /*
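Since line 511 is only the return point of the dispatch through sys_call_table, a natural next step is to check two things. First, where entry 78 (getdents, per ORIG_RAX) of that table actually pointed. Second, whether the faulting RIP from the log belongs to any known symbol or loaded module. The sketch below writes those crash commands to a file. The helper name is hypothetical, and its results are not part of the dump shown here. If the table entry pointed into the module area, or sym could not resolve the RIP, that would support the hidden-module hypothesis, but it remains to be checked.

```shell
#!/bin/sh
# write_syscall_check FILE NR RIP: write crash(8) commands that show where
# sys_call_table[NR] points and what, if anything, RIP resolves to.
write_syscall_check() {
  cat > "$1" <<EOF
p sys_call_table[$2]
sys -c $2
sym $3
mod
q
EOF
}

# Usage (78 = ORIG_RAX 0x4e; RIP taken from the oops in the log):
#   write_syscall_check /tmp/check.cmd 78 ffffffffc02d74c0
#   crash -s -i /tmp/check.cmd \
#     /usr/lib/debug/lib/modules/3.10.0-1160.15.2.el7.x86_64/vmlinux vmcore
```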