
Environment: while extracting an 8 GB archive on a CentOS 7 virtual machine, the kernel reported the following error:

Message from syslogd@cosmo-01 at Apr 25 11:05:59 ...

kernel:NMI watchdog: BUG: soft lockup - CPU#6 stuck for 21s! [xfs-data/dm-0:451]

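Root-cause analysis of the kernel soft lockup bug:

After researching the issue online, the immediate cause turned out to be this: if a CPU is too busy to feed the watchdog in time, the system prints soft lockup messages such as: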

kernel:BUG: soft lockup - CPU#0 stuck for 38s! [kworker/0:1:25758]

kernel:BUG: soft lockup - CPU#7 stuck for 36s! [java:16182]

......
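When many of these messages pile up, it can help to pull out the CPU number, the stuck duration, and the task name from each line. The following is a minimal sketch, assuming the log lines follow the format shown above; the function name `parse_soft_lockup` and its output format are hypothetical, chosen only for illustration.

```shell
# parse_soft_lockup: read kernel log lines on stdin and print
# "cpu=<n> seconds=<n> task=<name>" for each soft lockup report.
# Lines that do not match the soft lockup format are ignored.
# (Function name and output format are illustrative assumptions.)
parse_soft_lockup() {
    sed -n -E 's/.*soft lockup - CPU#([0-9]+) stuck for ([0-9]+)s! \[([^]]+)\].*/cpu=\1 seconds=\2 task=\3/p'
}
```

It could be fed from any log source, for example `dmesg | parse_soft_lockup` or `grep 'soft lockup' /var/log/messages | parse_soft_lockup`.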

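The kernel parameter kernel.watchdog_thresh (/proc/sys/kernel/watchdog_thresh) defaults to 10. A message is printed once a CPU has been stuck for more than 2*10 seconds. Note: when adjusting this value, it must not be greater than 60.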
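The relationship between the parameter and the reporting threshold can be sketched as a small helper. This is only an illustration of the 2 * watchdog_thresh rule and the upper limit of 60 described above; the function name `soft_lockup_threshold` is hypothetical.

```shell
# soft_lockup_threshold: print the number of seconds after which a soft
# lockup is reported for a given kernel.watchdog_thresh value
# (2 * watchdog_thresh). Values above 60 are rejected.
# (Function name is an illustrative assumption.)
soft_lockup_threshold() {
    thresh=$1
    case $thresh in
        ''|*[!0-9]*)
            echo "not a non-negative integer: $thresh" >&2
            return 1
            ;;
    esac
    if [ "$thresh" -gt 60 ]; then
        echo "watchdog_thresh must not be greater than 60" >&2
        return 1
    fi
    echo $((thresh * 2))
}
```

On a live system the current value could be passed in with `soft_lockup_threshold "$(cat /proc/sys/kernel/watchdog_thresh)"`.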

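Raising this value lengthens the time the watchdog waits, but it does not solve the problem; it only delays the message. Resolving the issue still requires finding the root cause.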

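To help with diagnosis, you can enable panic by changing /proc/sys/kernel/panic from its default of 0 to 1. Note that kernel.panic only sets the number of seconds before the system reboots after a panic; to make the kernel panic when a soft lockup is detected, kernel.softlockup_panic (/proc/sys/kernel/softlockup_panic) must also be set to 1.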

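Searching online shows that many different things can cause a CPU lockup:

* Insufficient power from the server's power supply, causing unstable CPU voltage and CPU lockups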

https://ubuntuforums.org/showthread.php?t=2205211

I bought a small (500W) new power supply made by what I feel is a reputable company and made the swap.

GREAT NEWS: After replacing the power supply, the crashes completely stopped!

I wanted to wait a while just to be sure, but it is now a few weeks since the new powersupply went in, and I haven't had a single crash since.

The power supply is not something that I would normally worry about, but in this case it totally fixed my problem.

Thanks to those who read my post, and especially to those who responded.

* The number of vCPUs exceeds the number of physical CPU cores

https://unix.stackexchange.com/questions/70377/bug-soft-lockup-cpu-stuck-for-x-seconds

* The CPU of the host running the VM is too busy, or its disk I/O is too high

* The VM's own CPU is too busy, or its disk I/O is too high

https://www.centos.org/forums/viewtopic.php?t=60087

* A bug related to enabling KVM in the BIOS; disabling KVM resolves it, but the physical machine then no longer supports virtualization

https://unix.stackexchange.com/questions/70377/bug-soft-lockup-cpu-stuck-for-x-seconds

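* A bug in the VM's NIC driver that causes CPU lockups when handling high traffic volumes

* Overclocking enabled in the BIOS; the voltage becomes unstable while overclocked, which makes CPU lockups likely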

https://ubuntuforums.org/showthread.php?t=2205211

* A bug in the Linux kernel

https://unix.stackexchange.com/questions/70377/bug-soft-lockup-cpu-stuck-for-x-seconds

* A bug in KVM

https://unix.stackexchange.com/questions/70377/bug-soft-lockup-cpu-stuck-for-x-seconds

* clocksource tsc unstable on CentOS and cloud Linux with Hyper-V Virtualisation

https://unix.stackexchange.com/questions/70377/bug-soft-lockup-cpu-stuck-for-x-seconds

Setting clocksource=jiffies resolves this.

* Intel C-State enabled in the BIOS; disabling it resolves the issue

https://unix.stackexchange.com/questions/70377/bug-soft-lockup-cpu-stuck-for-x-seconds

https://support.citrix.com/article/CTX127395

http://blog.sina.com.cn/s/blog_906d892d0102vn26.html

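* Spread spectrum enabled in the BIOS

When the clock oscillator on the motherboard is running, the peaks of its pulses generate EMI (electromagnetic interference). The spread spectrum setting reduces the EMI produced by the pulse generator by flattening the pulse peaks into smoother curves.

If you have no EMI problems, set this option to Disabled, which improves system performance and stability;

otherwise set it to Enabled. If you overclock the CPU, this option must be disabled, because even a slight drift in the pulse can lock up an overclocked CPU.

To repeat: when the CPU is overclocked, spread spectrum must be turned off, otherwise the CPU is likely to lock up.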
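For the clocksource item in the list above, it can be useful to see which clocksource the kernel is currently using before changing anything. The sketch below reads the standard sysfs files `current_clocksource` and `available_clocksource`; the function name `show_clocksource` and the optional directory argument are assumptions added for illustration and to make the function easy to try against a test directory.

```shell
# show_clocksource: print the clocksource currently in use and the ones
# the kernel reports as available.
# The optional first argument overrides the sysfs directory; it is an
# illustrative addition, not something required by the kernel interface.
show_clocksource() {
    dir=${1:-/sys/devices/system/clocksource/clocksource0}
    if [ ! -r "$dir/current_clocksource" ]; then
        echo "clocksource information not found under $dir" >&2
        return 1
    fi
    echo "current: $(cat "$dir/current_clocksource")"
    if [ -r "$dir/available_clocksource" ]; then
        echo "available: $(cat "$dir/available_clocksource")"
    fi
}
```

Calling `show_clocksource` with no arguments inspects the running system; if the current source is tsc and the logs mention it being unstable, that matches the scenario described in the list.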

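Definition of soft lockup: a soft lockup is a bug that does not bring the whole system down, but leaves one or more processes (or kernel threads) stuck in some state (usually in kernel space). In many cases it is caused by incorrect use of kernel locks.

#Write the value to the running kernel (lost on reboot)

echo 30 > /proc/sys/kernel/watchdog_thresh

#Check the current value

[root@git-node1 data]# tail -1 /proc/sys/kernel/watchdog_thresh

30

#Takes effect temporarily

sysctl -w kernel.watchdog_thresh=30

#Append to the configuration file so that the setting persists

vi /etc/sysctl.conf

kernel.watchdog_thresh=30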
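Editing /etc/sysctl.conf by hand works, but repeated edits can leave duplicate entries. As a minimal sketch, the helper below rewrites a sysctl-style file so that it contains exactly one line for a given key. The function name `set_sysctl_conf` and its argument order are hypothetical; writing to /etc/sysctl.conf itself requires root.

```shell
# set_sysctl_conf KEY VALUE [FILE]: make sure FILE ends up with exactly one
# "KEY=VALUE" line, replacing any earlier assignment of KEY.
# FILE defaults to /etc/sysctl.conf. (Name and interface are illustrative.)
set_sysctl_conf() {
    key=$1
    value=$2
    file=${3:-/etc/sysctl.conf}
    tmp=$(mktemp) || return 1
    if [ -f "$file" ]; then
        # Keep every line whose name (text before "=", whitespace removed)
        # differs from the key; commented-out lines are kept as they are.
        awk -v k="$key" -F= '{ name = $1; gsub(/[ \t]/, "", name); if (name != k) print }' "$file" > "$tmp"
    fi
    echo "${key}=${value}" >> "$tmp"
    # Overwrite in place so the original file's permissions are preserved.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For example, `set_sysctl_conf kernel.watchdog_thresh 30` followed by `sysctl -p` would persist the value and load it into the running kernel.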
