Problem description

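One process in our development environment is consuming an enormous amount of resources. Can you help analyze why? The same thing has happened a few times before, both on-premises and on the public cloud, and each time the process eventually exhausted the machine's memory and brought it down. This time we happened to catch it while it was still live.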

Approach and solution

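When a process is pegging the machine, the first thing to do is look at the process name. One glance and I was stuck: I had never seen this one before.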

Run top -H -p 48297 to see which thread inside the process is misbehaving. It turned out the process had only a single thread.
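The thread check can also be done without an interactive top session. The following is a minimal sketch, assuming a Linux host with /proc mounted; thread_count is a hypothetical helper name, not part of any tool mentioned here.

```shell
# Count the threads of a process by listing /proc/<pid>/task (Linux only).
# thread_count is a hypothetical helper name used for illustration.
thread_count() {
    pid=$1
    if [ ! -d "/proc/$pid/task" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # Each entry under task/ is one thread of the process.
    ls "/proc/$pid/task" | wc -l | tr -d ' '
}

# Usage: thread_count 48297
```

A result of 1 means the process is single-threaded, which matches what top showed here.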
Use ps to find where this program lives:

[root@yq01-kg-section1-bud3 libexec]# ps -ef | grep  abrt-hook-ccpp
root     45733 11797  0 12:18 pts/8    00:00:00 grep --color=auto abrt-hook-ccpp
root     48297     2 99 Nov16 ?        15:42:50 /usr/libexec/abrt-hook-ccpp 11 0 8669 0 0 1605530067 e 8669 8669
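The numbers after /usr/libexec/abrt-hook-ccpp are filled in by the kernel from the kernel.core_pattern setting. The sketch below labels the first six of them, assuming the pattern passes "%s %c %p %u %g %t" in that order (the specifiers documented in core(5)). That ordering is an assumption about this host; cat /proc/sys/kernel/core_pattern shows the real one. decode_hook_args is a hypothetical helper name.

```shell
# Label the first six arguments passed to abrt-hook-ccpp.
# ASSUMPTION: core_pattern passes "%s %c %p %u %g %t ..." in that order;
# verify with: cat /proc/sys/kernel/core_pattern
decode_hook_args() {
    echo "signal=$1 core_limit=$2 pid=$3 uid=$4 gid=$5 time=$6"
}

decode_hook_args 11 0 8669 0 0 1605530067 e 8669 8669
# prints: signal=11 core_limit=0 pid=8669 uid=0 gid=0 time=1605530067
```

Under that assumption, the hook was started because process 8669, running as root, died with signal 11 (SIGSEGV). With GNU date, date -d @1605530067 turns the timestamp into a readable time.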

Completely stumped! I turned to Baidu and found the following.

abrtd

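abrtd is a daemon that watches for application crashes. When a crash occurs, it collects the crash data (the core file, the application's command line, etc.) and takes action according to the type of crash and the configuration in the abrt.conf config file. Plugins provide the various actions: for example, reporting the crash to Bugzilla, mailing the report, or transferring the report via ftp or scp. See the man pages of the respective plugins.
abrtd is the daemon of ABRT, the Automatic Bug Reporting Tool.
The most painful part of debugging on Linux is when a program crashes but no core file can be found, which makes the problem hard to pinpoint. With a core file, it is much easier.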

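Normally you enable core files by running ulimit -c unlimited in the environment that launches the program. But on-site deployment staff sometimes forget to run this command. What then? The Linux abrt service can take care of it.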
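As a small illustration of the ulimit step, here is a sketch that raises the soft core-file limit and prints the result. enable_core_dumps is a hypothetical helper name, and the fallback to the hard limit is my own addition for hosts where "unlimited" is not permitted.

```shell
# Raise the soft core-file limit for the current shell and its children.
# Falls back to the hard limit when "unlimited" is not permitted.
# enable_core_dumps is a hypothetical helper name.
enable_core_dumps() {
    ulimit -c unlimited 2>/dev/null || ulimit -c "$(ulimit -H -c)"
    ulimit -c   # print the resulting soft limit
}

# Usage: call it in the same shell that will launch the program.
# enable_core_dumps
```

The limit only applies to the shell that sets it and to processes started from it, which is why a forgotten ulimit on a new machine silently loses core files.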

Edit the abrt-action-save-package-data.conf file so that it reads as follows:

vi /etc/abrt/abrt-action-save-package-data.conf

# With this option set to "yes",
# only crashes in signed packages will be analyzed.
# the list of public keys used to check the signature is
# in the file gpg_keys
#
OpenGPGCheck = no


# Blacklisted packages
#
BlackList = nspluginwrapper, valgrind, strace, mono-core


# Process crashes in executables which do not belong to any package?
#
ProcessUnpackaged = yes


# Blacklisted executable paths (shell patterns)
#
BlackListedPaths = /usr/share/doc/, /example*, /usr/bin/nspluginviewer, /usr/lib/xulrunner-*/plugin-container
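Before and after editing, it helps to confirm what a setting is actually set to. The following sketch reads one key from a "Key = value" file like the one above. abrt_conf_get is a hypothetical helper, and it only mimics the file layout shown here, not ABRT's own parser.

```shell
# Print the value of KEY from a "Key = value" config file.
# Comment lines (starting with #) do not match; if the key appears more
# than once, the last occurrence is printed.
# abrt_conf_get is a hypothetical helper name.
abrt_conf_get() {
    key=$1
    file=${2:-/etc/abrt/abrt-action-save-package-data.conf}
    sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1
}

# Usage: abrt_conf_get ProcessUnpackaged
```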


You can also adjust the size limit for stored crash data (MaxCrashReportsSize in abrt.conf):

[root@xx-host2 abrt]# cat abrt.conf 
# Enable this if you want abrtd to auto-unpack crashdump tarballs which appear
# in this directory (for example, uploaded via ftp, scp etc).
# Note: you must ensure that whatever directory you specify here exists
# and is writable for abrtd. abrtd will not create it automatically.
#
#WatchCrashdumpArchiveDir = /var/spool/abrt-upload


# Max size for crash storage [MiB] or 0 for unlimited
#
MaxCrashReportsSize = 1000


# Specify where you want to store coredumps and all files which are needed for
# reporting. (default:/var/spool/abrt)
#
# Changing dump location could cause problems with SELinux. See man abrt_selinux(8).
#
#DumpLocation = /var/spool/abrt


# If you want to automatically clean the upload directory you have to tweak the
# selinux policy.
#
DeleteUploaded = no

Restart the abrtd service: service abrtd restart

Core files also need to be cleaned up promptly. Use abrt-cli list to see the stored crash dumps, then remove one with abrt-cli rm <dump directory>.

When a program crashes, abrt-hook-ccpp can use so much CPU and I/O that the whole system is saturated, so you might as well disable it:
systemctl stop abrt-ccpp.service
systemctl disable abrt-ccpp.service
systemctl status abrt-ccpp.service

I checked with systemctl status abrt-ccpp.service and found that this service had never been started in the first place.
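The service status does not tell the whole story: the kernel starts abrt-hook-ccpp whenever kernel.core_pattern pipes crashes to it, regardless of what systemctl reports. My understanding is that abrt-ccpp.service is what installs that pattern, but treat this as an assumption to verify on your distribution. A minimal sketch of the check, with core_pattern_uses_abrt as a hypothetical helper name:

```shell
# Succeed if the core_pattern file routes crashes to abrt-hook-ccpp.
# The optional file argument exists only so the check can be tried
# against a sample file; by default the live kernel setting is read.
# core_pattern_uses_abrt is a hypothetical helper name.
core_pattern_uses_abrt() {
    grep -q 'abrt-hook-ccpp' "${1:-/proc/sys/kernel/core_pattern}"
}

# Usage:
# core_pattern_uses_abrt && echo "crashes are piped to abrt-hook-ccpp"
```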

Back to Baidu

Why do /usr/libexec/abrt-hook-ccpp processes keep piling up?
Because the ccpp dump files cannot be created.

The suggested fix is to change the ProcessUnpackaged parameter in /etc/abrt/abrt-action-save-package-data.conf:

sed -i 's/ProcessUnpackaged = no/ProcessUnpackaged = yes/g' /etc/abrt/abrt-action-save-package-data.conf && service abrtd restart
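The sed one-liner only matches the exact text "ProcessUnpackaged = no", so a file with different spacing is silently left unchanged. Here is a slightly more defensive sketch. abrt_conf_set is a hypothetical helper, and writing through a temporary file is my own choice, not something the original article does.

```shell
# Set KEY to VALUE in a "Key = value" config file, tolerating odd spacing.
# Returns non-zero if KEY has no uncommented line in the file.
# abrt_conf_set is a hypothetical helper name.
abrt_conf_set() {
    key=$1 value=$2 file=$3
    grep -q "^[[:space:]]*$key[[:space:]]*=" "$file" || return 1
    tmp=$(mktemp) || return 1
    # Rewrite into a temp file, then copy the content back so the original
    # file keeps its owner and permissions.
    sed "s/^\([[:space:]]*$key[[:space:]]*=\).*/\1 $value/" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Usage:
# abrt_conf_set ProcessUnpackaged yes /etc/abrt/abrt-action-save-package-data.conf \
#     && service abrtd restart
```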

It still did not work after the change, so I checked the system log:

Nov 17 13:15:15 yq01-kg-section1-bud3 abrtd: Lock file '.lock' is locked by process 48297
Nov 17 13:15:15 yq01-kg-section1-bud3 abrtd: Lock file '.lock' is locked by process 48297
Nov 17 13:15:16 yq01-kg-section1-bud3 abrtd: Lock file '.lock' is locked by process 48297
Nov 17 13:15:16 yq01-kg-section1-bud3 abrtd: Lock file '.lock' is locked by process 48297
Nov 17 13:15:17 yq01-kg-section1-bud3 abrtd: Lock file '.lock' is locked by process 48297
Nov 17 13:15:17 yq01-kg-section1-bud3 systemd: abrtd.service stop-sigterm timed out. Killing.
Nov 17 13:15:17 yq01-kg-section1-bud3 systemd: abrtd.service: main process exited, code=killed, status=9/KILL
Nov 17 13:15:17 yq01-kg-section1-bud3 systemd: Unit abrtd.service entered failed state.
Nov 17 13:15:17 yq01-kg-section1-bud3 systemd: abrtd.service failed.
Nov 17 13:15:17 yq01-kg-section1-bud3 abrtd: Lock file '.lock' is locked by process 48297

The service had not restarted: the log shows the lock is held by process 48297, which is the very process that was consuming so many resources.
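Pulling the lock holder out of a long log by eye is error-prone, so here is a small sketch that extracts it. locking_pids is a hypothetical helper name, and the message format is taken from the abrtd lines shown above.

```shell
# Read syslog lines on stdin and print each distinct PID that abrtd
# reports as holding the lock file.
# locking_pids is a hypothetical helper name.
locking_pids() {
    sed -n 's/.*is locked by process \([0-9][0-9]*\).*/\1/p' | sort -u
}

# Usage (the log path is an assumption; adjust to where your syslog goes):
# locking_pids < /var/log/messages
```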

kill -9 48297
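Since kill -9 on a mistyped PID can take down the wrong process, a guarded variant may be worth the extra lines. This is only a sketch, assuming a Linux host with /proc. kill_if_named is a hypothetical helper that refuses to act unless the command name matches.

```shell
# Send SIGKILL to PID only if its command name matches EXPECTED.
# The name is read from /proc/<pid>/comm (Linux only; the kernel
# truncates it to 15 characters).
# kill_if_named is a hypothetical helper name.
kill_if_named() {
    pid=$1 expected=$2
    actual=$(cat "/proc/$pid/comm" 2>/dev/null) || return 1
    if [ "$actual" != "$expected" ]; then
        echo "PID $pid is '$actual', not '$expected'; not killing" >&2
        return 1
    fi
    kill -9 "$pid"
}

# Usage: kill_if_named 48297 abrt-hook-ccpp
```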

Restart the service and check its status.
Look at top again: hooray!
