I. Server environment

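CentOS 7.3, 4 cores, 16 GB RAM, VPS with 10 Mbps bandwidth.
It was a new server running only two small services, yet its bandwidth was saturated, to the point that even the database could not be opened.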

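II. Troubleshooting process

1. Check the traffic in monitoring

A reminder: install monitoring on any production server as early as possible, whether Alibaba Cloud's monitoring or Zabbix. I had not installed it, and installing it while the problem was happening was painfully slow.
Monitoring showed the VPS bandwidth usage already at 90%, so the next step was to look at the real-time traffic.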

2. Use the iftop command to view real-time outbound traffic

client                                => 183.61.49.150                            0b   9.14Kb  4.53Kb

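There was no time to record the output during the incident, so the sample above was captured afterwards. At the time, iftop showed the server frantically sending data to the IP 17.141.5.102, more than 1 GB in a short burst.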
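
iftop ranks peers by bandwidth. When it is too sluggish to use during an incident, a rough cross-check is to count TCP connections per remote address from netstat output. The sketch below is one way to do that; the function name top_remote_ips is made up for illustration, and a connection count is only a hint about where traffic is going, not a measurement of it.

```shell
#!/usr/bin/env bash
# Rank remote peers by number of TCP connections.
# Reads `netstat -ant` style output on stdin; column 5 is "remote_ip:port".
# Listening sockets (remote address ending in ":*") are skipped.
top_remote_ips() {
    awk '$1 ~ /^tcp/ && $5 ~ /:[0-9]+$/ {
             sub(/:[0-9]+$/, "", $5)   # drop the port, keep the address
             count[$5]++
         }
         END { for (ip in count) print count[ip], ip }' |
        sort -k1,1nr -k2,2
}

# Example (not run here): netstat -ant | top_remote_ips | head
```

An address that sits far above the rest in this list is a reasonable candidate to inspect more closely in the next step.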

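3. Find the port and process

With the problem IP identified, the next step is to use it to find the port and the process.

The netstat command is used here to find the port and process; the ss command works as well.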

[root@client vhosts]# netstat -anput | grep '17.141.5.102'
tcp        0      0 172.17.198.238:52022    17.141.5.102:443        TIME_WAIT   -                   
tcp        0      0 172.17.198.238:34732    17.141.5.102:443        TIME_WAIT   -                   
tcp        1    127 172.17.198.238:58518    17.141.5.102:443        CLOSING     -                   
tcp        0      0 172.17.198.238:40778    17.141.5.102:443        TIME_WAIT   -                   
tcp        0      0 172.17.198.238:48398    17.141.5.102:443        TIME_WAIT   -                   
tcp        0      0 172.17.198.238:36572    17.141.5.102:443        TIME_WAIT   -                   
tcp        0    438 172.17.198.238:54824    17.141.5.102:443        FIN_WAIT1   -                   
tcp        0      0 172.17.198.238:49590    17.141.5.102:443        TIME_WAIT   -                   
tcp        0      1 172.17.198.238:55734    17.141.5.102:443        FIN_WAIT1   -                   
tcp        0    196 172.17.198.238:34470    17.141.5.102:443        FIN_WAIT1   -                   
tcp        0      1 172.17.198.238:47256    17.141.5.102:443        FIN_WAIT1   -                   
tcp        0    191 172.17.198.238:33548    17.141.5.102:443        FIN_WAIT1   -                   
tcp        0      1 172.17.198.238:42234    17.141.5.102:443        FIN_WAIT1   -                   
tcp        0      0 172.17.198.238:53708    17.141.5.102:443        TIME_WAIT   -                   
tcp        1      1 172.17.198.238:55880    17.141.5.102:443        CLOSING     -                   
tcp        0      1 172.17.198.238:53100    17.141.5.102:443        FIN_WAIT1   -                   
tcp        0   1344 172.17.198.238:53044    17.141.5.102:443        FIN_WAIT1   -                   
tcp        0    197 172.17.198.238:45876    17.141.5.102:443        FIN_WAIT1   -                   
tcp        0    389 172.17.198.238:46812    17.141.5.102:443        FIN_WAIT1   -                   
tcp        0      1 172.17.198.238:43352    17.141.5.102:443        FIN_WAIT1   -                   
tcp        0    183 172.17.198.238:43624    17.141.5.102:443        FIN_WAIT1   -                   
tcp        0    375 172.17.198.238:58626    17.141.5.102:443        FIN_WAIT1   -                   
tcp        0      1 172.17.198.238:53734    17.141.5.102:443        FIN_WAIT1   -                   
tcp        0      0 172.17.198.238:51252    17.141.5.102:443        TIME_WAIT   -                   
tcp        0      0 172.17.198.238:41724    17.141.5.102:443        TIME_WAIT   -                   
tcp        0      1 172.17.198.238:53174    17.141.5.102:443        FIN_WAIT1   -                   
tcp        0      0 172.17.198.238:40024    17.141.5.102:443        TIME_WAIT   -                   
tcp        0      0 172.17.198.238:51536    17.141.5.102:443        FIN_WAIT2   -                   
tcp        0      0 172.17.198.238:40069    17.141.5.102:443        TIME_WAIT   -                   
tcp        0      0 172.17.198.238:51332    17.141.5.102:443        TIME_WAIT   -                   
tcp        0    389 172.17.198.238:58692    17.141.5.102:443        FIN_WAIT1   -                   
tcp        0      1 172.17.198.238:48216    17.141.5.102:443        FIN_WAIT1   -                   
tcp        0      0 172.17.198.238:53672    17.141.5.102:443        FIN_WAIT2   -                   
tcp        0      0 172.17.198.238:33592    17.141.5.102:443        TIME_WAIT   -                   
tcp        0      1 172.17.198.238:45940    17.141.5.102:443        FIN_WAIT1   -                   
tcp        0      1 172.17.198.238:34262    17.141.5.102:443        FIN_WAIT1   -                   
tcp   125453      0 172.17.198.238:53328    17.141.5.102:443        CLOSE_WAIT  13722/nginx: worker 
tcp        0    167 172.17.198.238:50148    17.141.5.102:443        FIN_WAIT1   -                   
tcp        0      1 172.17.198.238:52118    17.141.5.102:443        FIN_WAIT1   -                   
tcp        0    197 172.17.198.238:46722    17.141.5.102:443        FIN_WAIT1   -                   
tcp        0    195 172.17.198.238:36260    17.141.5.102:443        FIN_WAIT1   -         

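The server is making a flood of requests to 17.141.5.102:443. Most entries show - in the last column because sockets in closing states such as TIME_WAIT and FIN_WAIT1 no longer belong to a process, but the one CLOSE_WAIT connection is still owned by nginx: worker (PID 13722). That pinpoints the process sending the data.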
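
With dozens of lines to read, it can help to collapse the netstat output into counts per connection state and owning process. The following is a minimal sketch, assuming the column layout shown above; summarize_conns is a hypothetical helper name, not a standard tool.

```shell
#!/usr/bin/env bash
# Summarize TCP connections to one remote IP by state and owning process.
# Reads `netstat -anput` style output on stdin.
# Columns: proto recv-q send-q local remote state pid/program
summarize_conns() {
    local ip="$1"
    awk -v ip="$ip" '
        # Prefix match on "ip:" so 1.2.3.4 does not also match 1.2.3.40
        $1 ~ /^tcp/ && index($5, ip ":") == 1 {
            owner = ($7 == "" ? "-" : $7)
            sub(/:$/, "", owner)       # "13722/nginx:" -> "13722/nginx"
            count[$6 " " owner]++
        }
        END { for (k in count) print count[k], k }' |
        sort -k1,1nr -k2
}

# Example (not run here): netstat -anput | summarize_conns 17.141.5.102
```

Program names containing spaces are cut at the first word, so nginx: worker appears as 13722/nginx. The ss command can list the same connections with ss -tanp dst 17.141.5.102, but its columns are laid out differently, so the helper would need adjusting before it could parse that output.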

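4. Fix the problem

I suddenly remembered that a business requirement had earlier led me to set up a proxy server that forwarded HTTPS requests to target servers, for example https://www.baidu.com. So I commented out the nginx virtual host that did the proxying.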

server {
     listen                         80 default;
#     listen                         3128 default;

     # dns resolver used by forward proxying
     resolver                       8.8.8.8;

     # forward proxy for CONNECT request
     proxy_connect;
     proxy_connect_allow            443 563;
     proxy_connect_connect_timeout  10s;
     proxy_connect_read_timeout     10s;
     proxy_connect_send_timeout     10s;

     # forward proxy for non-CONNECT request
     location / {
         proxy_pass http://$host;
         proxy_set_header Host $host;
     }
 }

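Almost immediately the command line felt responsive again. Checking netstat showed the number of connections gradually dropping, and traffic fell from 9 Mbps to below 500 Kbps.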
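
On a server with many virtual hosts, it may not be obvious which file still enables forward proxying. The sketch below searches a configuration directory for uncommented proxy_connect directives, which come from a third-party forward-proxy module rather than stock nginx. The helper name find_forward_proxy_conf and the default path /etc/nginx are assumptions for illustration; the actual configuration location depends on how nginx was installed.

```shell
#!/usr/bin/env bash
# List nginx config lines that enable CONNECT-style forward proxying.
# Matches only uncommented proxy_connect* directives under a config directory.
# The default directory is an assumption; pass the real path as argument 1.
find_forward_proxy_conf() {
    local conf_dir="${1:-/etc/nginx}"
    grep -rnE '^[[:space:]]*proxy_connect' "$conf_dir"
}

# Example (not run here): find_forward_proxy_conf /etc/nginx
```

Each hit is printed as file:line:directive. After commenting out a server block, nginx -t checks the configuration syntax and nginx -s reload applies the change to the running workers.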

III. Summary

Get comfortable with the common system load and traffic tools, such as
ps
iotop
iftop
netstat
ss
