若数据库出现连ORA-00020连接数满,会导致业务连接不上进而导致业务中断,若本身数据库的连接数上限设置的不高,可以通过更改processes参数来处理(aler system set processes=xxx scope=spfile),该参数为静态参数,需要重启数据库生效。若本身数据库连接数设置的就非常高了,而是由于业务问题导致部分主机异常超高连接,则需要定位主机、重启主机、查杀会话等方式处理,这里介绍下若是后者的原因的处理思路。

  1. 检查alert日志
ORA-00020: maximum number of processes (4500) exceeded
 ORA-20 errors will not be written to the alert log for
 the next minute. Please look at trace files to see all
 the ORA-20 errors.
Fri Nov 05 09:35:48 2021
Thread 1 advanced to log sequence 191580 (LGWR switch)
  Current log# 1 seq# 191580 mem# 0: +SSD/hospital/onlinelog/redo01.log
Fri Nov 05 09:35:48 2021
LNS: Standby redo logfile selected for thread 1 sequence 191580 for destination LOG_ARCHIVE_DEST_4
Fri Nov 05 09:35:48 2021
LNS: Standby redo logfile selected for thread 1 sequence 191580 for destination LOG_ARCHIVE_DEST_3
Fri Nov 05 09:35:48 2021
LNS: Standby redo logfile selected for thread 1 sequence 191580 for destination LOG_ARCHIVE_DEST_2
Fri Nov 05 09:35:48 2021
Archived Log entry 904316 added for thread 1 sequence 191579 ID 0x431aef8a dest 1:
Fri Nov 05 09:37:30 2021
ORA-00020: maximum number of processes (4500) exceeded
 ORA-20 errors will not be written to the alert log for
 the next minute. Please look at trace files to see all
 the ORA-20 errors.

  1. 查看数据库当前连接数
SQL> show parameter process;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes                      integer     1
cell_offload_processing              boolean     TRUE
db_writer_processes                  integer     10
gcs_server_processes                 integer     4
global_txn_processes                 integer     1
job_queue_processes                  integer     1000
log_archive_max_processes            integer     4
processes                            integer     4500
processor_group_name                 string

SQL> select inst_id,count(*) from gv$session group by inst_id;

   INST_ID   COUNT(*)
---------- ----------
         1       4496
         2        715

  1. 定位问题主机
SQL> select program,machine,count(*)
  2  from gv$session group by program,machine,sql_id order by 3;

…

PROGRAM                                          MACHINE       COUNT(*)
-------------------------------- --------------------------  ----------
A.exe                         WORKGROUP\SRSERBER            10
B.exe                         WORKGROUP\QDQ                10
C.exe                         WORKGROUP\DQ              10
D.exe                         WORKGROUP\DQDR              10
E.exe                         WORKGROUP\DGQHU                10
...
dqgeq.exe                 WORKGROUP\234                 3845

  1. 根据主机名和程序名称定位主机IP
select b.spid from v$session a,v$process b where a.paddr=b.addr and a.program='程序名' and a.machine like '%主机名%';

netstat -anp|grep OSPID1
netstat -anp|grep OSPID2
netstat -anp|grep OSPID3

测试多个OSPID确保IP一致,将获取到的IP提供给客户或业务部门,重启该主机上的应用程序或直接重启该主机。

  1. 查杀会话
    重启主机或应用程序可能时间较长,此时可以通过kill ospid批量杀除会话的方法,先杀除一部分会话,保证业务可用。
select b.spid from v$session a,v$process b where a.paddr=b.addr and a.program='程序名' and a.machine like '%主机名%';

kill -9 OSPID
Logo

为开发者提供学习成长、分享交流、生态实践、资源工具等服务,帮助开发者快速成长。

更多推荐