Category: Linux

Reposted from: http://blog.itpub.net/7728585/viewspace-772797

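Today a colleague ran into a case where the VIPs, the SCAN VIP, the SCAN listener, and the local listeners would not start. On inspection the NETWORK resource was already down, and the cause was finally traced to a changed subnet mask. The subnet mask decides which subnet an address belongs to, so when the mask on the interface changes, the subnet computed for that interface no longer matches the one recorded in the cluster configuration, and the network check fails.
To reproduce it, I changed the subnet mask of bond0 on our virtual machines from 255.255.255.0 to 255.255.0.0.
The resulting errors were as follows: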
2013-08-09 00:47:23.885: [ora.net1.network][3288315648] {1:50347:319} [check] NetworkAgent::init exit }
2013-08-09 00:47:23.888: [ora.net1.network][3288315648] {1:50347:319} [check] NetInterface::scheckNetInterface returned 0
2013-08-09 00:47:23.889: [ora.net1.network][3288315648] {1:50347:319} [check] NetworkAgent::checkInterface returned false
2013-08-09 00:47:23.896: [ AGFW][3321874176] {1:50347:319} Agent sending reply for: RESOURCE_START[ora.net1.network rac2 1] ID 4098:391
2013-08-09 00:47:23.897: [ AGFW][3321874176] {1:50347:319} ora.net1.network rac2 1 state changed from: STARTING to: OFFLINE
2013-08-09 00:47:23.897: [ AGFW][3321874176] {1:50347:319} Switching online monitor to offline one
2013-08-09 00:47:23.902: [ AGFW][3321874176] {1:50347:319} Started implicit monitor for [ora.net1.network rac2 1] interval=60000 delay=60000
2013-08-09 00:47:23.903: [ AGFW][3321874176] {1:50347:319} Agent sending last reply for: RESOURCE_START[ora.net1.network rac2 1] ID 4098:391
2013-08-09 00:48:21.862: [ AGFW][3321874176] {2:25159:2} Agent received the message: AGENT_HB[Engine] ID 12293:475
2013-08-09 00:48:23.918: [ora.net1.network][3288315648] {1:50347:319} [check] NetworkAgent::init enter {
2013-08-09 00:48:24.004: [ora.net1.network][3288315648] {1:50347:319} [check] Checking if bond0 Interface is fine
2013-08-09 00:48:24.035: [ora.net1.network][3288315648] {1:50347:319} [check] ifname=bond0
2013-08-09 00:48:24.035: [ora.net1.network][3288315648] {1:50347:319} [check] subnetmask=255.255.0.0
2013-08-09 00:48:24.035: [ora.net1.network][3288315648] {1:50347:319} [check] subnetnumber=192.168.0.0
2013-08-09 00:48:24.068: [ AGENT][3288315648] {1:50347:319} UserErrorException: Locale is
2013-08-09 00:48:24.085: [ora.net1.network][3288315648] {1:50347:319} [check] CRS-5008: Invalid attribute value: bond0 for the network interface


Reference:
VIP Fails to Start With PRCR-1079 CRS-2674 CRS-2632 and CRS-5008: Invalid attribute value: en0 for the network interface (Doc ID 1387413.1)
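
The agent log above reports the interface name, subnet mask, and subnet number on its `[check]` lines just before CRS-5008 is raised. As a minimal sketch, assuming the log format stays as shown here, those values can be pulled out with a regular expression. The function name `extract_check_values` is hypothetical and only for illustration.

```python
import re

# Three lines copied from the agent log above.
LOG = """\
2013-08-09 00:48:24.035: [ora.net1.network][3288315648] {1:50347:319} [check] ifname=bond0
2013-08-09 00:48:24.035: [ora.net1.network][3288315648] {1:50347:319} [check] subnetmask=255.255.0.0
2013-08-09 00:48:24.035: [ora.net1.network][3288315648] {1:50347:319} [check] subnetnumber=192.168.0.0
"""


def extract_check_values(log_text):
    """Collect key=value pairs that appear right after a [check] marker."""
    values = {}
    for match in re.finditer(r"\[check\] (\w+)=(\S+)", log_text):
        values[match.group(1)] = match.group(2)
    return values


print(extract_check_values(LOG))
```

The extracted subnet number (192.168.0.0) is the value to compare against what the cluster has on record.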

The resource status at that point was:
[grid@rac2 orarootagent_root]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.CRS.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.DATA.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.LISTENER.lsnr
ONLINE OFFLINE rac1
ONLINE OFFLINE rac2
ora.LISTENER3.lsnr
ONLINE OFFLINE rac1
ONLINE OFFLINE rac2
ora.asm
ONLINE ONLINE rac1 Started
ONLINE ONLINE rac2 Started
ora.gsd
OFFLINE OFFLINE rac1
OFFLINE OFFLINE rac2
ora.net1.network
ONLINE OFFLINE rac1
ONLINE OFFLINE rac2
ora.ons
ONLINE OFFLINE rac1
ONLINE OFFLINE rac2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE OFFLINE
ora.cvu
1 ONLINE OFFLINE
ora.oc4j
1 ONLINE ONLINE rac1
ora.rac1.vip
1 ONLINE OFFLINE
ora.rac2.vip
1 ONLINE OFFLINE
ora.racdb.db
1 ONLINE ONLINE rac2 Open
2 ONLINE ONLINE rac1 Open
ora.scan1.vip
1 ONLINE OFFLINE

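You can see that almost everything network-related is down. The listeners depend on the VIPs, and the VIPs depend on the public network, so once the public network resource ora.net1.network fails, everything that depends on it goes down with it.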
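
The cascade can be sketched as a small dependency walk. This is only an illustration: the listener and VIP entries follow the chain described in this post, the SCAN entries are my assumption that the same pattern applies, and the mapping is simplified to one node. `DEPENDS_ON` and `affected_by` are hypothetical names, not anything Clusterware provides.

```python
# Each resource maps to the resource it needs in order to run.
# Listener -> VIP -> network comes from the text; the SCAN entries are assumed.
DEPENDS_ON = {
    "ora.LISTENER.lsnr": "ora.rac1.vip",
    "ora.rac1.vip": "ora.net1.network",
    "ora.LISTENER_SCAN1.lsnr": "ora.scan1.vip",
    "ora.scan1.vip": "ora.net1.network",
}


def affected_by(failed, depends_on):
    """Return every resource that directly or indirectly needs `failed`."""
    affected = set()
    changed = True
    while changed:
        changed = False
        for resource, needs in depends_on.items():
            if resource in affected:
                continue
            if needs == failed or needs in affected:
                affected.add(resource)
                changed = True
    return sorted(affected)


print(affected_by("ora.net1.network", DEPENDS_ON))
```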
You should also use oifcfg to see which interfaces are currently available on the system:
[grid@rac1 ~]$ oifcfg iflist -p -n
eth0 192.168.0.0 PRIVATE 255.255.0.0
eth1 192.168.0.0 PRIVATE 255.255.0.0
eth2 10.10.10.0 PRIVATE 255.255.255.0
eth2 169.254.0.0 UNKNOWN 255.255.128.0
eth3 10.10.11.0 PRIVATE 255.255.255.0
eth3 169.254.128.0 UNKNOWN 255.255.128.0
bond0 192.168.0.0 PRIVATE 255.255.0.0
You can also check the existing configuration with oifcfg getif:
[grid@rac1 ~]$ oifcfg getif
bond0 192.168.1.0 global public
eth2 10.10.10.0 global cluster_interconnect
eth3 10.10.11.0 global cluster_interconnect
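
Comparing the two listings by eye works for a handful of interfaces. As a rough sketch, assuming the whitespace-separated column layout shown above, the comparison can be done on pasted output. The function names are hypothetical, and nothing here calls oifcfg itself.

```python
# Output of "oifcfg iflist -p -n" and "oifcfg getif" as pasted above.
IFLIST = """\
eth0 192.168.0.0 PRIVATE 255.255.0.0
eth1 192.168.0.0 PRIVATE 255.255.0.0
eth2 10.10.10.0 PRIVATE 255.255.255.0
eth2 169.254.0.0 UNKNOWN 255.255.128.0
eth3 10.10.11.0 PRIVATE 255.255.255.0
eth3 169.254.128.0 UNKNOWN 255.255.128.0
bond0 192.168.0.0 PRIVATE 255.255.0.0
"""

GETIF = """\
bond0 192.168.1.0 global public
eth2 10.10.10.0 global cluster_interconnect
eth3 10.10.11.0 global cluster_interconnect
"""


def parse_iflist(text):
    """Return the set of (interface, subnet) pairs the OS currently reports."""
    return {tuple(line.split()[:2]) for line in text.splitlines() if line.strip()}


def stale_entries(getif_text, iflist_text):
    """Return configured (interface, subnet, role) rows with no matching OS subnet."""
    present = parse_iflist(iflist_text)
    stale = []
    for line in getif_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        ifname, subnet, _scope, role = fields[:4]
        if (ifname, subnet) not in present:
            stale.append((ifname, subnet, role))
    return stale


print(stale_entries(GETIF, IFLIST))
```

With the listings above, only the bond0 public entry is flagged, because the OS now reports bond0 on 192.168.0.0 rather than 192.168.1.0.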

You also need to check the following.
[grid@rac1 ~]$ srvctl config network -k 1
Network exists: 1/192.168.1.0/255.255.255.0/bond0, type static
This shows the subnet mask recorded in the OCR for the physical interface.
[grid@rac1 ~]$ srvctl config scan
SCAN name: racscan, Network: 1/192.168.1.0/255.255.255.0/bond0
SCAN VIP name: scan1, IP: /racscan/192.168.1.145
This shows the subnet mask of the SCAN VIP.
[grid@rac1 ~]$ srvctl config nodeapps -a
Network exists: 1/192.168.1.0/255.255.255.0/bond0, type static
VIP exists: /rac1vip/192.168.1.143/192.168.1.0/255.255.255.0/bond0, hosting node rac1
VIP exists: /rac2vip/192.168.1.144/192.168.1.0/255.255.255.0/bond0, hosting node rac2
This shows the subnet mask of the VIPs.
All of these show 255.255.255.0, but the actual subnet mask on bond0 has already been changed to 255.255.0.0.
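
To make the mismatch concrete, here is a small sketch that parses the `Network exists:` line shown above and computes the subnet of one VIP address under both masks, using Python's standard `ipaddress` module. The helper names `parse_network_line` and `subnet_of` are hypothetical, and the line format is assumed to be exactly as printed in this post.

```python
import ipaddress

# Line printed by "srvctl config network -k 1" before the fix.
OCR_LINE = "Network exists: 1/192.168.1.0/255.255.255.0/bond0, type static"


def parse_network_line(line):
    """Split '<number>/<subnet>/<netmask>/<interface>' out of the line."""
    spec = line.split(":", 1)[1].split(",")[0].strip()
    number, subnet, netmask, ifname = spec.split("/")
    return {"number": number, "subnet": subnet, "netmask": netmask, "ifname": ifname}


def subnet_of(address, netmask):
    """Return the subnet number that `address` falls in under `netmask`."""
    network = ipaddress.ip_network(f"{address}/{netmask}", strict=False)
    return str(network.network_address)


configured = parse_network_line(OCR_LINE)
vip = "192.168.1.143"  # rac1vip, taken from the nodeapps output above

print(subnet_of(vip, configured["netmask"]))  # subnet under the mask in the OCR
print(subnet_of(vip, "255.255.0.0"))          # subnet under the mask now on bond0
```

The first value matches the subnet in the OCR (192.168.1.0), the second matches what the interface now reports (192.168.0.0), which is the disagreement the network check trips over.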


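So what can be done? The best fix is of course to change the mask back. If that is not possible, the cluster configuration has to be changed by following these two notes:
How to Modify Public Network Information including VIP in Oracle Clusterware (Doc ID 276434.1)
How to update the IP address of the SCAN VIP resources (ora.scan<n>.vip) (Doc ID 952903.1)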

Let's try making the change.
1. Change the PUBLIC record in the OCR
[grid@rac1 ~]$ oifcfg delif -global bond0/192.168.1.0 (note: use the value shown by srvctl config network -k 1 or by oifcfg getif)
[grid@rac1 ~]$ oifcfg getif
eth2 10.10.10.0 global cluster_interconnect
eth3 10.10.11.0 global cluster_interconnect
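After the deletion, the listing above no longer contains the PUBLIC interface.
Then set it again:
[grid@rac1 ~]$ oifcfg setif -global bond0/192.168.0.0:public (note: use the value shown by oifcfg iflist -p -n)
After setting it: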
[grid@rac1 ~]$ oifcfg getif
eth2 10.10.10.0 global cluster_interconnect
eth3 10.10.11.0 global cluster_interconnect
bond0 192.168.0.0 global public
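
The subnet passed to setif has to be the one the interface actually reports. As a sketch only, the two command lines used in this step can be derived from an address on the interface and the new mask. The function `oifcfg_commands` is hypothetical; it only builds strings in the same form as the commands above and does not run anything.

```python
import ipaddress


def oifcfg_commands(ifname, old_subnet, address, new_netmask):
    """Build the delif/setif command strings for a public interface."""
    network = ipaddress.ip_network(f"{address}/{new_netmask}", strict=False)
    new_subnet = network.network_address
    return [
        f"oifcfg delif -global {ifname}/{old_subnet}",
        f"oifcfg setif -global {ifname}/{new_subnet}:public",
    ]


# Any address on bond0 will do; a VIP address from this post is used here.
for command in oifcfg_commands("bond0", "192.168.1.0", "192.168.1.143", "255.255.0.0"):
    print(command)
```

It is still worth checking the printed subnet against oifcfg iflist -p -n before running the commands as the grid user.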

2. Change the VIP settings
Stop the database
[grid@rac1 ~]$ srvctl stop database -d racdb -o immediate
Stop the VIPs
[grid@rac1 ~]$ srvctl stop vip -n rac1 -f
[grid@rac1 ~]$ srvctl stop vip -n rac2 -f
Check the status and confirm it looks like this:

ora.rac1.vip
1 OFFLINE OFFLINE
ora.rac2.vip
1 OFFLINE OFFLINE
ora.racdb.db
1 OFFLINE OFFLINE Instance Shutdown
2 OFFLINE OFFLINE Instance Shutdown

Make the change (as root)
[root@rac1 network-scripts]# /oracle/app/grid/product/11.2.0/bin/srvctl modify nodeapps -n rac1 -A rac1vip/255.255.0.0/bond0
[root@rac1 network-scripts]# /oracle/app/grid/product/11.2.0/bin/srvctl modify nodeapps -n rac2 -A rac2vip/255.255.0.0/bond0
Confirm
[grid@rac1 ~]$ srvctl config nodeapps -a
Network exists: 1/192.168.0.0/255.255.0.0/bond0, type static
VIP exists: /rac1vip/192.168.1.143/192.168.0.0/255.255.0.0/bond0, hosting node rac1
VIP exists: /rac2vip/192.168.1.144/192.168.0.0/255.255.0.0/bond0, hosting node rac2
[grid@rac1 ~]$ srvctl config network -k 1
Network exists: 1/192.168.0.0/255.255.0.0/bond0, type static
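
Before starting the VIPs it is worth confirming that each VIP address still falls inside the network now on record. A minimal sketch, assuming the `VIP exists:` line format shown above; `vips_outside_network` is a hypothetical helper that works on pasted output.

```python
import ipaddress

# Output of "srvctl config nodeapps -a" after the change.
NODEAPPS = """\
Network exists: 1/192.168.0.0/255.255.0.0/bond0, type static
VIP exists: /rac1vip/192.168.1.143/192.168.0.0/255.255.0.0/bond0, hosting node rac1
VIP exists: /rac2vip/192.168.1.144/192.168.0.0/255.255.0.0/bond0, hosting node rac2
"""


def vips_outside_network(text):
    """Return names of VIPs whose address is not inside their listed subnet."""
    outside = []
    for line in text.splitlines():
        if not line.startswith("VIP exists:"):
            continue
        spec = line.split(":", 1)[1].split(",")[0].strip()
        # spec looks like /<name>/<ip>/<subnet>/<netmask>/<interface>
        _, name, ip, subnet, netmask, _ifname = spec.split("/")
        network = ipaddress.ip_network(f"{subnet}/{netmask}")
        if ipaddress.ip_address(ip) not in network:
            outside.append(name)
    return outside


print(vips_outside_network(NODEAPPS))
```

An empty list means both VIP addresses sit inside 192.168.0.0/255.255.0.0, so the addresses themselves did not need to change.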
Then we start the VIPs
[grid@rac1 ~]$ crsctl start res ora.rac1.vip
CRS-2672: Attempting to start 'ora.rac1.vip' on 'rac1'
CRS-2676: Start of 'ora.rac1.vip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.LISTENER.lsnr' on 'rac1'
CRS-2672: Attempting to start 'ora.LISTENER3.lsnr' on 'rac1'
CRS-2676: Start of 'ora.LISTENER3.lsnr' on 'rac1' succeeded
CRS-2676: Start of 'ora.LISTENER.lsnr' on 'rac1' succeeded

[grid@rac1 ~]$ crsctl start res ora.rac2.vip
CRS-2672: Attempting to start 'ora.rac2.vip' on 'rac2'
CRS-2676: Start of 'ora.rac2.vip' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.LISTENER.lsnr' on 'rac2'
CRS-2672: Attempting to start 'ora.LISTENER3.lsnr' on 'rac2'
CRS-2676: Start of 'ora.LISTENER.lsnr' on 'rac2' succeeded
CRS-2676: Start of 'ora.LISTENER3.lsnr' on 'rac2' succeeded

You can see they started normally. The listeners are pulled up by the VIPs because of this dependency:
START_DEPENDENCIES=hard(type:ora.cluster_vip_net1.type) pullup(type:ora.cluster_vip_net1.type)
so they started as well.
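
The START_DEPENDENCIES value is a list of `kind(targets)` groups. As an illustrative sketch, and assuming values shaped like the one above, it can be split into a dictionary to see which kinds apply. `parse_dependencies` is a hypothetical helper, not part of any Oracle tool.

```python
import re

START_DEPENDENCIES = (
    "hard(type:ora.cluster_vip_net1.type) pullup(type:ora.cluster_vip_net1.type)"
)


def parse_dependencies(value):
    """Map each dependency kind (hard, pullup, ...) to its list of targets."""
    dependencies = {}
    for kind, targets in re.findall(r"(\w+)\(([^)]*)\)", value):
        dependencies[kind] = [t.strip() for t in targets.split(",") if t.strip()]
    return dependencies


print(parse_dependencies(START_DEPENDENCIES))
```

Here both the hard and the pullup entries point at the cluster VIP type, which is why starting a VIP also started the listeners.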
3. Change the SCAN VIP settings
Stop the SCAN listener and the SCAN VIP
[grid@rac2 orarootagent_root]$ srvctl stop scan_listener -f
[grid@rac2 orarootagent_root]$ srvctl stop scan -f
Confirm
ora.LISTENER_SCAN1.lsnr
1 OFFLINE OFFLINE
ora.scan1.vip
1 OFFLINE OFFLINE
Reconfigure the SCAN VIP (as root)

[root@rac1 network-scripts]# /oracle/app/grid/product/11.2.0/bin/srvctl modify scan -n racscan
Then check the configuration
[grid@rac1 ~]$ srvctl config scan
SCAN name: racscan, Network: 1/192.168.0.0/255.255.0.0/bond0
SCAN VIP name: scan1, IP: /racscan/192.168.1.145
The change has taken effect. Now start the resources
[grid@rac1 ~]$ srvctl start scan
[grid@rac1 ~]$ srvctl start scan_listener
Finally, check the resources
[grid@rac2 orarootagent_root]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.CRS.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.DATA.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.LISTENER.lsnr
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.LISTENER3.lsnr
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.asm
ONLINE ONLINE rac1 Started
ONLINE ONLINE rac2 Started
ora.gsd
OFFLINE OFFLINE rac1
OFFLINE OFFLINE rac2
ora.net1.network
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.ons
ONLINE ONLINE rac1
ONLINE ONLINE rac2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE rac2
ora.cvu
1 ONLINE ONLINE rac2
ora.oc4j
1 ONLINE ONLINE rac1
ora.rac1.vip
1 ONLINE ONLINE rac1
ora.rac2.vip
1 ONLINE ONLINE rac2
ora.racdb.db
1 OFFLINE OFFLINE Instance Shutdown
2 OFFLINE OFFLINE Instance Shutdown
ora.scan1.vip
1 ONLINE ONLINE rac2
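
Scanning a long crsctl stat res -t listing for rows where TARGET and STATE disagree is easy to get wrong by eye. Below is a rough sketch that does it on pasted text. It assumes the layout shown in this post (a resource name line followed by status rows) and only understands the ONLINE and OFFLINE values. `target_state_mismatches` is a hypothetical helper, and the two samples are short excerpts of the listings above.

```python
STATES = {"ONLINE", "OFFLINE"}

BEFORE = """\
ora.net1.network
ONLINE OFFLINE rac1
ONLINE OFFLINE rac2
ora.rac1.vip
1 ONLINE OFFLINE
ora.racdb.db
1 ONLINE ONLINE rac2 Open
"""

AFTER = """\
ora.net1.network
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.rac1.vip
1 ONLINE ONLINE rac1
ora.racdb.db
1 OFFLINE OFFLINE Instance Shutdown
"""


def target_state_mismatches(text):
    """Return (resource, target, state) for rows whose TARGET and STATE differ."""
    mismatches = []
    resource = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].startswith("ora."):
            resource = tokens[0]
            continue
        if tokens[0].isdigit():
            tokens = tokens[1:]  # cluster resources carry an instance number first
        if resource and len(tokens) >= 2 and tokens[0] in STATES and tokens[1] in STATES:
            if tokens[0] != tokens[1]:
                mismatches.append((resource, tokens[0], tokens[1]))
    return mismatches


print(target_state_mismatches(BEFORE))
print(target_state_mismatches(AFTER))
```

The database rows in the second excerpt are not flagged because their target is OFFLINE as well: the database was stopped on purpose in step 2.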

You can see that everything has been repaired. All that remains is to start the database:
srvctl start database -d racdb -o open
