Exadata Storage Study Notes
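Some of the output below comes from a real Exadata, and some from an Exadata simulated in a virtual machine.
/ is the root file system
/opt/oracle holds the installed Exadata storage software
/var/log/oracle holds the storage node's operating system logs and crash logs
/dev/md5 and /dev/md6 are the system partitions: the active copy and its mirror
/dev/md7 and /dev/md8 are the Exadata software installation partitions: the active copy and its mirror
/dev/md11 is mounted on /var/log/oracle
At any given time, only 4 multidevice (MD) mount points are mounted on a storage node.
Check the partitions. -- The following is from a real environment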
[root@exaceladm01 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md5 9.8G 3.1G 6.2G 34% /
tmpfs 32G 4.0K 32G 1% /dev/shm
/dev/md7 2.0G 1.2G 702M 63% /opt/oracle
/dev/md4 110M 24M 79M 24% /boot
/dev/md11 2.3G 26M 2.1G 2% /var/log/oracle
[root@exaceladm01 ~]# mdadm -Q -D /dev/md5
/dev/md5:
Version : 0.90
Creation Time : Mon Nov 25 14:58:05 2019
Raid Level : raid1
Array Size : 10482304 (10.00 GiB 10.73 GB)
Used Dev Size : 10482304 (10.00 GiB 10.73 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 5
Persistence : Superblock is persistent
Update Time : Tue Nov 26 11:16:30 2019
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
UUID : f2ee267c:e2e0f67e:04894333:532a878b
Events : 0.24
Number Major Minor RaidDevice State
0 65 5 0 active sync /dev/sdq5
1 65 21 1 active sync /dev/sdr5
[root@exaceladm01 ~]#
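To make the "4 MD mount points" observation easy to check, here is a minimal Python sketch that pulls the /dev/mdN devices out of df -h output. The function name md_mounts is my own; the sample text is the df -h listing above.

```python
def md_mounts(df_output: str) -> dict[str, str]:
    """Map each /dev/mdN device in `df -h` output to its mount point."""
    mounts = {}
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # A data row has 6 columns: device, size, used, avail, use%, mount point
        if len(fields) >= 6 and fields[0].startswith("/dev/md"):
            mounts[fields[0]] = fields[5]
    return mounts


SAMPLE = """Filesystem Size Used Avail Use% Mounted on
/dev/md5 9.8G 3.1G 6.2G 34% /
tmpfs 32G 4.0K 32G 1% /dev/shm
/dev/md7 2.0G 1.2G 702M 63% /opt/oracle
/dev/md4 110M 24M 79M 24% /boot
/dev/md11 2.3G 26M 2.1G 2% /var/log/oracle
"""

if __name__ == "__main__":
    for device, mount_point in md_mounts(SAMPLE).items():
        print(device, "->", mount_point)
```

On the sample this reports md5, md7, md4, and md11, which matches the four MD mount points mentioned above.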
To check whether a LUN holds the file system partitions, look at whether isSystemLun is TRUE. -- The following is from a virtual environment
CellCLI> list lun '/opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/disk02' detail
name: /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/disk02
cellDisk: CD_disk02_cell1
deviceName: /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/disk02
diskType: HardDisk
id: /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/disk02
isSystemLun: FALSE
lunAutoCreate: FALSE
lunSize: 1G
physicalDrives: /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/disk02
raidLevel: "RAID 0"
status: normal
CellCLI>
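Key concepts
DISK -- LUN -- CELLDISK -- GRIDDISK -- ASM Disk (celldisk to griddisk is a 1:n relationship)
A storage node contains both conventional physical hard disks and flash modules.
A celldisk can subdivide a LUN into smaller partitions called grid disks.
A celldisk built on a flash module can be subdivided into flash cache or grid disks. One built on a physical hard disk can only be subdivided into grid disks.
Only grid disks can be mapped to ASM disks.
The physical disk is the first layer of abstraction: each physical disk is mapped and presented as a LUN. LUNs are created automatically during the initial deployment of the Exadata Database Machine, with no manual intervention. (That is why there is no create lun command, as help create lun shows.)
An existing LUN is then configured as a celldisk: a LUN that is already mapped on the storage node can be made into a celldisk. Once a celldisk exists, it can be subdivided into one or more grid disks, which an ASM instance can then pick up as candidate disks for an ASM disk group.
The main difference between flash cache and a flash-based grid disk (Flash Grid Disk) is simple: flash cache automatically caches the objects the database accessed most recently, whereas a flash grid disk is presented to ASM like any other grid disk.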
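The layering above can be sketched as a small data model. This is only an illustration of the 1:n celldisk-to-griddisk relationship, not how cellsrv stores anything. The class names are my own, the cd20/gd20 names and size come from the listings in this post, and the LUN string "disk20" is a made-up placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class GridDisk:
    name: str
    size_gb: float


@dataclass
class CellDisk:
    name: str
    lun: str        # the LUN this celldisk was created on
    disk_type: str  # "HardDisk" or "FlashDisk"
    grid_disks: list[GridDisk] = field(default_factory=list)

    def add_grid_disk(self, name: str, size_gb: float) -> GridDisk:
        """Carve one more grid disk out of this celldisk (1:n)."""
        grid_disk = GridDisk(name, size_gb)
        self.grid_disks.append(grid_disk)
        return grid_disk


def asm_candidates(cell_disks: list[CellDisk]) -> list[str]:
    """Only grid disks are visible to ASM, so flatten them out."""
    return [gd.name for cd in cell_disks for gd in cd.grid_disks]


if __name__ == "__main__":
    # "disk20" is a hypothetical LUN name used only for this sketch
    cd20 = CellDisk(name="cd20", lun="disk20", disk_type="HardDisk")
    cd20.add_grid_disk("gd20", 1.953125)
    print(asm_candidates([cd20]))
```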
System users for managing storage
Each Exadata storage server is configured with 3 default users: root, celladmin, and cellmonitor.
[root@exacell01 ~]# id cellmonitor
uid=1001(cellmonitor) gid=501(cellmonitor) groups=501(cellmonitor),502(cellusers)
[root@exacell01 ~]# id celladmin
uid=1000(celladmin) gid=500(celladmin) groups=500(celladmin),502(cellusers)
[root@exacell01 ~]#
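root: superuser privileges; used to start and stop the storage server
celladmin: performs storage node administration tasks such as create, alter, and modify, using the cellcli and dcli tools
cellmonitor: used for storage node monitoring tasks
Use help to see all commands available in the cellcli tool.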
CellCLI> help
HELP [topic]
Available Topics:
ALTER
ALTER ALERTHISTORY
ALTER CELL
ALTER CELLDISK
ALTER FLASHCACHE
ALTER GRIDDISK
ALTER IBPORT
ALTER IORMPLAN
ALTER LUN
ALTER PHYSICALDISK
ALTER QUARANTINE
ALTER THRESHOLD
ASSIGN KEY
CALIBRATE
CREATE
CREATE CELL
CREATE CELLDISK
CREATE FLASHCACHE
CREATE FLASHLOG
CREATE GRIDDISK
CREATE KEY
CREATE QUARANTINE
CREATE THRESHOLD
DESCRIBE
DROP
DROP ALERTHISTORY
DROP CELL
DROP CELLDISK
DROP FLASHCACHE
DROP FLASHLOG
DROP GRIDDISK
DROP QUARANTINE
DROP THRESHOLD
EXPORT CELLDISK
IMPORT CELLDISK
LIST
LIST ACTIVEREQUEST
LIST ALERTDEFINITION
LIST ALERTHISTORY
LIST CELL
LIST CELLDISK
LIST FLASHCACHE
LIST FLASHCACHECONTENT
LIST FLASHLOG
LIST GRIDDISK
LIST IBPORT
LIST IORMPLAN
LIST KEY
LIST LUN
LIST METRICCURRENT
LIST METRICDEFINITION
LIST METRICHISTORY
LIST PHYSICALDISK
LIST QUARANTINE
LIST THRESHOLD
SET
SPOOL
START
CellCLI>
List all flash disks on this storage node. -- The following is from a virtual environment
CellCLI> list LUN where disktype='flashdisk'
/opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH01 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH01 normal
/opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH02 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH02 normal
/opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH03 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH03 normal
/opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH04 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH04 normal
CellCLI>
View the details of a LUN. -- The following is from a virtual environment
CellCLI> list lun where celldisk='FD_00_cell1'
/opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH01 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH01 normal
CellCLI> list lun where celldisk='FD_00_cell1' detail
name: /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH01
cellDisk: FD_00_cell1
deviceName: /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH01
diskType: FlashDisk
id: /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH01
isSystemLun: FALSE
lunAutoCreate: FALSE
lunSize: 1G
physicalDrives: /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH01
raidLevel: "RAID 0"
status: normal
CellCLI>
View the details of a physical disk. -- The following is from a virtual environment
CellCLI> list physicaldisk '/opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/disk01' detail
name: /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/disk01
diskType: HardDisk
luns: /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/disk01
physicalInsertTime: 2019-11-26T05:47:08+08:00
physicalSize: 1G
status: normal
CellCLI> list physicaldisk '/opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH02' detail
name: /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH02
diskType: FlashDisk
luns: /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/disks/raw/FLASH02
physicalInsertTime: 2019-11-26T05:47:08+08:00
physicalSize: 1G
status: normal
CellCLI>
View the details of a grid disk, such as the celldisk it maps to, its size, and its status. -- The following is from a virtual environment
CellCLI> list griddisk gd20 detail;
name: gd20
asmDiskgroupName: DATA_ADD
asmDiskName: GD20
asmFailGroupName: GD20
availableTo:
cachingPolicy: default
cellDisk: cd20
comment:
creationTime: 2019-11-26T10:07:35+08:00
diskType: HardDisk
errorCount: 0
id: 315f8adf-c9db-4514-a144-89140a61dc84
offset: 48M
size: 1.953125G
status: active
CellCLI>
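When many grid disks need checking, the detail output is easier to work with as key/value pairs. Below is a minimal Python sketch that assumes each line has the form "attribute: value", as in the listing above. The function name parse_detail is my own.

```python
def parse_detail(text: str) -> dict[str, str]:
    """Turn CellCLI `... detail` output into an attribute dictionary."""
    attrs = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps keep their colons
        key, sep, value = line.partition(":")
        if sep:
            attrs[key.strip()] = value.strip()
    return attrs


SAMPLE = """name: gd20
asmDiskgroupName: DATA_ADD
availableTo:
cellDisk: cd20
creationTime: 2019-11-26T10:07:35+08:00
size: 1.953125G
status: active
"""

if __name__ == "__main__":
    gd = parse_detail(SAMPLE)
    print(gd["name"], "is on", gd["cellDisk"], "with size", gd["size"])
```

Splitting on the first colon only is the one design choice that matters here, because values such as creationTime contain colons of their own.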
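Creating celldisks
The following command creates 12 cell disks, one per LUN, following the default naming convention.
create celldisk all harddisk
Creating griddisks
The following command creates grid disks (one per celldisk) on the outermost tracks of the physical disks for higher performance
create griddisk all harddisk prefix=data, size=500G
The following command creates grid disks on the inner tracks, for applications where I/O performance matters less
create griddisk all prefix=FRA
Configuring flash griddisks
The following steps first drop the current flash cache configuration, then rebuild it with a non-default size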
drop flashcache
create flashcache all size=200G
create griddisk all flashdisk
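Once Exadata storage is configured, the next step is to configure the database nodes to use the grid disks. The cellinit.ora and cellip.ora files must be configured on the database nodes so that they can connect to the storage nodes and use the grid disks.
cellinit.ora holds the IP address of the database server node (that is, the compute node's private IP)
cellip.ora holds the IP addresses of all storage nodes (that is, the storage nodes' private IPs)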
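To make the two files concrete, here is a rough sketch of what they typically contain and how the storage node addresses could be read back from cellip.ora. All IP addresses are made up for illustration. The helper name cell_addresses is my own, and the one-entry-per-line layout is the simple form I am assuming here.

```python
import re

# Hypothetical contents; every address below is a placeholder
CELLINIT_ORA = "ipaddress1=192.168.10.1/24\n"
CELLIP_ORA = 'cell="192.168.10.3"\ncell="192.168.10.4"\n'


def cell_addresses(cellip_text: str) -> list[str]:
    """Return the quoted value of every cell="..." line in cellip.ora."""
    return re.findall(r'^\s*cell\s*=\s*"([^"]+)"', cellip_text, flags=re.MULTILINE)


if __name__ == "__main__":
    print("storage nodes:", cell_addresses(CELLIP_ORA))
```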
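Create an ASM disk group. Take a few grid disks from the two storage nodes cell01 and cell02 to create a high-redundancy disk group for data. (Details omitted; it works the same as the usual way.) Note that high redundancy needs at least three failure groups, and each cell is a failure group by default, so a real deployment would also list grid disks from a third cell. -- The SQL below is taken from the official documentation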
https://docs.oracle.com/en/engineered-systems/exadata-database-machine/sagug/exadata-administering-asm.html#GUID-555BD6CC-6668-4365-A20C-0C119C059EFA
SQL> CREATE DISKGROUP data HIGH REDUNDANCY
-- These grid disks are on cell01
DISK
'o/*/data_CD_00_cell01',
'o/*/data_CD_01_cell01',
'o/*/data_CD_02_cell01',
-- These grid disks are on cell02
'o/*/data_CD_00_cell02',
'o/*/data_CD_01_cell02',
'o/*/data_CD_02_cell02'
-- These disk group attributes must be set for cell access
-- Note that this disk group is set for cell only
ATTRIBUTE 'compatible.rdbms' = '11.2.0.4',
'content.type' = 'data',
'compatible.asm' = '19.0.0.0',
'au_size' = '4M',
'cell.smart_scan_capable' = 'TRUE';
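Managing the storage server
imageinfo: shows details about the current storage software version, such as the kernel version, OS version, active image version, and the node's boot partition
-- The following output is from a real environment
[root@exacel02 ~]# imageinfo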
Kernel version: 2.6.39-400.264.1.el6uek.x86_64 #1 SMP Wed Aug 26 16:42:25 PDT 2015 x86_64
Cell version: OSS_12.1.2.2.0_LINUX.X64_150917
Cell rpm version: cell-12.1.2.2.0_LINUX.X64_150917-1.x86_64
Active image version: 12.1.2.2.0.150917
Active image activated: 2019-11-21 15:26:44 +0800
Active image status: success
Active system partition on device: /dev/md5
Active software partition on device: /dev/md7
Cell boot usb partition: /dev/sdac1
Cell boot usb version: 12.1.2.2.0.150917
Inactive image version: undefined
Rollback to the inactive partitions: Impossible
[root@exacel02 ~]#
imagehistory: shows all software versions that have been installed on this node
[root@exacel02 config]# imagehistory
Version : 12.1.2.2.0.150917
Image activation date : 2019-11-21 15:26:44 +0800
Imaging mode : fresh
Imaging status : success
[root@exacel02 config]#
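View and delete old alert history on the storage node. Note that 3_1, 3_2, and 3_3 must be dropped together; dropping only one of them raises an error. After the drops, only alert 2 remains. -- The following is from a virtual environment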
CellCLI> list alerthistory
1 2019-11-26T04:50:18+08:00 critical "RS-7445 [Required IP parameters missing] [Check cellinit.ora] [] [] [] [] [] [] [] [] [] []"
2 2019-11-26T05:39:50+08:00 critical "RS-7445 [Required IP parameters missing] [Check cellinit.ora] [] [] [] [] [] [] [] [] [] []"
3_1 2019-11-26T05:41:37+08:00 warning "Hugepage allocation failure in service cellsrv. Number of Hugepages allocated is 748, failed to allocate 152"
3_2 2019-11-26T05:49:18+08:00 warning "Hugepage allocation failure in service cellsrv. Number of Hugepages allocated is 840, failed to allocate 60"
3_3 2019-11-26T08:23:06+08:00 clear "Hugepage allocation was successful in service cellsrv."
CellCLI> drop alerthistory 1
Alert 1 successfully dropped
CellCLI> drop alerthistory 3_1
CELL-02643: DROP ALERTHISTORY command did not include all members of the alert sequence for 3_1. All members of the sequence must be dropped together.
CellCLI> drop alerthistory 3_1,3_2,3_3
Alert 3_1 successfully dropped
Alert 3_2 successfully dropped
Alert 3_3 successfully dropped
CellCLI>
CellCLI> list alerthistory
2 2019-11-26T05:39:50+08:00 critical "RS-7445 [Required IP parameters missing] [Check cellinit.ora] [] [] [] [] [] [] [] [] [] []"
CellCLI>
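The "drop the whole sequence together" rule (CELL-02643) can be handled mechanically by grouping alert names on the part before the underscore. A minimal sketch, assuming alert names look like 3 or 3_1 as in the listing above; the function name drop_commands is my own.

```python
from collections import defaultdict


def drop_commands(alert_names: list[str]) -> list[str]:
    """Build one DROP ALERTHISTORY command per alert sequence."""
    groups = defaultdict(list)
    for name in alert_names:
        sequence = name.split("_", 1)[0]  # "3_2" belongs to sequence "3"
        groups[sequence].append(name)
    # Every member of a sequence goes into the same command
    return ["drop alerthistory " + ",".join(members) for members in groups.values()]


if __name__ == "__main__":
    for command in drop_commands(["1", "2", "3_1", "3_2", "3_3"]):
        print(command)
```

For the five alerts above this yields three commands, the last one dropping 3_1, 3_2, and 3_3 in one go.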
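Storage server troubleshooting
Most diagnostic tools live under the /opt/oracle.SupportTools directory. For example:
sundiag.sh -- writes its diagnostic output under /tmp/sundiag_Filesystem as a timestamped tar file
ExaWatcher (which replaces OSWatcher from earlier versions) starts automatically with the system. It collects information on the storage node and stores it under /opt/oracle.ExaWatcher/archive. To generate or extract part of the content from the logs ExaWatcher produces, use the GetExaWatcherResults.sh script.
exachk -- needs no introduction.
CheckHWnFWProfile -- verifies the hardware and firmware components in detail. It returns success if the current hardware and firmware versions are ones it accepts as correct.
Stopping a storage node
1 Confirm that taking the node's grid disks offline will not affect the ASM instance. If every grid disk listed shows Yes, all grid disks can safely be taken offline without any impact on ASM. -- The following is from a virtual environment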
list griddisk attributes name,asmdeactivationoutcome
CellCLI> list griddisk attributes name,asmdeactivationoutcome
DATA_CD_disk01_cell1 Yes
DATA_CD_disk02_cell1 Yes
DATA_CD_disk03_cell1 Yes
DATA_CD_disk04_cell1 Yes
DATA_CD_disk05_cell1 Yes
DATA_CD_disk06_cell1 Yes
DATA_CD_disk07_cell1 Yes
DATA_CD_disk08_cell1 Yes
DATA_CD_disk09_cell1 Yes
DATA_CD_disk10_cell1 Yes
DATA_CD_disk11_cell1 Yes
DATA_CD_disk12_cell1 Yes
gd13 Yes
gd14 Yes
gd15 Yes
gd16 Yes
gd17 Yes
gd18 Yes
gd19 Yes
gd20 Yes
CellCLI>
2 Once every result in step 1 is Yes, run
alter griddisk all inactive
CellCLI> alter griddisk all inactive
GridDisk DATA_CD_disk01_cell1 successfully altered
GridDisk DATA_CD_disk02_cell1 successfully altered
GridDisk DATA_CD_disk03_cell1 successfully altered
GridDisk DATA_CD_disk04_cell1 successfully altered
GridDisk DATA_CD_disk05_cell1 successfully altered
GridDisk DATA_CD_disk06_cell1 successfully altered
GridDisk DATA_CD_disk07_cell1 successfully altered
GridDisk DATA_CD_disk08_cell1 successfully altered
GridDisk DATA_CD_disk09_cell1 successfully altered
GridDisk DATA_CD_disk10_cell1 successfully altered
GridDisk DATA_CD_disk11_cell1 successfully altered
GridDisk DATA_CD_disk12_cell1 successfully altered
GridDisk gd13 successfully altered
GridDisk gd14 successfully altered
GridDisk gd15 successfully altered
GridDisk gd16 successfully altered
GridDisk gd17 successfully altered
GridDisk gd18 successfully altered
GridDisk gd19 successfully altered
GridDisk gd20 successfully altered
3 Once the node's grid disks have been deactivated, check the asmdeactivationoutcome output again and use list griddisk to confirm that all grid disks are offline
CellCLI> list griddisk
DATA_CD_disk01_cell1 inactive
DATA_CD_disk02_cell1 inactive
DATA_CD_disk03_cell1 inactive
DATA_CD_disk04_cell1 inactive
DATA_CD_disk05_cell1 inactive
DATA_CD_disk06_cell1 inactive
DATA_CD_disk07_cell1 inactive
DATA_CD_disk08_cell1 inactive
DATA_CD_disk09_cell1 inactive
DATA_CD_disk10_cell1 inactive
DATA_CD_disk11_cell1 inactive
DATA_CD_disk12_cell1 inactive
gd13 inactive
gd14 inactive
gd15 inactive
gd16 inactive
gd17 inactive
gd18 inactive
gd19 inactive
gd20 inactive
CellCLI>
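4 The node can now be safely shut down, restarted, or taken offline, using operating system commands.
shutdown -h now
Note: if the storage node will be down for a long time, adjust the ASM disk_repair_time attribute so that ASM does not drop the disks once they have been offline longer than the limit.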
alter diskgroup DG_DATA set attribute 'disk_repair_time'='8H'
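The pre-check in step 1 is easy to automate: anything other than Yes in the asmdeactivationoutcome column should block the shutdown. Here is a minimal sketch that works on the text output. The function name is my own, and the non-Yes line in the sample is invented for the example, not taken from a real cell.

```python
def blocking_grid_disks(listing: str) -> list[str]:
    """Return grid disks whose asmdeactivationoutcome is not Yes."""
    blockers = []
    for line in listing.splitlines():
        # First column is the name; the rest of the line is the outcome
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[1].strip().lower() != "yes":
            blockers.append(fields[0])
    return blockers


SAMPLE = """DATA_CD_disk01_cell1 Yes
DATA_CD_disk02_cell1 Yes
gd13 Some reason why it cannot be deactivated
gd20 Yes
"""  # the gd13 outcome text is hypothetical

if __name__ == "__main__":
    blockers = blocking_grid_disks(SAMPLE)
    if blockers:
        print("Do NOT deactivate yet, check:", ", ".join(blockers))
    else:
        print("All grid disks report Yes")
```

An empty result means it is safe to go on to alter griddisk all inactive.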
Starting a storage node
1 alter griddisk all active
CellCLI> alter griddisk all active
GridDisk DATA_CD_disk01_cell1 successfully altered
GridDisk DATA_CD_disk02_cell1 successfully altered
GridDisk DATA_CD_disk03_cell1 successfully altered
GridDisk DATA_CD_disk04_cell1 successfully altered
GridDisk DATA_CD_disk05_cell1 successfully altered
GridDisk DATA_CD_disk06_cell1 successfully altered
GridDisk DATA_CD_disk07_cell1 successfully altered
GridDisk DATA_CD_disk08_cell1 successfully altered
GridDisk DATA_CD_disk09_cell1 successfully altered
GridDisk DATA_CD_disk10_cell1 successfully altered
GridDisk DATA_CD_disk11_cell1 successfully altered
GridDisk DATA_CD_disk12_cell1 successfully altered
GridDisk gd13 successfully altered
GridDisk gd14 successfully altered
GridDisk gd15 successfully altered
GridDisk gd16 successfully altered
GridDisk gd17 successfully altered
GridDisk gd18 successfully altered
GridDisk gd19 successfully altered
GridDisk gd20 successfully altered
CellCLI>
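2 list griddisk attributes name,asmmodestatus -- I did not start the compute nodes here; they stayed shut down and only the storage node was running for the simulation, so the result of this step may differ from a normal one.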
CellCLI> list griddisk attributes name,asmmodestatus
DATA_CD_disk01_cell1 UNKNOWN
DATA_CD_disk02_cell1 UNKNOWN
DATA_CD_disk03_cell1 UNKNOWN
DATA_CD_disk04_cell1 UNKNOWN
DATA_CD_disk05_cell1 UNKNOWN
DATA_CD_disk06_cell1 UNUSED
DATA_CD_disk07_cell1 UNUSED
DATA_CD_disk08_cell1 UNUSED
DATA_CD_disk09_cell1 UNUSED
DATA_CD_disk10_cell1 UNUSED
DATA_CD_disk11_cell1 UNUSED
DATA_CD_disk12_cell1 UNUSED
gd13 UNKNOWN
gd14 UNKNOWN
gd15 UNKNOWN
gd16 UNKNOWN
gd17 UNKNOWN
gd18 UNKNOWN
gd19 UNKNOWN
gd20 UNKNOWN
CellCLI>
3 Run list cell, list griddisk, and similar commands. The book only runs list cell.
CellCLI> list cell detail
name: cell1
bbuTempThreshold: 60
bbuChargeThreshold: 800
bmcType: absent
cellVersion: OSS_11.2.3.2.0_LINUX.X64_120713
cpuCount: 1
diagHistoryDays: 7
fanCount: 1/1
fanStatus: normal
flashCacheMode: WriteThrough
id: 66c6e844-c66a-4661-8291-359459443084
interconnectCount: 2
interconnect1: eth0
iormBoost: 0.0
ipaddress1: 10.10.10.1/24
kernelVersion: 2.6.39-400.215.10.el5uek
makeModel: Fake hardware
metricHistoryDays: 7
offloadEfficiency: 1,000.0
powerCount: 1/1
powerStatus: normal
releaseVersion: 11.2.3.2.0
releaseTrackingBug: 14212264
status: online
temperatureReading: 0.0
temperatureStatus: normal
upTime: 0 days, 2:50
cellsrvStatus: running
msStatus: running
rsStatus: running
CellCLI> list cell
cell1 online
CellCLI>
CellCLI> list griddisk
DATA_CD_disk01_cell1 active
DATA_CD_disk02_cell1 active
DATA_CD_disk03_cell1 active
DATA_CD_disk04_cell1 active
DATA_CD_disk05_cell1 active
DATA_CD_disk06_cell1 active
DATA_CD_disk07_cell1 active
DATA_CD_disk08_cell1 active
DATA_CD_disk09_cell1 active
DATA_CD_disk10_cell1 active
DATA_CD_disk11_cell1 active
DATA_CD_disk12_cell1 active
gd13 active
gd14 active
gd15 active
gd16 active
gd17 active
gd18 active
gd19 active
gd20 active
CellCLI> list griddisk attributes name,asmdeactivationoutcome
DATA_CD_disk01_cell1 Yes
DATA_CD_disk02_cell1 Yes
DATA_CD_disk03_cell1 Yes
DATA_CD_disk04_cell1 Yes
DATA_CD_disk05_cell1 Yes
DATA_CD_disk06_cell1 Yes
DATA_CD_disk07_cell1 Yes
DATA_CD_disk08_cell1 Yes
DATA_CD_disk09_cell1 Yes
DATA_CD_disk10_cell1 Yes
DATA_CD_disk11_cell1 Yes
DATA_CD_disk12_cell1 Yes
gd13 Yes
gd14 Yes
gd15 Yes
gd16 Yes
gd17 Yes
gd18 Yes
gd19 Yes
gd20 Yes
CellCLI>
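After reactivation, the thing to watch is asmmodestatus: with ASM running, grid disks normally pass through SYNCING and end up ONLINE. The sketch below summarizes the two-column output and lists anything that is not ONLINE yet. The function names are my own, and the sample rows come from the simulated output above, where the compute nodes were down.

```python
from collections import Counter


def parse_status(listing: str) -> dict[str, str]:
    """Map grid disk name to asmmodestatus from two-column output."""
    rows = (line.split() for line in listing.splitlines())
    return {fields[0]: fields[1] for fields in rows if len(fields) == 2}


def status_summary(listing: str) -> Counter:
    """Count how many grid disks are in each asmmodestatus."""
    return Counter(parse_status(listing).values())


def not_online(listing: str) -> list[str]:
    """Grid disks that have not reached ONLINE yet."""
    return [name for name, status in parse_status(listing).items() if status != "ONLINE"]


SAMPLE = """DATA_CD_disk01_cell1 UNKNOWN
DATA_CD_disk06_cell1 UNUSED
gd13 UNKNOWN
"""

if __name__ == "__main__":
    print(dict(status_summary(SAMPLE)))
    print("still waiting on:", not_online(SAMPLE))
```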
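Handling disk problems
When a disk problem is detected on a storage server, the following actions typically take place
1 When performance degradation is detected, the status of the celldisk and the physical disk changes
2 All grid disks on that celldisk go offline
3 The MS service notifies the cellsrv service of the problem, and cellsrv in turn tells the ASM instance to take the grid disks offline
4 The MS service on the storage node then runs a series of confinement checks to decide whether the disk needs to be dropped
5 If the disk passes the performance tests, MS notifies cellsrv to bring all the celldisks and grid disks back online
6 If the disk fails the performance tests, the status of the celldisk and the physical disk is changed and the disk is removed from the active configuration
7 MS notifies cellsrv about the disk problem. cellsrv then tells the ASM instance to drop all of the affected grid disks on the node.
8 If ASR is configured, a service request (SR) for disk replacement is filed with Oracle Support.
9 The failed disk can be replaced with a hot spare, or a replacement disk can be requested from Oracle.
When there are signs that a disk is in very bad shape, the first task is to use the storage node's alert history or its logs to identify the failed disk's exact name, its location in the system, its physical location, and its slot number. Also check the ASM alert log to confirm that ASM has taken the failed disk offline (that is, dropped it) and that ASM has finished rebalancing the data before the disk is replaced.
Commands for viewing alert information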
list alerthistory
list physicaldisk where disktype=harddisk and status=critical detail
list physicaldisk where disktype=harddisk and status like ".*failure.*" detail
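The like filter takes a regular expression, which is why the pattern needs .* on both sides of failure. The same idea in Python, assuming the pattern has to match the whole status value; the status strings in the sample are illustrative values, not output captured from a cell.

```python
import re

FAILURE_PATTERN = re.compile(r".*failure.*")


def failing_disks(statuses: dict[str, str]) -> list[str]:
    """Return disk names whose status matches the failure pattern."""
    return [
        name
        for name, status in statuses.items()
        # fullmatch mirrors a filter that must match the entire value
        if FAILURE_PATTERN.fullmatch(status)
    ]


# Hypothetical disk names and statuses, for illustration only
SAMPLE = {
    "disk01": "normal",
    "disk02": "predictive failure",
    "disk03": "critical",
}

if __name__ == "__main__":
    print(failing_disks(SAMPLE))
```

Without the leading and trailing .* the pattern would only match a status that is exactly the word failure.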
Confirm that the grid disks of the affected celldisk have been dropped and that the ASM instance has finished rebalancing the data
select name,state from v$asm_diskgroup;
select * from v$asm_operation;
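About 3 minutes after the physical disk is replaced on the storage node, all of its grid disks and celldisks are recreated automatically and added back to their disk groups, after which the data is rebalanced.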
END