RAC

ASM memory fragmentation problem in 11gR2 RAC

June 7, 2013 oracle, RAC No comments

Two days ago I dealt with an ASM memory problem on a RAC system; it is the company's last RAC One Node deployment, running 11.2.0.3.

This core RAC One Node system suddenly showed large numbers of log file switch and buffer busy waits, accompanied by many cursor: pin S wait on X and read by other session waits.
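The snapshot below came from a home-grown session monitor; that script is not reproduced here, but a minimal sketch of the kind of query behind it (the idle-wait filter and column aliases are my own, not the original script) looks like this:

select sid, username, machine, event,
       p1 || '/' || p2 || '/' || p3      as param,
       seconds_in_wait                   as wt,
       sql_id || '/' || prev_sql_id      as sql_ids,
       status                            as st,
       last_call_et                      as lt,
       logon_time
  from v$session
 where wait_class <> 'Idle'
   and username is not null
 order by event;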

SID USERNAME   MACHINE                EVENT                          PARAM                   W   WT SQL                          ST     LT LOGON_TIME
------ ---------- ---------------------- ------------------------------ -------------------- ---- ---- ---------------------------- -- ------ ----------
 10917 PROD_DATA2 JDBC Thin Client       buffer busy waits              14/907008/8             0  294 /bgx0f86ucnw8v               A     355       7628
 13549 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  172 /520mkxqpf15q8               A     294      20514
  4456 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  175 6rsq14x8a7n5r/dsxmfbstazq7u  A     525      94053
  4521 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  175 6rsq14x8a7n5r/dsxmfbstazq7u  A     508      94052
  6831 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  168 1k8vrykzkvccg/d5dfm29f41s0d  A     739       1400
  3170 PROD_DATA2 JDBC Thin Client       enq: SQ - contention:SQ-6:12:  1397817350/21363/0      0  391 4uj3xtfzq3a91/gky4m8njtfrcd  A     392     536146
  4611 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  181 fqc7yvkzxjzk7/cqb6f9fkfkcsq  A     725        932
 13448 PROD_DATA2 JDBC Thin Client       buffer busy waits              56/645165/1             0  730 fqc7yvkzxjzk7/cqb6f9fkfkcsq  A     730        736
   940 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  181 fqc7yvkzxjzk7/cqb6f9fkfkcsq  A     710        953
  8026 PROD_DATA2 JDBC Thin Client       buffer busy waits              20/279682/8             0  441 3f0thwr0rg23c/c4h5dg81ffxyr  A     441      94043
  9290 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  178 3f0thwr0rg23c/c4h5dg81ffxyr  A     220        473
  2750 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  199 b3dmt8325pmvf/bpk6g015uzqzt  A     417      81687
  3510 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  223 c5n6d774as72b/5ak0xdkuvc65n  A     273     536146
  2323 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  172 c5n6d774as72b/5ak0xdkuvc65n  A     407     536140
  3767 PROD_DATA2 JDBC Thin Client       buffer busy waits              21/35200/8              0  725 5q60jbr6700u7/g2th2agwfkk1t  A     725       1133
 10205 PROD_DATA2 JDBC Thin Client       buffer busy waits              21/35200/8              0  717 5q60jbr6700u7/g2th2agwfkk1t  A     717       1186
 10046 PROD_DATA2 JDBC Thin Client       buffer busy waits              21/35200/8              0  670 5q60jbr6700u7/g2th2agwfkk1t  A     670       1186
 10044 PROD_DATA2 JDBC Thin Client       buffer busy waits              21/35200/8              0  689 5q60jbr6700u7/g2th2agwfkk1t  A     689      24971
  6395 PROD_DATA2 JDBC Thin Client       buffer busy waits              21/35201/8              0  693 5q60jbr6700u7/g2th2agwfkk1t  A     693       1522
  9535 PROD_DATA2 JDBC Thin Client       buffer busy waits              21/35200/8              0  703 5q60jbr6700u7/g2th2agwfkk1t  A     703      42595
  6917 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  175 5q60jbr6700u7/g2th2agwfkk1t  A     709       1221
  2902 PROD_DATA2 JDBC Thin Client       buffer busy waits              21/35200/8              0  700 5q60jbr6700u7/g2th2agwfkk1t  A     700       2146
  9554 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  168 5q60jbr6700u7/g2th2agwfkk1t  A     726       2147
 11662 PROD_DATA2 JDBC Thin Client       enq: SQ - contention:SQ-6:73:  1397817350/21911/0      0  377 cg3urj768yv8g/apun8g5mb9byt  A     378      18995
  9378 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  168 cg3urj768yv8g/7g63srczfgfd6  A     627      19057
  8698 PROD_DATA2 JDBC Thin Client       enq: SQ - contention:SQ-6:73:  1397817350/21911/0      0  551 cg3urj768yv8g/5smgjxsscf91w  A     551      18996
  7575 PROD_DATA2 JDBC Thin Client       enq: SQ - contention:SQ-6:0:   1397817350/21911/0      0  204 cg3urj768yv8g/793hdqw3bn04q  A     204      19035

...
 13028 PROD_DATA2 JDBC Thin Client       enq: SQ - contention:SQ-6:0:   1397817350/21911/0      0  623 cg3urj768yv8g/8s97vpbnuktp2  A     623      19061

...
  6659 PROD_DATA2 JDBC Thin Client       enq: SQ - contention:SQ-6:0:   1397817350/21911/0      0  615 cg3urj768yv8g/8s97vpbnuktp2  A     616      19035


  5187 PROD_DATA2 JDBC Thin Client       enq: SQ - contention:SQ-6:56:  1397817350/21911/0      0  418 cg3urj768yv8g/793hdqw3bn04q  A     418      19069
  4933 PROD_DATA2 JDBC Thin Client       enq: SQ - contention:SQ-6:0:   1397817350/21911/0      0  576 cg3urj768yv8g/9quv2rcpqjxwt  A     576      18977


  4346 PROD_DATA2 JDBC Thin Client       enq: SQ - contention:SQ-6:0:   1397817350/21911/0      0  219 cg3urj768yv8g/8s97vpbnuktp2  A     546        943

....
  4082 PROD_DATA2 JDBC Thin Client       enq: SQ - contention:SQ-6:14:  1397817350/21911/0      0  460 cg3urj768yv8g/520mkxqpf15q8  A     460       1522
  7267 PROD_DATA2 JDBC Thin Client       enq: SQ - contention:SQ-6:73:  1397817350/21911/0      0  415 cg3urj768yv8g/8s97vpbnuktp2  A     415        919
  9532 PROD_DATA2 JDBC Thin Client       DFS lock handle                1398145029/21916/0      0  353 2xw94b38cybhv/4001f9ha480pb  A     354     358380
  9716 PROD_DATA2 JDBC Thin Client       DFS lock handle                1398145029/21916/0      0  264 2xw94b38cybhv/4001f9ha480pb  A     265       2147
   512 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  168 fhd852bb8agv6/gq6xnzt1spkzp  A     654     533810
  7001 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  178 4q0c1a3bsynyw/guwq1fpq9r1dy  A     261     536140
 11348 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  175 d3wp3qvbz5bna/1w0c0jhzmp1hh  A     226       7299
 11164 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  168 9fwybsrc3h1dd/70k37x7hwg0w5  A     701       1185
  3096 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  220 9fwybsrc3h1dd/70k37x7hwg0w5  A     510       1522
  8100 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  181 91r0dubcptymp/dsxmfbstazq7u  A     649      95706
  8192 PROD_DATA2 JDBC Thin Client       buffer busy waits              24/274502/1             0  645 91r0dubcptymp/dsxmfbstazq7u  A     645      94041
  8783 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  175 91r0dubcptymp/dsxmfbstazq7u  A     664      94039
  6667 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  175 91r0dubcptymp/dsxmfbstazq7u  A     685       1272
  5445 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  175 91r0dubcptymp/dsxmfbstazq7u  A     688       1185
    38 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  199 91r0dubcptymp/dsxmfbstazq7u  A     695       1263
  9386 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  181 91r0dubcptymp/dsxmfbstazq7u  A     685      95693
  8868 PROD_DATA2 JDBC Thin Client       buffer busy waits              53/118272/8             0  675 ghy358rghc4ty/3p3z1j33tk594  A     675     486608
  6475 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  168 ghy358rghc4ty/3p3z1j33tk594  A     706     486608
  6899 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  178 ftmka9gnhru7t/1w0c0jhzmp1hh  A     738     442302
  4003 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  175 8a5g4frpyags9/1w0c0jhzmp1hh  A     401     435103

   SID USERNAME   MACHINE                EVENT                          PARAM                   W   WT SQL                          ST     LT LOGON_TIME
------ ---------- ---------------------- ------------------------------ -------------------- ---- ---- ---------------------------- -- ------ ----------
   861 PROD_DATA2 dcb-srv-0343-vm02      buffer busy waits              60/1078781/1            0  504 fjnq76rpzzpdy/a6tzf0tjrwvqh  A     505      20966
 12361 PROD_DATA2 xen-4-211-vm05         buffer busy waits              60/1078781/1            0  402 fjnq76rpzzpdy/a6tzf0tjrwvqh  A     403      20976
  6213 TOAD       JDBC Thin Client       buffer busy waits              41/34704/571            0  542 5kjukxbrv5w3r/dbkpyhyfyvdpu  A     542    1577749
  7513 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  168 cvvzaj3syc76b/1w0c0jhzmp1hh  A     703     502831
  8015 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  175 1tg6ux7vc22zm/32bjxyxc0xwfs  A     463        899
 10817 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  205 1tg6ux7vc22zm/32bjxyxc0xwfs  A     720      18977
 12268 PROD_DATA2 JDBC Thin Client       buffer busy waits              57/1302268/1            0  283 3szwvvzw5aq13/bpaj7frdt63mx  A     284      82802
  5959 PROD_DATA2 JDBC Thin Client       log file switch (archiving nee 0/0/0                   0  178 3szwvvzw5aq13/bpaj7frdt63mx  A     583      75654
 10975 PROD_DATA2 JDBC Thin Client       read by other session          27/223916/1             0    2 2b47rbdt23jp8/a4td25ftgtqt8  A      40         51
 10994 PROD_DATA2 JDBC Thin Client       read by other session          27/223916/1             0    2 2b47rbdt23jp8/459f3z9u4fb3u  A      34         34
 10999 PROD_DATA2 JDBC Thin Client       read by other session          27/223916/1             0    2 2b47rbdt23jp8/a4td25ftgtqt8  A      30         30
 11015 PROD_DATA2 JDBC Thin Client       read by other session          27/223916/1             0    2 2b47rbdt23jp8/a4td25ftgtqt8  A       8          9
  4688 PROD_DATA2 JDBC Thin Client       read by other session          20/1806690/1            0    2 4gzs73jtmhyd0/f2n7vrpx4v30v  A       4         10
  4626 PROD_DATA2 JDBC Thin Client       db file sequential read        27/630098/1             0    0 11n7q3tu19vvp/adghpykwhquu2  A       1         23
 10566 PROD_DATA2 JDBC Thin Client       db file sequential read        57/780833/1             0    0 50731ypu4mu31/adghpykwhquu2  A       1         34
 12527 PROD_DATA2 xen-21205-vm02         db file sequential read        28/168522/1             0    0 7szz51xufmuc4/520mkxqpf15q8  A       5         33
  5489 PROD_DATA2 JDBC Thin Client       read by other session          60/931838/1             0    1 1zbqmb9ukkhpd/459f3z9u4fb3u  A      13         13
  5063 PROD_DATA2 JDBC Thin Client       db file sequential read        17/871948/1             0    2 g8jjuy5wt7h92/64yswjj6cdv5j  A      12         16

   SID USERNAME   MACHINE                EVENT                          PARAM                   W   WT SQL                          ST     LT LOGON_TIME
------ ---------- ---------------------- ------------------------------ -------------------- ---- ---- ---------------------------- -- ------ ----------
 10302 PROD_DATA2 JDBC Thin Client       cursor: pin S wait on X        2044424941/278013233    0   44 42zra6dwxqwrd/cnm20xg9c333h  A      44         44
                                                                        07008/21474836480

 11330 PROD_DATA2 JDBC Thin Client       cursor: pin S wait on X        2044424941/278013233    0   15 42zra6dwxqwrd/523rw17fc75fb  A      14         38
                                                                        07008/21474836480
.....
 11698 PROD_DATA2 JDBC Thin Client       cursor: pin S wait on X        2044424941/278013233    0    2 42zra6dwxqwrd/459f3z9u4fb3u  A       1          2
                                                                        07008/21474836480

  6473 PROD_DATA2 JDBC Thin Client       read by other session          6/4044/1                0    0 42zra6dwxqwrd/gzcbp0razp0vn  A      44         46
 10136 PROD_DATA2 JDBC Thin Client       cursor: pin S wait on X        2044424941/278013233    0   25 42zra6dwxqwrd/523rw17fc75fb  A      25         41
                                                                        07008/21474836480
  3669 PROD_DATA2 JDBC Thin Client       cursor: pin S wait on X        2179751475/452904301    0    3 2vncw4q0ysrjm/g6btczrdb4fa1  A       3         47
                                                                       36320/21474836480
  4854 PROD_DATA2 JDBC Thin Client       cursor: pin S wait on X        2179751475/452904301    0    3 2vncw4q0ysrjm/g6btczrdb4fa1  A       3         52
                                                                        36320/21474836480
   265 PROD_DATA2 JDBC Thin Client       cursor: pin S wait on X        2179751475/452904301    0    3 2vncw4q0ysrjm/g6btczrdb4fa1  A       3         50
                                                                        36320/21474836480

  9355 PROD_DATA2 JDBC Thin Client       cursor: pin S wait on X        2179751475/452904301    0    3 2vncw4q0ysrjm/g6btczrdb4fa1  A       3         52
 ......
 
                                                                        36320/21474836480

4764 rows selected.

The immediate cause: the ARCn processes could not archive, because ASM was unable to allocate a new file identifier.

ASMCMD [+arch/item/archivelog/2012_10_19] > cp thread_1_seq_3.257.797108745 aaaaa.test;         
ASMCMD-8012: can not determine file type for file
ORA-00569: Failed to acquire global enqueue. (DBD ERROR: OCIStmtExecute)

Correlating this with the ASM alert/trace log:

Wed Jun 05 07:39:52 2013
Dumping diagnostic data in directory=[cdmp_20130605073952], requested by (instance=2, osid=3061 (LMD0)), summary=[incident=412241].
Wed Jun 05 07:45:58 2013
Dumping diagnostic data in directory=[cdmp_20130605074557], requested by (instance=2, osid=3061 (LMD0)), summary=[incident=412242].
Wed Jun 05 07:52:44 2013
Dumping diagnostic data in directory=[cdmp_20130605075244], requested by (instance=2, osid=3061 (LMD0)), summary=[incident=412243].
Wed Jun 05 07:57:58 2013
Dumping diagnostic data in directory=[cdmp_20130605075758], requested by (instance=2, osid=3061 (LMD0)), summary=[incident=412244].
Wed Jun 05 08:03:59 2013
Dumping diagnostic data in directory=[cdmp_20130605080359], requested by (instance=2, osid=3061 (LMD0)), summary=[incident=412245].
Wed Jun 05 08:36:09 2013
SQL> /* ASMCMD */alter diskgroup /*ASMCMD*/ "data" add directory '+data/arch' 
SUCCESS: /* ASMCMD */alter diskgroup /*ASMCMD*/ "data" add directory '+data/arch'
Wed Jun 05 08:42:40 2013
Dumping diagnostic data in directory=[cdmp_20130605084240], requested by (instance=2, osid=3061 (LMD0)), summary=[incident=463685].
Wed Jun 05 08:47:47 2013
Dumping diagnostic data in directory=[cdmp_20130605084746], requested by (instance=2, osid=3061 (LMD0)), summary=[incident=463686].
Wed Jun 05 08:52:18 2013
NOTE: ASM client item_1:item disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Trace records dumped in trace file /u01/grid/11.2.0/diag/asm/+asm/+ASM1/trace/+ASM1_ora_25588.trc

At 07:39 that morning the ASM1 instance ran into trouble, which stopped the arch processes from completing archiving and in turn triggered all of the subsequent waits.

(DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/u01/oracle/11.2.0/oracle/product/db_1/bin/oracle)(ARGV0=oracleitem_1)(ENVS='ORACLE_HOME=/u01/oracle/11.2.0/oracle/product/db_1,ORACLE_SID=item_1,LD_LIBRARY_PATH=')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))'))(CONNECT_DATA=(SID=item_1)))
2013-06-05 07:39:53.587: [ora.item.db][1607948608] {1:21407:34683} [check] InstConnection::connectInt: server not attached
2013-06-05 07:39:53.717: [ USRTHRD][1607948608] {1:21407:34683} InstConnection:~InstConnection: this 1fc95100
2013-06-05 07:39:53.718: [ora.item.db][1607948608] {1:21407:34683} [check] DbAgent::isDgbHandleFO } status = 16646

Wed Jun 05 07:39:52 2013
ARC3: Error 19504 Creating archive log file to '+ARCH'
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance item_1 - Archival Error
ORA-16038: log 7 sequence# 59248 cannot be archived
ORA-19504: failed to create file ""
ORA-00312: online log 7 thread 1: '+DATA/item/onlinelog/group_7.322.797746361'

For ASM2 instance:

Wed Jun 05 07:39:51 2013
Errors in file /u01/grid/11.2.0/diag/asm/+asm/+ASM2/trace/+ASM2_lmd0_3061.trc (incident=412241):
ORA-04031: unable to allocate 3768 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","ges enqueues")
Incident details in: /u01/grid/11.2.0/diag/asm/+asm/+ASM2/incident/incdir_412241/+ASM2_lmd0_3061_i412241.trc

GES resources could no longer be allocated from the shared pool. An ORA-04031 against an ASM instance's shared pool is fairly rare. For 11.2.0.3 and later, Oracle's recommended value is memory_target = 1536MB, although Oracle has not published a best-practice document for this parameter; presumably the default value of memory_target will be raised in a future release. The current default is:

SQL> show parameter target

NAME				     TYPE			       VALUE
------------------------------------ --------------------------------- ------------------------------
memory_max_target		     big integer		       1084M
memory_target			     big integer		       1084M
pga_aggregate_target		     big integer		       0
sga_target			     big integer		       0

Since ASM memory also has to hold GES, GCS and GRD resources, memory fragmentation is a problem that cannot be ignored; it is just that in cases like this very few people think of ORA-04031 on the ASM side. Given how ASM behaves in 11.2.0.3 (which changed several default behaviors), Oracle is presumably aware of the issue as well. The documents below contain some useful information. 1536M appears to be the minimum safe value Oracle has settled on, and it has been confirmed by development; anyone who has not set this parameter should take note.
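A simple way to keep an eye on this (only a sketch; run it in the +ASM instance) is to track how much of the shared pool the GES structures and the remaining free memory occupy:

SQL> select pool, name, round(bytes/1024/1024,1) mb
  2    from v$sgastat
  3   where pool = 'shared pool'
  4     and (name like 'ges%' or name = 'free memory')
  5   order by bytes desc;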

Also, even though this cluster is RAC One Node, Oracle still behaves like regular RAC as far as resource control is concerned; RAC One Node merely reduces some of the resource contention on the database side.

For this case, the following actions make sense:

1. On 11gR2, set the ASM memory_target to a safe value (see the sketch below).

2. Database machines that have been running for a long time should be restarted periodically; the machine that hit this problem had been up for more than 220 days.

3. Monitor the ASM alert log, even though ASM stability has improved considerably in 11g.
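A minimal sketch of step 1 (1536M is the recommendation quoted above; run it against the ASM instances as SYSASM, and note that each ASM instance must be restarted, e.g. a rolling crsctl stop crs / crsctl start crs, before the new values take effect):

SQL> alter system set memory_max_target=1536M scope=spfile sid='*';
SQL> alter system set memory_target=1536M scope=spfile sid='*';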

A case of 10g CRS automatic reboots on Solaris

October 18, 2012 oracle, RAC No comments

There have been far too many RAC cases recently, which makes me increasingly convinced that the company's gradual move away from RAC is the right call. On a SunOS system, after installing 10g CRS the servers kept rebooting automatically. The environment is as follows:

root@bmsa#showrev 
Hostname: bmsa
Hostid: 84f94303
Release: 5.10
Kernel architecture: sun4v
Application architecture: sparc
Hardware provider: Oracle Corporation
Domain: 
Kernel version: SunOS 5.10 Generic_147440-25

root@bmsb#psrinfo -v 
Status of virtual processor 0 as of: 10/18/2012 00:05:32
  on-line since 10/17/2012 23:46:06.
  The sparcv9 processor operates at 2998 MHz,
        and has a sparcv9 floating point processor.
.....

Status of virtual processor 31 as of: 10/18/2012 00:05:32
  on-line since 10/17/2012 23:46:07.
  The sparcv9 processor operates at 2998 MHz,
        and has a sparcv9 floating point processor. 

root@bmsb# /usr/sbin/prtconf | grep "Memory size"
Memory size: 32768 Megabytes 


-bash-3.2$ crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora.bmsa.gsd   application    ONLINE    ONLINE    bmsa        
ora.bmsa.ons   application    ONLINE    ONLINE    bmsa        
ora.bmsa.vip   application    ONLINE    ONLINE    bmsa        
ora.bmsb.gsd   application    ONLINE    ONLINE    bmsb        
ora.bmsb.ons   application    ONLINE    ONLINE    bmsb        
ora.bmsb.vip   application    ONLINE    ONLINE    bmsb        




-bash-3.2$ crsctl query crs activeversion
CRS active version on the cluster is [10.2.0.1.0]
-bash-3.2$ crsctl query crs softwareversion
CRS software version on node [bmsa] is [10.2.0.1.0]

The symptom: both nodes rebooted automatically every 5-10 minutes. Apart from cssd there were no error logs; oprocd produced nothing and neither did syslog. ocssd.log looked like this:

[    CSSD]2012-10-17 23:40:23.955 >USER:    Oracle Database 10g CSS Release 10.2.0.1.0 Production Copyright 1996, 2004 Oracle.  All rights reserved.
[    CSSD]2012-10-17 23:40:23.955 >USER:    CSS daemon log for node bmsa, number 1, in cluster crs
[  clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=bmsaDBG_CSSD))
[    CSSD]2012-10-17 23:40:23.959 [1] >TRACE:   clssscmain: local-only set to false
[    CSSD]2012-10-17 23:40:23.966 [1] >TRACE:   clssnmReadNodeInfo: added node 1 (bmsa) to cluster
[    CSSD]2012-10-17 23:40:23.970 [1] >TRACE:   clssnmReadNodeInfo: added node 2 (bmsb) to cluster
[    CSSD]2012-10-17 23:40:23.973 [1] >TRACE:   clssgmInitCMInfo: Wait for remote node teermination set to 13 seconds
[    CSSD]2012-10-17 23:40:23.975 [5] >TRACE:   clssnm_skgxnmon: skgxn init failed, rc 1
[    CSSD]2012-10-17 23:40:23.975 [1] >TRACE:   clssnm_skgxnonline: Using vacuous skgxn monitor
[    CSSD]2012-10-17 23:40:23.975 [1] >TRACE:   clssnmInitNMInfo: misscount set to 30
[    CSSD]2012-10-17 23:40:23.981 [1] >TRACE:   clssnmDiskStateChange: state from 1 to 2 disk (0//dev/rdsk/c3t60A98000572D5A70614A6E3370535749d0s6)
[    CSSD]2012-10-17 23:40:23.983 [1] >TRACE:   clssnmDiskStateChange: state from 1 to 2 disk (1//dev/rdsk/c3t60A98000572D5A70614A6E337053574Fd0s6)
[    CSSD]2012-10-17 23:40:23.984 [1] >TRACE:   clssnmDiskStateChange: state from 1 to 2 disk (2//dev/rdsk/c3t60A98000572D5A70614A6E3370535751d0s6)
[    CSSD]2012-10-17 23:40:25.984 [6] >TRACE:   clssnmDiskStateChange: state from 2 to 4 disk (0//dev/rdsk/c3t60A98000572D5A70614A6E3370535749d0s6)
[    CSSD]2012-10-17 23:40:25.985 [7] >TRACE:   clssnmDiskStateChange: state from 2 to 4 disk (1//dev/rdsk/c3t60A98000572D5A70614A6E337053574Fd0s6)
[    CSSD]2012-10-17 23:40:25.987 [8] >TRACE:   clssnmDiskStateChange: state from 2 to 4 disk (2//dev/rdsk/c3t60A98000572D5A70614A6E3370535751d0s6)
[    CSSD]2012-10-17 23:40:25.991 [6] >TRACE:   clssnmReadDskHeartbeat: node(2) is down. rcfg(3) wrtcnt(13) LATS(0) Disk lastSeqNo(13)
[    CSSD]2012-10-17 23:40:25.992 [7] >TRACE:   clssnmReadDskHeartbeat: node(2) is down. rcfg(3) wrtcnt(13) LATS(0) Disk lastSeqNo(13)
[    CSSD]2012-10-17 23:40:25.992 [8] >TRACE:   clssnmReadDskHeartbeat: node(2) is down. rcfg(3) wrtcnt(13) LATS(0) Disk lastSeqNo(13)
[    CSSD]2012-10-17 23:40:26.055 [1] >TRACE:   clssnmFatalInit: fatal mode enabled
[    CSSD]2012-10-17 23:40:26.066 [10] >TRACE:   clssnmconnect: connecting to node 1, flags 0x0001, connector 1
[    CSSD]2012-10-17 23:40:26.072 [10] >TRACE:   clssnmconnect: connecting to node 0, flags 0x0000, connector 1
[    CSSD]2012-10-17 23:40:26.072 [10] >TRACE:   clssnmClusterListener: Probing node(2)
[    CSSD]2012-10-17 23:40:26.076 [11] >TRACE:   clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
[    CSSD]2012-10-17 23:40:26.076 [11] >TRACE:   clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_bmsa_crs))
[    CSSD]2012-10-17 23:40:26.078 [15] >TRACE:   clssnmPollingThread: Connection complete
[    CSSD]2012-10-17 23:40:26.078 [16] >TRACE:   clssnmSendingThread: Connection complete
[    CSSD]2012-10-17 23:40:26.078 [17] >TRACE:   clssnmRcfgMgrThread: Connection complete
[    CSSD]2012-10-17 23:40:26.078 [17] >TRACE:   clssnmRcfgMgrThread: Local Join
[    CSSD]2012-10-17 23:40:26.078 [17] >TRACE:   clssnmDoSyncUpdate: Initiating sync 1
[    CSSD]2012-10-17 23:40:26.078 [17] >TRACE:   clssnmSetupAckWait: Ack message type (11) 
[    CSSD]2012-10-17 23:40:26.078 [17] >TRACE:   clssnmSetupAckWait: node(1) is ALIVE
[    CSSD]2012-10-17 23:40:26.078 [17] >TRACE:   clssnmSendSync: syncSeqNo(1)
[    CSSD]2012-10-17 23:40:26.078 [17] >TRACE:   clssnmWaitForAcks: Ack message type(11), ackCount(1)
[    CSSD]2012-10-17 23:40:26.078 [10] >TRACE:   clssnmHandleSync: Acknowledging sync: src[1] srcName[bmsa] seq[1] sync[1]
[    CSSD]2012-10-17 23:40:26.178 [1] >USER:    NMEVENT_SUSPEND [00][00][00][00]
[    CSSD]2012-10-17 23:40:27.078 [17] >TRACE:   clssnmWaitForAcks: done, msg type(11)
[    CSSD]2012-10-17 23:40:27.078 [17] >TRACE:   clssnmSetupAckWait: Ack message type (13) 
[    CSSD]2012-10-17 23:40:27.078 [17] >TRACE:   clssnmSetupAckWait: node(1) is ACTIVE
[    CSSD]2012-10-17 23:40:27.078 [17] >TRACE:   clssnmSendVote: syncSeqNo(1)
[    CSSD]2012-10-17 23:40:27.078 [17] >TRACE:   clssnmWaitForAcks: Ack message type(13), ackCount(1)
[    CSSD]2012-10-17 23:40:27.079 [10] >TRACE:   clssnmSendVoteInfo: node(1) syncSeqNo(1)
[    CSSD]2012-10-17 23:40:28.079 [17] >TRACE:   clssnmWaitForAcks: done, msg type(13)
[    CSSD]2012-10-17 23:40:28.079 [17] >TRACE:   clssnmCheckDskInfo: Checking disk info...
[    CSSD]2012-10-17 23:40:29.079 [17] >TRACE:   clssnmEvict: Start
[    CSSD]2012-10-17 23:40:29.079 [17] >TRACE:   clssnmWaitOnEvictions: Start
[    CSSD]2012-10-17 23:40:29.080 [17] >TRACE:   clssnmWaitOnEvictions: Node(0) down, LATS(0),timeout(-109570438)
[    CSSD]2012-10-17 23:40:29.080 [17] >TRACE:   clssnmWaitOnEvictions: Node(2) down, LATS(0),timeout(-109570438)
[    CSSD]2012-10-17 23:40:29.080 [17] >TRACE:   clssnmSetupAckWait: Ack message type (15) 
[    CSSD]2012-10-17 23:40:29.080 [17] >TRACE:   clssnmSetupAckWait: node(1) is ACTIVE
[    CSSD]2012-10-17 23:40:29.080 [17] >TRACE:   clssnmSendUpdate: syncSeqNo(1)
[    CSSD]2012-10-17 23:40:29.080 [17] >TRACE:   clssnmWaitForAcks: Ack message type(15), ackCount(1)
[    CSSD]2012-10-17 23:40:29.080 [10] >TRACE:   clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[    CSSD]2012-10-17 23:40:29.080 [10] >TRACE:   clssnmDeactivateNode: node 0 () left cluster

[    CSSD]2012-10-17 23:40:29.080 [10] >TRACE:   clssnmUpdateNodeState: node 1, state (2/2) unique (1350488423/1350488423) prevConuni(0) birth (1/1) (old/new)
[    CSSD]2012-10-17 23:40:29.080 [10] >TRACE:   clssnmUpdateNodeState: node 2, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[    CSSD]2012-10-17 23:40:29.080 [10] >TRACE:   clssnmDeactivateNode: node 2 (bmsb) left cluster

[    CSSD]2012-10-17 23:40:29.080 [10] >USER:    clssnmHandleUpdate: SYNC(1) from node(1) completed
[    CSSD]2012-10-17 23:40:29.080 [10] >USER:    clssnmHandleUpdate: NODE 1 (bmsa) IS ACTIVE MEMBER OF CLUSTER
[    CSSD]2012-10-17 23:40:29.180 [18] >TRACE:   clssgmReconfigThread:  started for reconfig (1)
[    CSSD]2012-10-17 23:40:29.180 [18] >USER:    NMEVENT_RECONFIG [00][00][00][02]
[    CSSD]2012-10-17 23:40:29.180 [18] >TRACE:   clssgmEstablishConnections: 1 nodes in cluster incarn 1
[    CSSD]2012-10-17 23:40:29.180 [14] >TRACE:   clssgmPeerListener: connects done (1/1)
[    CSSD]2012-10-17 23:40:29.180 [18] >TRACE:   clssgmEstablishMasterNode: MASTER for 1 is node(1) birth(1)
[    CSSD]2012-10-17 23:40:29.180 [18] >TRACE:   clssgmChangeMasterNode: requeued 0 RPCs
[    CSSD]2012-10-17 23:40:29.180 [18] >TRACE:   clssgmMasterCMSync: Synchronizing group/lock status
[    CSSD]2012-10-17 23:40:29.180 [18] >TRACE:   clssgmMasterSendDBDone: group/lock status synchronization complete
[    CSSD]CLSS-3000: reconfiguration successful, incarnation 1 with 1 nodes

[    CSSD]CLSS-3001: local node number 1, master node number 1

[    CSSD]2012-10-17 23:40:29.180 [18] >TRACE:   clssgmReconfigThread:  completed for reconfig(1), with status(1)
[    CSSD]2012-10-17 23:40:29.281 [11] >TRACE:   clssgmClientConnectMsg: Connect from con(1007738b0) proc(100b1e050) pid() proto(10:2:1:1)
[    CSSD]2012-10-17 23:40:29.437 [11] >TRACE:   clssgmClientConnectMsg: Connect from con(100b1bd90) proc(100b1ea20) pid() proto(10:2:1:1)
[    CSSD]2012-10-17 23:40:30.080 [17] >TRACE:   clssnmWaitForAcks: done, msg type(15)
[    CSSD]2012-10-17 23:40:30.080 [17] >TRACE:   clssnmDoSyncUpdate: Sync Complete!
[    CSSD]2012-10-17 23:40:30.893 [11] >TRACE:   clssgmClientConnectMsg: Connect from con(100b23370) proc(100b267b0) pid() proto(10:2:1:1)
[    CSSD]2012-10-17 23:40:30.916 [11] >TRACE:   clssgmClientConnectMsg: Connect from con(100b27090) proc(100b29350) pid() proto(10:2:1:1)
[    CSSD]2012-10-17 23:40:34.623 [11] >TRACE:   clssgmClientConnectMsg: Connect from con(100b29370) proc(100b27c60) pid() proto(10:2:1:1)
[    CSSD]2012-10-17 23:40:34.642 [11] >TRACE:   clssgmClientConnectMsg: Connect from con(100b23b40) proc(100b29600) pid() proto(10:2:1:1)
[    CSSD]2012-10-17 23:41:34.799 [11] >TRACE:   clssgmClientConnectMsg: Connect from con(100b22210) proc(100b27a90) pid() proto(10:2:1:1)
[    CSSD]2012-10-17 23:41:34.815 [11] >TRACE:   clssgmClientConnectMsg: Connect from con(100b23bf0) proc(100b29690) pid() proto(10:2:1:1)
[    CSSD]2012-10-17 23:41:35.198 [11] >TRACE:   clssgmClientConnectMsg: Connect from con(100b22210) proc(100b27a90) pid() proto(10:2:1:1)
[    CSSD]2012-10-17 23:41:35.216 [11] >TRACE:   clssgmClientConnectMsg: Connect from con(100b26510) proc(100b24ce0) pid() proto(10:2:1:1)
[    CSSD]2012-10-17 23:41:35.525 [11] >TRACE:   clssgmClientConnectMsg: Connect from con(100b221c0) proc(100b26720) pid() proto(10:2:1:1)
[    CSSD]2012-10-17 23:41:35.529 [11] >WARNING: clssgmShutDown: Received explicit shutdown request from client.
[    CSSD]2012-10-17 23:41:35.529 [11] >TRACE:   clssgmClientShutdown: Aborting client (100b1f820) proc (100b1e050)
[    CSSD]2012-10-17 23:41:35.529 [11] >TRACE:   clssgmClientShutdown: Aborting client (100b20050) proc (100b1e050)
[    CSSD]2012-10-17 23:41:35.529 [11] >TRACE:   clssgmClientShutdown: Aborting client (100b20ab0) proc (100b1e050)
[    CSSD]2012-10-17 23:41:35.529 [11] >TRACE:   clssgmClientShutdown: Aborting client (100b1ed80) proc (100b1ea20)
[    CSSD]2012-10-17 23:41:35.529 [11] >TRACE:   clssgmClientShutdown: waited 0 seconds on 4 IO capable clients
[    CSSD]2012-10-17 23:41:35.539 [11] >WARNING: clssgmClientShutdown: graceful shutdown completed.

The shutdown request was received at 23:41:35.529 and the system rebooted at 23:41:35.539. The message "clssnm_skgxnmon: skgxn init failed, rc 1" in the log is worth noting; for this error, the following articles are useful:

DBA Note
RAC Reboot due to system time change
Both articles mention CSS reboots caused by changing the system time, and both describe starting OPROCD in non-fatal mode:

In some extreme cases, it may be necessary to disable fatal mode for OPROCD to find a root cause. DO NOT do this in a production environment. This is completely unsupported and could cause data corruption due to lack of I/O fencing.

1. Back up the init.cssd file. Example:

Sun and Linux:
cp /etc/init.d/init.cssd /etc/init.d/init.cssd.bak

HP-UX and HP Tru64:
cp /sbin/init.d/init.cssd /sbin/init.d/init.cssd.bak

IBM AIX:
cp /etc/init.cssd /etc/init.cssd.bak

2. Stop the CRS stack or boot the node in single user mode.

To stop the CRS Stack:

Sun and Linux:
/etc/init.d/init.crs stop

HP-UX and HP Tru64:
/sbin/init.d/init.crs stop

IBM AIX:
/etc/init.crs/init.crs stop

3. Confirm that the CRS stack is down.

ps -ef | grep d.bin

4. Edit the init.cssd file from the location in step 1, change the OPROCD
startup line to a non-fatal startup:

Sun Example:

# in fatal mode we always will start OPROCD FATAL
if [ $OPROCD_EXISTS ]; then
$OPROCD start -t $OPROCD_DEFAULT_TIMEOUT -m $OPROCD_DEFAULT_MARGIN
$OPROCD check -t $OPROCD_CHECK_TIMEOUT 2>$NULL
fi

Change this to:

# in fatal mode we always will start OPROCD FATAL
if [ $OPROCD_EXISTS ]; then
$OPROCD startInstall -t $OPROCD_DEFAULT_TIMEOUT -m $OPROCD_DEFAULT_MARGIN
$OPROCD check -t $OPROCD_CHECK_TIMEOUT 2>$NULL
fi

You could also combine this method with the ‘tracing system calls’ method
for more debugging.

5. Reboot the node(s)

I did not dare try that approach; instead I modified the DISABLE_OPROCD setting in the cssd script, as follows:

#!/bin/sh
#
# Copyright (c) 2001, 2010, Oracle and/or its affiliates. All rights reserved. 
#
# init.cssd - Control script for the Oracle CSS daemon.
#
#   In a full RAC install, this should not be in an rcX.d
#   directory. It should only be invoked from init.crs.
#   Never by hand.
#
#     No manual invocation of init.cssd is supported on a cluster.
#
#   In a local, non-cluster, installation without RAC, it should
#   be placed in an rcX.d directory. It may be invoked by hand
#   if necessary, however there are a series of risks and complications
#   that should be kept in mind when doing so.
#
#     Actions usable in local-only configuration are:
#          start, stop, disable and enable.
#   
# ==========================================================
# Porting Definitions and requirements:
#
# FAST_REBOOT - take out the machine now. We are concerned about
#               data integrity since the other node has evicted us.
# SLOW_REBOOT - We can rely on vendor clusterware to delay sending
#               the reconfig to the other node due to its IO fencing
#               guarantees. So trigger its diagnostic reboot.
# VC_UP - Determine whether Vendor clusterware processes are active.
#         If they are, then the remote node will hear that CSS/CLSVMON
#         have died, and we will be forced to use FAST_REBOOT.
#         This is also used at startup time for dependency checking.
#         Returns 0 if they are up, 1 otherwise. This should be
#         an extremely fast check.
# CLINFO - Determine whether we are booted in non-cluster mode.
#          Returns 0 for cluster mode, 1 for non-cluster mode
#          This call is allowed to take a long time to decide.
#
# GETBOOTID - Returns a string that uniquely identifies this boot.
#             This string must never change while the machine is up,
#             and must change the next time the machine boots.

ORA_CRS_HOME=/u01/app/oracle/product/10.2.0/crs_1
ORACLE_USER=oracle

ORACLE_HOME=$ORA_CRS_HOME

export ORACLE_HOME
export ORA_CRS_HOME
export ORACLE_USER

# Set DISABLE_OPROCD to false. Platforms that do not ship an oprocd
# binary should override this below.
DISABLE_OPROCD=false    # <-- change this to true
# Default OPROCD timeout values defined here, so that it can be
# over-ridden as needed by a platform.
# default Timout of 1000 ms and a margin of 500ms
OPROCD_DEFAULT_TIMEOUT=1000
OPROCD_DEFAULT_MARGIN=500
# default Timeout for other actions
OPROCD_CHECK_TIMEOUT=2000

.................

After manually rebooting both nodes:

root@bmsa#ps -ef |grep op
oracle 1706 1 0 09:23:56 ? 0:00 /u01/app/oracle/product/10.2.0/crs_1/opmn/bin/ons -d
oracle 1707 1706 0 09:23:56 ? 0:00 /u01/app/oracle/product/10.2.0/crs_1/opmn/bin/ons -d
root 14424 14345 0 11:36:50 pts/4 0:00 grep op

The oprocd process is gone, while CSS still starts in fatal mode:

[    CSSD]2012-10-17 23:46:46.883 [1] >TRACE:   clssnmFatalInit: fatal mode enabled
[    CSSD]2012-10-17 23:46:58.025 [19] >TRACE:   clssgmMasterSendDBDone: group/lock status synchronization complete
[    CSSD]CLSS-3000: reconfiguration successful, incarnation 2 with 2 nodes

[    CSSD]CLSS-3001: local node number 1, master node number 1

[    CSSD]2012-10-17 23:46:58.025 [19] >TRACE:   clssgmReconfigThread:  completed for reconfig(2), with status(1)
[    CSSD]2012-10-17 23:46:58.959 [17] >TRACE:   clssnmWaitForAcks: done, msg type(15)
[    CSSD]2012-10-17 23:46:58.959 [17] >TRACE:   clssnmDoSyncUpdate: Sync Complete!

From this point on the reboots stopped. We then quickly upgraded CRS to 10.2.0.5 and reverted the cssd change:

-bash-3.2$ ps -ef |grep op
  oracle 27803 15868   0 13:48:17 pts/4       0:00 grep op
    root  1172   743   0 09:23:41 ?           0:00 /bin/sh /etc/init.d/init.cssd oprocd
  oracle 24312     1   0 13:41:52 ?           0:00 /u01/app/oracle/product/10.2.0/crs_1/opmn/bin/ons -d
    root  1339  1172   0 09:23:41 ?           0:01 /u01/app/oracle/product/10.2.0/crs_1/bin/oprocd.bin run -t 1000 -m 10000 -hsi 5
  oracle 24313 24312   0 13:41:52 ?           0:00 /u01/app/oracle/product/10.2.0/crs_1/opmn/bin/ons -d
  
-bash-3.2$ crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    bmsa        
ora....SA.lsnr application    ONLINE    ONLINE    bmsa        
ora.bmsa.gsd   application    ONLINE    ONLINE    bmsa        
ora.bmsa.ons   application    ONLINE    ONLINE    bmsa        
ora.bmsa.vip   application    ONLINE    ONLINE    bmsa        
ora....SM2.asm application    ONLINE    ONLINE    bmsb        
ora....SB.lsnr application    ONLINE    ONLINE    bmsb        
ora.bmsb.gsd   application    ONLINE    ONLINE    bmsb        
ora.bmsb.ons   application    ONLINE    ONLINE    bmsb        
ora.bmsb.vip   application    ONLINE    ONLINE    bmsb        
ora....SM3.asm application    ONLINE    ONLINE    bmsc        
ora....SC.lsnr application    ONLINE    ONLINE    bmsc        
ora.bmsc.gsd   application    ONLINE    ONLINE    bmsc        
ora.bmsc.ons   application    ONLINE    ONLINE    bmsc        
ora.bmsc.vip   application    ONLINE    ONLINE    bmsc 

11gR2 RAC install: ASM terminating the instance due to error 481

October 16, 2012 11g, oracle, RAC No comments

On a newly deployed RAC system (Red Hat 5.8, RAC 11.2.0.3), error 481 was thrown while running the GI root.sh during installation:

CRS-2672: Attempting to start 'ora.asm' on 'ptdb1'
CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 0 Serial number: 0
. For details refer to "(:CLSN00107:)" in "/g01/app/oracle/product/11.2.0/grid/log/agent/ohasd/oraagent_grid/oraagent_grid.log".
CRS-2674: Start of 'ora.asm' on 'racnode1' failed
..
Failed to start ASM at /software/app/11.2.0.3/crs/install/crsconfig_lib.pm line 1272

The following MOS document can be used to work through this problem:


ASM on Non First Node (Second or Other Node) Fails to Come up With: PMON (ospid: nnnn): terminating the instance due to error 481

In this Document
Purpose
Details
Case1: link local IP (169.254.x.x) is being used by other adapter/network
Case2: firewall exists between nodes on private network (iptables etc)
Case3: HAIP is up on some nodes but not on all
Case4: HAIP is up on all nodes but some do not have route info
References
Applies to:

Oracle Server – Enterprise Edition – Version 11.2.0.1 and later
Information in this document applies to any platform.
Purpose

This note lists common causes of ASM start up failure with the following error on non-first node (second or others):

alert_.log from non-first node
lmon registered with NM – instance number 2 (internal mem no 1)
Tue Dec 06 06:16:15 2011
System state dump requested by (instance=2, osid=19095 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /g01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_diag_19138.trc
Tue Dec 06 06:16:15 2011
PMON (ospid: 19095): terminating the instance due to error 481
Dumping diagnostic data in directory=[cdmp_20111206061615], requested by (instance=2, osid=19095 (PMON)), summary=[abnormal instance termination].
Tue Dec 06 06:16:15 2011
ORA-1092 : opitsk aborting process

Note: ASM instance terminates shortly after “lmon registered with NM”
If ASM on non-first node was running previously, likely the following will be in alert.log when it failed originally:
..
IPC Send timeout detected. Sender: ospid 32231 [oracle@ftdcslsedw01b (PING)]
..
ORA-29740: evicted by instance number 1, group incarnation 10
..

diag trace from non-first ASM (+ASMn_diag_.trc)
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE])

alert_.log from first node
LMON (ospid: 15986) detects hung instances during IMR reconfiguration
LMON (ospid: 15986) tries to kill the instance 2 in 37 seconds.
Please check instance 2’s alert log and LMON trace file for more details.
..
Remote instance kill is issued with system inc 64
Remote instance kill map (size 1) : 2
LMON received an instance eviction notification from instance 1
The instance eviction reason is 0x20000000
The instance eviction map is 2
Reconfiguration started (old inc 64, new inc 66)

If the issue happens while running root script (root.sh or rootupgrade.sh) as part of Grid Infrastructure installation/upgrade process, the following symptoms will present:

root script screen output
Start of resource “ora.asm” failed

CRS-2672: Attempting to start ‘ora.asm’ on ‘racnode1’
CRS-5017: The resource action “ora.asm start” encountered the following error:
ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 0 Serial number: 0
. For details refer to “(:CLSN00107:)” in “/ocw/grid/log/racnode1/agent/ohasd/oraagent_grid/oraagent_grid.log”.
CRS-2674: Start of ‘ora.asm’ on ‘racnode1’ failed
..
Failed to start ASM at /ispiris-qa/app/11.2.0.3/crs/install/crsconfig_lib.pm line 1272
$GRID_HOME/cfgtoollogs/crsconfig/rootcrs_.log
2011-11-29 15:56:48: Executing cmd: /ispiris-qa/app/11.2.0.3/bin/crsctl start resource ora.asm -init
..
> CRS-2672: Attempting to start ‘ora.asm’ on ‘racnode1’
> CRS-5017: The resource action “ora.asm start” encountered the following error:
> ORA-03113: end-of-file on communication channel
> Process ID: 0
> Session ID: 0 Serial number: 0
> . For details refer to “(:CLSN00107:)” in “/ispiris-qa/app/11.2.0.3/log/racnode1/agent/ohasd/oraagent_grid/oraagent_grid.log”.
> CRS-2674: Start of ‘ora.asm’ on ‘racnode1’ failed
> CRS-2679: Attempting to clean ‘ora.asm’ on ‘racnode1’
> CRS-2681: Clean of ‘ora.asm’ on ‘racnode1’ succeeded
..
> CRS-4000: Command Start failed, or completed with errors.
>End Command output
2011-11-29 15:59:00: Executing cmd: /ispiris-qa/app/11.2.0.3/bin/crsctl check resource ora.asm -init
2011-11-29 15:59:00: Executing cmd: /ispiris-qa/app/11.2.0.3/bin/crsctl status resource ora.asm -init
2011-11-29 15:59:01: Checking the status of ora.asm
..
2011-11-29 15:59:53: Start of resource “ora.asm” failed
Details

Case1: link local IP (169.254.x.x) is being used by other adapter/network

Symptoms:

$GRID_HOME/log//alert.log
[/ocw/grid/bin/orarootagent.bin(4813)]CRS-5018:(:CLSN00037:) Removed unused HAIP route: 169.254.95.0 / 255.255.255.0 / 0.0.0.0 / usb0
OS messages (optional)
Dec 6 06:11:14 racnode1 dhclient: DHCPREQUEST on usb0 to 255.255.255.255 port 67
Dec 6 06:11:14 racnode1 dhclient: DHCPACK from 169.254.95.118
ifconfig -a
..
usb0 Link encap:Ethernet HWaddr E6:1F:13:AD:EE:D3
inet addr:169.254.95.120 Bcast:169.254.95.255 Mask:255.255.255.0
..

Note: it’s usb0 in this case, but it can be any other adapter which uses link local

Solution:

Link local IP must not be used by any other network on cluster nodes. In this case, a USB network device got IP 169.254.95.118 from a DHCP server, which disrupted HAIP routing; the solution is to blacklist the device in udev so it is not activated automatically.
Case2: firewall exists between nodes on private network (iptables etc)

No firewall is allowed on private network (cluster_interconnect) between nodes including software firewall like iptables, ipmon etc
Case3: HAIP is up on some nodes but not on all

Symptoms:

alert_<+ASMn>.log for some instances
Cluster communication is configured to use the following interface(s) for this instance
10.1.0.1
alert_<+ASMn>.log for other instances
Cluster communication is configured to use the following interface(s) for this instance
169.254.201.65

Note: some instances are using HAIP while others are not, so they cannot talk to each other
Solution:

The solution is to bring up HAIP on all nodes.

To find out HAIP status, execute the following on all nodes:

$GRID_HOME/bin/crsctl stat res ora.cluster_interconnect.haip -init

If it’s offline, try to bring it up as root:

$GRID_HOME/bin/crsctl start res ora.cluster_interconnect.haip -init

If HAIP fails to start, refer to note 1210883.1 for known issues.

If the "up node" is not using HAIP and no outage is allowed, the workaround is to set the init.ora/spfile parameter cluster_interconnects to the private IP of each node so that ASM/DB can come up on the "down node". Once a maintenance window is planned, the parameter must be removed again to allow HAIP to work.
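A sketch of that workaround (not part of the note itself; the addresses are illustrative, borrowed from the 10.1.0.x example above, and the parameter has to be removed again in the maintenance window):

SQL> alter system set cluster_interconnects='10.1.0.1' scope=spfile sid='+ASM1';
SQL> alter system set cluster_interconnects='10.1.0.2' scope=spfile sid='+ASM2';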

Case4: HAIP is up on all nodes but some do not have route info

Symptoms:

alert_<+ASMn>.log for all instances
Cluster communication is configured to use the following interface(s) for this instance
169.254.xxx.xxx
“netstat -rn” for some nodes (surviving nodes) missing HAIP route
netstat -rn
Destination Gateway Genmask Flags MSS Window irtt Iface
161.130.90.0 0.0.0.0 255.255.248.0 U 0 0 0 bond0
160.131.11.0 0.0.0.0 255.255.255.0 U 0 0 0 bond2
0.0.0.0 160.11.80.1 0.0.0.0 UG 0 0 0 bond0

The line for HAIP is missing, i.e:

169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 bond2

Note: As HAIP route info is missing on some nodes, HAIP is not pingable; usually newly restarted node will have HAIP route info
Solution:

The solution is to manually add HAIP route info on the nodes that’s missing:

4.1. Execute “netstat -rn” on any node that has HAIP route info and locate the following:

169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 bond2

Note: the first field is HAIP subnet ID and will start with 169.254.xxx.xxx, the third field is HAIP subnet netmask and the last field is private network adapter name

4.2. Execute the following as root on the node that’s missing HAIP route:

# route add -net <HAIP subnet> netmask <HAIP netmask> dev <private interface>

i.e.

# route add -net 169.254.0.0 netmask 255.255.0.0 dev bond2

4.3. Start ora.crsd as root on the node that is partially up:

# $GRID_HOME/bin/crsctl start res ora.crsd -init

The other workaround is to restart GI on the node that’s missing HAIP route with “crsctl stop crs -f” and “crsctl start crs” command as root.

In this case the error really was caused by usb0: the KVM console used DHCP to hand out IPs dynamically, which gave the two servers addresses in the 169.254 subnet. After disabling DHCP and assigning the IPs manually, the following errors were still thrown:

[cssd(22096)]CRS-1605:CSSD voting file is online: /dev/mapper/mpath3; details in /g01/app/oracle/product/11.2.0/grid/log/ptdb2/cssd/ocssd.log.
[cssd(22096)]CRS-1636:The CSS daemon was started in exclusive mode but found an active CSS daemon on node ptdb1 and is terminating; details at (:CSSNM00006:) in /g01/app/oracle/product/11.2.0/grid/log/ptdb2/cssd/ocssd.log
2012-10-15 10:07:33.421
[ohasd(20942)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'ptdb2'.

2012-10-15 10:11:27.669
[cssd(22425)]CRS-1662:Member kill requested by node ptdb1 for member number 1, group DB+ASM
2012-10-15 10:11:29.931
[/g01/app/oracle/product/11.2.0/grid/bin/oraagent.bin(22334)]CRS-5019:All OCR locations are on ASM disk groups [OCRVOT], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/g01/app/oracle/product/11.2.0/grid/log/ptdb2/agent/ohasd/oraagent_grid/oraagent_grid.log".
2012-10-15 10:11:31.079
[/g01/app/oracle/product/11.2.0/grid/bin/oraagent.bin(22334)]CRS-5019:All OCR locations are on ASM disk groups [OCRVOT], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/g01/app/oracle/product/11.2.0/grid/log/ptdb2/agent/ohasd/oraagent_grid/oraagent_grid.log".
2012-10-15 10:11:31.119
[/g01/app/oracle/product/11.2.0/grid/bin/oraagent.bin(22334)]CRS-5019:All OCR locations are on ASM disk groups [OCRVOT], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/g01/app/oracle/product/11.2.0/grid/log/ptdb2/agent/ohasd/oraagent_grid/oraagent_grid.log".
2012-10-15 10:11:31.197
[/g01/app/oracle/product/11.2.0/grid/bin/oraagent.bin(22334)]CRS-5019:All OCR locations are on ASM disk groups [OCRVOT], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/g01/app/oracle/product/11.2.0/grid/log/ptdb2/agent/ohasd/oraagent_grid/oraagent_grid.log".
2012-10-15 10:11:32.837
[/g01/app/oracle/product/11.2.0/grid/bin/orarootagent.bin(22496)]CRS-5016:Process "/g01/app/oracle/product/11.2.0/grid/bin/acfsload" spawned by agent "/g01/app/oracle/product/11.2.0/grid/bin/orarootagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/g01/app/oracle/product/11.2.0/grid/log/ptdb2/agent/ohasd/orarootagent_root/orarootagent_root.log"
2012-10-15 10:11:33.068
[/g01/app/oracle/product/11.2.0/grid/bin/oraagent.bin(22334)]CRS-5019:All OCR locations are on ASM disk groups [OCRVOT], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/g01/app/oracle/product/11.2.0/grid/log/ptdb2/agent/ohasd/oraagent_grid/oraagent_grid.log".

Checking orarootagent_root.log:

2012-10-15 10:08:40.295: [ USRTHRD][1115584832] {0:0:209} HAIP:  Updating member info HAIP1;172.168.0.0#0
2012-10-15 10:08:40.298: [ USRTHRD][1115584832] {0:0:209} InitializeHaIps[ 0]  infList 'inf eth1, ip 172.168.0.102, sub 172.168.0.0'
2012-10-15 10:08:40.299: [ USRTHRD][1115584832] {0:0:209} Error in getting Key SYSTEM.network.haip.group.cluster_interconnect.interface.valid in OCR
2012-10-15 10:08:40.303: [ CLSINET][1115584832] failed to open OLR HAIP subtype SYSTEM.network.haip.group.cluster_interconnect.interface.valid key, rc=4

The problem is now easy to pin down: although the ora.cluster_interconnect.haip resource shows ONLINE on every node, the HAIP addresses cannot ping each other, so node 2 cannot obtain the resources it needs over HAIP.

Two things can cause this:

1. The NIC driver is faulty, so the HAIP addresses virtualised on top of the private interface cannot communicate.

2. The switch configuration is wrong; the VLAN layout or a switch bug can produce the same symptom.

So to the cases in the document above we can add one more:

CASE 5

Even if HAIP is ONLINE everywhere and netstat -rn shows the expected entry on every node, you still have to make sure the HAIP addresses can ping each other between nodes. In this case, manually restarting the HAIP resource worked around the problem temporarily, but it came back after a reboot. In the end the two nodes were connected directly with a crossover cable, which fixed it; with more than two nodes the switch would have to be investigated.
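A quick way to verify the condition described in CASE 5 (only a sketch; the interface name eth1 and the address 169.254.201.65 are illustrative values taken from the logs above):

$GRID_HOME/bin/crsctl stat res ora.cluster_interconnect.haip -init   # must be ONLINE on every node
netstat -rn | grep 169.254                                           # the HAIP route must exist on every node
ifconfig | grep -A 1 "eth1:1"                                        # note the local 169.254.x.x address
ping -c 3 -I eth1 169.254.201.65                                     # the remote node's HAIP address must answer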

For 11.2 CRS troubleshooting, the main logs are the following:
<GRID_HOME>/log/<host>/alert<node>.log: the CRS alert log. CRS startup/shutdown, startup/shutdown of the core daemons (crsd, ocssd etc.) and failed resource checks are all recorded here.

<GRID_HOME>/log/<host>/crsd: the CRSD log; records the start, stop and failures of the resources CRSD manages, such as database instances, ASM, listeners, disk groups and VIPs.
<GRID_HOME>/log/<host>/cssd: the OCSSD log; records inter-node communication. Network heartbeat or voting disk problems, and abnormal termination of the OCSSD process, show up here.
<GRID_HOME>/log/<host>/evmd: the event monitor (EVM) daemon log.
<GRID_HOME>/log/<host>/ohasd: the OHASD log; when CRS will not start, this is often the log to check.
<GRID_HOME>/log/<host>/mdnsd: DNS-related (mDNS) log.
<GRID_HOME>/log/<host>/gipcd: inter-process and inter-node communication log.
<GRID_HOME>/log/<host>/gpnpd: log for GPnP (Grid Plug and Play profile).
<GRID_HOME>/log/<host>/gnsd (optional): Grid Naming Service log.
<GRID_HOME>/log/<host>/ctssd: Cluster Time Synchronization Service log.

<GRID_HOME>/log/<host>/agent/ohasd:
<GRID_HOME>/log/<host>/agent/ohasd/oraagent_grid: starts/stops/checks/cleans resources such as ora.asm, ora.evmd, ora.gipcd, ora.gpnpd and ora.mdnsd.
<GRID_HOME>/log/<host>/agent/ohasd/orarootagent_root: starts/stops/checks/cleans ora.crsd, ora.ctssd, ora.diskmon, ora.drivers.acfs, ora.crf (11.2.0.2) and similar resources.
<GRID_HOME>/log/<host>/agent/ohasd/oracssdagent_root: starts/stops/checks the ocssd process.
<GRID_HOME>/log/<host>/agent/ohasd/oracssdmonitor_root: monitors the cssdagent process.

<GRID_HOME>/log/<host>/agent/crsd:
<GRID_HOME>/log/<host>/agent/crsd/oraagent_grid: starts/stops/checks/cleans asm, ora.eons, ora.LISTENER.lsnr, SCAN listeners, ora.ons, disk group and similar resources.
<GRID_HOME>/log/<host>/agent/crsd/oraagent_oracle: starts/stops/checks/cleans service and database resources.
<GRID_HOME>/log/<host>/agent/crsd/orarootagent_root: starts/stops/checks/cleans GNS, VIP, SCAN VIP and network resources.
<GRID_HOME>/log/<host>/agent/crsd/scriptagent_grid: logs for user-defined application resources.

Two CRS troubleshooting cases

September 11, 2012 oracle, RAC No comments

On a 10.2.0.5 RAC system, the evmd and cssd services would not start.

CRS version

[oracle@ptdb02 oracle]$ crsctl query crs softwareversion
CRS software version on node [ptdb02] is [10.2.0.5.0]

h1:3:respawn:/sbin/init.d/init.evmd run >/dev/null 2>&1

h2:3:respawn:/sbin/init.d/init.cssd fatal >/dev/null 2>&1

[crsd(21292)]CRS-1012:The OCR service started on node ptdb01.
2012-09-08 18:48:24.980
[evmd(21873)]CRS-1401:EVMD started on node ptdb01.
2012-09-08 18:48:25.271
[crsd(22023)]CRS-1005:The OCR upgrade was completed. Version has changed from 169870592 to 169870592. Details in /u01/app/oracle/product/10.2.0/crs_1/log/ptdb01/crsd/crsd.log.
2012-09-08 18:48:25.271
[crsd(22023)]CRS-1012:The OCR service started on node ptdb01.
2012-09-08 18:48:26.679
[evmd(22605)]CRS-1401:EVMD started on node ptdb01.

CSSD reconfiguration never completed successfully; the active node was ptdb01, where the evmd and cssd processes could not start -> check the evmd log:

Oracle Database 10g CRS Release 10.2.0.5.0 Production Copyright 1996, 2007, Oracle.  All rights reserved
2012-09-08 18:59:03.765: [    EVMD][999623216]0Initializing OCR
2012-09-08 18:59:03.773: [    EVMD][999623216]0Active Version from OCR:10.2.0.5.0
2012-09-08 18:59:03.773: [    EVMD][999623216]0Active Version and Software Version are same
2012-09-08 18:59:03.773: [    EVMD][999623216]0Initializing Diagnostics Settings
2012-09-08 18:59:03.773: [    EVMD][999623216]0ENV Logging level for Module: allcomp  0
2012-09-08 18:59:03.773: [    EVMD][999623216]0ENV Logging level for Module: default  0
2012-09-08 18:59:03.773: [    EVMD][999623216]0ENV Logging level for Module: COMMCRS  0
2012-09-08 18:59:03.773: [    EVMD][999623216]0ENV Logging level for Module: COMMNS  0
2012-09-08 18:59:03.774: [    EVMD][999623216]0ENV Logging level for Module: EVMD  0
2012-09-08 18:59:03.774: [    EVMD][999623216]0ENV Logging level for Module: EVMDMAIN  0
2012-09-08 18:59:03.774: [    EVMD][999623216]0ENV Logging level for Module: EVMCOMM  0
2012-09-08 18:59:03.774: [    EVMD][999623216]0ENV Logging level for Module: EVMEVT  0
2012-09-08 18:59:03.774: [    EVMD][999623216]0ENV Logging level for Module: EVMAPP  0
2012-09-08 18:59:03.774: [    EVMD][999623216]0ENV Logging level for Module: EVMAGENT  0
2012-09-08 18:59:03.774: [    EVMD][999623216]0ENV Logging level for Module: CRSOCR  0
2012-09-08 18:59:03.774: [    EVMD][999623216]0ENV Logging level for Module: CLUCLS  0
2012-09-08 18:59:03.774: [    EVMD][999623216]0ENV Logging level for Module: OCRRAW  0
2012-09-08 18:59:03.774: [    EVMD][999623216]0ENV Logging level for Module: OCROSD  0
2012-09-08 18:59:03.774: [    EVMD][999623216]0ENV Logging level for Module: OCRAPI  0
2012-09-08 18:59:03.775: [    EVMD][999623216]0ENV Logging level for Module: OCRUTL  0
2012-09-08 18:59:03.775: [    EVMD][999623216]0ENV Logging level for Module: OCRMSG  0
2012-09-08 18:59:03.775: [    EVMD][999623216]0ENV Logging level for Module: OCRCLI  0
2012-09-08 18:59:03.775: [    EVMD][999623216]0ENV Logging level for Module: CSSCLNT  0
[  clsdmt][1108588864]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=ptdb01DBG_EVMD))
2012-09-08 18:59:03.777: [    EVMD][999623216]0Creating pidfile  /u01/app/oracle/product/10.2.0/crs_1/evm/init/ptdb01.pid
2012-09-08 18:59:03.781: [    EVMD][999623216]0Authorization database built successfully.
2012-09-08 18:59:04.209: [  OCRAPI][999623216]procr_open: Node Failure. Attempting retry #0

..
2012-09-08 18:59:04.210: [  OCRCLI][999623216]oac_reconnect_server: Could not connect to server. clsc ret 9
2012-09-08 18:59:19.177: [  OCRCLI][999623216]oac_reconnect_server: Could not connect to server. clsc ret 9
2012-09-08 18:59:19.227: [  OCRAPI][999623216]procr_open: Node Failure. Attempting retry #298
2012-09-08 18:59:19.228: [  OCRCLI][999623216]oac_reconnect_server: Could not connect to server. clsc ret 9
2012-09-08 18:59:19.278: [  OCRAPI][999623216]procr_open: Node Failure. Attempting retry #299
...
2012-09-08 18:59:19.635: [  OCRCLI][999623216]oac_reconnect_server: Could not connect to server. clsc ret 9
2012-09-08 18:59:19.636: [  EVMAPP][999623216][PANIC]0Unable to open local accept socket - errno 13
2012-09-08 18:59:19.636: [    EVMD][999623216][PANIC]0EVMD exiting
2012-09-08 18:59:19.636: [    EVMD][999623216]0Done.

The OCR could not be initialized; after 299 retries it gave up, and as a result the evmd process could not start. This is a difference from 11gR2: starting with 11.2, CRS obtains the ASM information through the OLR (Oracle Local Registry) in order to bring the OCR online; see this article for details.
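For reference, on 11.2 the OLR of each node can be inspected like this (a sketch; the paths shown are the Linux defaults):

# run as root on each node
cat /etc/oracle/olr.loc
$GRID_HOME/bin/ocrcheck -local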

Back on 10.2, we get the OCR location from /etc/oracle/ocr.loc:

[oracle@ptdb01 oracle]$ cat ocr.loc
ocrconfig_loc=/u01/app/oracle/product/10.2.0/db_1/cdata/localhost/local.ocr
local_only=TRUE
[oracle@ptdb01 oracle]$ strings  /u01/app/oracle/product/10.2.0/db_1/cdata/localhost/local.ocr
root
root
SYSTEM
DATABASE
local_only
ORA_CRS_HOME
versionstring
version
language
AMERICAN_AMERICA.WE8ISO8859P1
activeversion
 node_numbers
10G Release 2
/u01/app/oracle/product/10.2.0/db_1
true
node0
10.2.0.5.0
hostnames
privatenames
node_numbers
node_names
configured_node_map
clustername
localhost
ptdb01
nodenum
node0
nsendpoint
hostname
privatename
nodename
ptdb01
127.0.0.1
nodenum
ptdb01
nodenum
ptdb01
(ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=0))
10.2.0.5.0

local_only=TRUE clearly marks this as a local-only registry pointing at "/u01/app/oracle/product/10.2.0/db_1/cdata/localhost/local.ocr", and that local.ocr contains no OCR location information at all, so the OCR cannot be initialized and evmd cannot come online. The fix was to manually point ocrconfig_loc at the real OCR device (/dev/raw/raw2) and start CRS.
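Roughly, the corrected ocr.loc looks like this (a sketch; /dev/raw/raw2 is the device used on this cluster, and local_only must be FALSE for a clustered OCR):

# /etc/oracle/ocr.loc
ocrconfig_loc=/dev/raw/raw2
local_only=FALSE

With that in place, evmd comes up and CSSD reconfiguration completes: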

2012-09-08 19:04:20.912: [    EVMD][934681136]0Authorization database built successfully.
2012-09-08 19:04:21.270: [  EVMEVT][934681136][ENTER]0EVM Listening on: 54720654
2012-09-08 19:04:21.274: [  EVMAPP][934681136]0EVMD Started
2012-09-08 19:04:21.277: [  EVMEVT][1199147328]0Listening at (ADDRESS=(PROTOCOL=tcp)(HOST=ptdb01-priv)(PORT=0)) for P2P evmd connections requests
2012-09-08 19:04:21.281: [    EVMD][934681136]0Authorization database built successfully.
2012-09-08 19:04:21.395: [  EVMEVT][1230616896][ENTER]0Establishing P2P connection with node: ptdb02
2012-09-08 19:04:21.397: [  EVMEVT][1241106752]0Private Member Update event for ptdb01 received by clssgsgrpstat

2012-09-08 19:00:07.083
[cssd(13795)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ptdb01 ptdb02 .

Both ptdb01 and ptdb02 became active and CSSD started successfully. We later learned that, when installing ASM, the engineer had mistakenly chosen a single-instance configuration and then simply deleted that instance; part of the follow-up cleanup was never done.

———————————————————-

DBCA reported an error while creating a database:

CRS version:

[grid@db-42 bin]$ crsctl query crs softwareversion
Oracle Clusterware version on node [db-42] is [11.2.0.3.0]

[grid@db-42 bin]$ id oracle
uid=502(oracle) gid=501(oinstall) groups=501(oinstall),502(dba),506(asmdba)
[grid@db-42 bin]$ crs_getperm ora.ARCH.dg
Name: ora.ARCH.dg
owner:grid:rwx,pgrp:oinstall:rwx,other::r--
[grid@db-42 bin]$ crs_getperm ora.DATA.dg
Name: ora.DATA.dg
owner:grid:rwx,pgrp:asmadmin:rwx,other::r--

There are two ways to fix this:

1. Add the oracle user to the asmadmin group

2. Change the permissions on ora.DATA.dg

[grid@db-42 bin]$ crs_setperm ora.DATA.dg -u user:oracle:rwx

[grid@db-42 bin]$ crs_getperm ora.DATA.dg
Name: ora.DATA.dg
owner:grid:rwx,pgrp:asmadmin:rwx,other::r--,user:oracle:rwx
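The first option is only an OS group change; a minimal sketch (run as root on every node that hosts the database software, then log in again as oracle so the new group takes effect):

# usermod -a -G asmadmin oracle    # add oracle to the asmadmin OS group
# id oracle                        # verify that asmadmin now appears in the group list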

11gR2 RAC listener

August 18, 2012 network, oracle No comments

Oracle 11gR2 RAC introduces three major changes to the listener:

1. The default owner of the listener is grid

[oracle@db-42 ~]$ srvctl config listener
Name: LISTENER
Network: 1, Owner: grid
Home:
End points: TCP:1521

2. The listener is managed by the Oracle agent (oraagent)
[grid@db-41 admin]$ ps -ef |grep oraagent
grid 22195 18519 0 16:42 pts/0 00:00:00 grep oraagent
grid 25674 1 0 Jul17 ? 02:51:35 /data/11.2.0/grid/bin/oraagent.bin
grid 27720 1 0 Jul17 ? 00:47:49 /data/11.2.0/grid/bin/oraagent.bin
oracle 32300 1 0 Aug02 ? 01:13:35 /data/11.2.0/grid/bin/oraagent.bin

[grid@db-41 admin]$ cat endpoints_listener.ora
LISTENER_DB-41=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=db-41-vip)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.41)(PORT=1521)(IP=FIRST)))) # line added by Agent

3. The SCAN listener is introduced

Regarding the SCAN listener, take the following setup as an example.

Oracle uses three SCAN IPs to provide HA and load balancing for the SCAN. A couple of points are worth emphasizing here:

1. When there are fewer SCAN IPs than nodes, each SCAN listener lives on its own node only; in other words a single node will never host more than one SCAN listener.

For example:

[grid@db-42 admin]$ crsctl stat res -t |grep -i listener
ora.LISTENER.lsnr
ora.LISTENER_SCAN1.lsnr

[grid@db-42 admin]$ crsctl stat res ora.LISTENER_SCAN1.lsnr
NAME=ora.LISTENER_SCAN1.lsnr
TYPE=ora.scan_listener.type
TARGET=ONLINE
STATE=ONLINE on db-42
TARGET=ONLINE
STATE=ONLINE on dm01db02

Here a four-node RAC has only one SCAN listener (living on node 2); because there is only one SCAN IP, the HA benefit is lost.

2. When there are more SCAN listeners than nodes, some node will inevitably host more than one SCAN listener.

[grid@db-42 admin]$ srvctl config scan
SCAN name: scan-ip, Network: 1/10.0.0.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /scan-ip/10.0.0.145

As you can see, in the current setup there can only be a one-to-one mapping. To use multiple SCANs for HA and load balancing, the SCAN IPs can be assigned dynamically through DNS. The Exadata defaults also use three SCAN IPs, but because of customer constraints the single-SCAN setup is often what ends up in use.

[grid@dm01db02 ~]$ cat /etc/hosts |grep scan
# you will have to decide on which scan address to uncomment – choose only 1
10.0.1.205 dm01-scan.yihaodian.com dm01-scan
#10.0.1.206 dm01-scan.yihaodian.com dm01-scan
#10.0.1.207 dm01-scan.yihaodian.com dm01-scan

As you can see, the default is three SCAN IPs, but due to limitations of our DNS service we did not use the HA setup. Frankly I have always found SCAN of limited value: a mature RAC deployment has to be built on careful application partitioning, which is why many RAC systems are managed manually and neither rely on nor allow automatic load balancing. For Exadata, however, the use of the RDS protocol on the interconnect greatly reduces this impact.
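For completeness, resolving the SCAN name through DNS round robin looks roughly like this in a BIND-style zone file (the name and addresses reuse the Exadata example above and are illustrative only):

dm01-scan    IN A    10.0.1.205
dm01-scan    IN A    10.0.1.206
dm01-scan    IN A    10.0.1.207

Each lookup of dm01-scan then rotates through the three addresses, which is what lets the three SCAN listeners share incoming connections.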

The connection flow through the SCAN listener is as follows:

1. The application initiates a connection request against the SCAN IP.
2. The SCAN listener accepts the request and passes it to a local listener; in this step the SCAN listener uses its load-balancing algorithm to pick a lightly loaded local listener.
3. The local listener accepts the request and the database connection is created.
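On the client side nothing special is needed beyond pointing the connect descriptor at the SCAN; a minimal tnsnames.ora sketch (the alias, host and service name are assumptions for illustration):

PROD =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = scan-ip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = prod_srv)
    )
  )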

In addition, regarding the endpoints_listener question a friend raised:

1. Endpoints_listener.ora file is there for backward compatibility with pre-11.2 databases.

2. DBCA needs to know the endpoints location to configure database parameters and tnsnames.ora file.

3. DBCA used to read the listener.ora file, but in 11.2 RAC the listener.ora by default contains only IPC entries.

4. The listener should not be managed with LSNRCTL; use SRVCTL (or CRSCTL) instead, because LSNRCTL does not read endpoints_listener.ora and the listener would not end up listening on the expected addresses and ports.
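In practice that means routine listener operations go through srvctl; a few typical commands (the listener and node names are the ones from this environment and serve only as examples):

$ srvctl status listener -l LISTENER
$ srvctl stop listener -l LISTENER -n db-41
$ srvctl start listener -l LISTENER -n db-41
$ srvctl config scan_listener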

11gR2 RAC Rebootless Node Fencing

March 6, 2012 Internals, oracle, RAC No comments

Rebootless Node Fencing

In versions before 11.2.0.2 Oracle Clusterware tried to prevent a split-brain with a fast reboot (better: reset) of the server(s) without waiting for ongoing I/O operations or synchronization of the file systems. This mechanism has been changed in version 11.2.0.2 (first 11g Release 2 patch set). After deciding which node to evict, the clusterware:


. attempts to shut down all Oracle resources/processes on the server (especially processes generating I/Os)

. will stop itself on the node

. afterwards Oracle High Availability Service Daemon (OHASD) will try to start the Cluster Ready Services (CRS) stack again. Once the cluster interconnect is back online, all relevant cluster resources on that node will automatically start

. kill the node if stopping the resources or the processes generating I/O is not possible (hanging in kernel mode, I/O path, etc.)

This behavior change is particularly useful for non-cluster aware applications.

[cssd(3713)]CRS-1610:Network communication with node rac1 (1) missing for 90% of timeout interval.
Removal of this node from cluster in 2.190 seconds

[cssd(3713)]CRS-1652:Starting clean up of CRSD resources.

[cssd(3713)]CRS-1654:Clean up of CRSD resources finished successfully.
[cssd(3713)]CRS-1655:CSSD on node rac2 detected a problem and started to shutdown.

[cssd(5912)]CRS-1713:CSSD daemon is started in clustered mode

Unlike the earlier mechanism, Oracle no longer kills the node outright; it kills the relevant processes instead, and only if that attempt fails does it kill the node. For a big cluster this is valuable protection, since it avoids the resource freeze caused by re-mastering resources after a node reboot. The following passage explains this point in more detail:


Prior to 11g R2, during voting disk failures the node would be rebooted to protect the integrity of the cluster. But the underlying cause is not necessarily just a communication issue: the node may be hanging, or an I/O operation may be hanging, so the reboot decision can potentially be the wrong one. So Oracle Clusterware now fences the node without rebooting it. This is a big achievement and changes the way the cluster is designed.

The reason to avoid the reboot is that during reboots resources need to be re-mastered and the nodes remaining in the cluster must be re-formed. In a big cluster with many nodes this can be a very expensive operation, so Oracle fences the node by killing the offending processes: the clusterware stack on that node shuts down, but the node itself is not shut down. Once the I/O path or the network heartbeat is available again, the cluster stack is started again. The data is still protected, but without the pain of rebooting nodes. In cases where a reboot really is needed to protect integrity, the clusterware will still decide to reboot the node.

Reference: RAC_System_Test_Plan_Outline_11gr2_v2_0

Recommendation for the Real Application Cluster Interconnect and Jumbo Frames

February 9, 2012 Architect, oracle, RAC, software 2 comments

Recommendation for the Real Application Cluster Interconnect and Jumbo Frames

Applies to:


Oracle Server – Enterprise Edition – Version: 9.2.0.1 to 11.2.0.0 – Release: 9.2 to 11.2
Information in this document applies to any platform.
Oracle Server Enterprise Edition – Version: 9.2.0.1 to 11.2
Purpose

This note covers the current recommendation for the Real Application Cluster Interconnect and Jumbo Frames
Scope and Application

This article points out the issues surrounding Ethernet Jumbo Frame usage for the Oracle Real Application Cluster (RAC) Interconnect. In Oracle Real Application Clusters, the Cluster Interconnect is designed to run on a dedicated, stand-alone network. The Interconnect carries the communication between the nodes in the cluster needed to check the cluster's condition and to synchronize the various memory caches used by the database.

Ethernet is a widely used networking technology for Cluster Interconnects. Ethernet's variable frame size of 46-1500 bytes is the transfer unit between all Ethernet participants, such as the hosts and switches. The upper bound, in this case 1500, is called the MTU (Maximum Transmission Unit). When an application sends a message greater than the MTU, it is fragmented into frames of 1500 bytes or smaller on the way from one end-point to another. In Oracle RAC, DB_BLOCK_SIZE multiplied by DB_FILE_MULTIBLOCK_READ_COUNT determines the maximum size of a message for the Global Cache, and PARALLEL_EXECUTION_MESSAGE_SIZE determines the maximum size of a message used in Parallel Query. These message sizes can range from 2K to 64K or more, and hence fragment all the more with a lower/default MTU. For example, with an 8K block size and a multiblock read count of 16, a single 128K transfer breaks into roughly 90 frames at a 1500-byte MTU but only about 15 frames at 9000.

Jumbo Frames introduces the ability for an Ethernet frame to exceed its IEEE 802 specified Maximum Transfer Unit of 1500 bytes up to a maximum of 9000 bytes. Even though Jumbo Frames is widely available in most NICs and data-center class managed switches it is not an IEEE approved standard. While the benefits are clear, Jumbo Frames interoperability is not guaranteed with some existing networking devices. Though Jumbo Frames can be implemented for private Cluster Interconnects, it requires very careful configuration and testing to realize its benefits. In many cases, failures or inconsistencies can occur due to incorrect setup, bugs in the driver or switch software, which can result in sub-optimal performance and network errors.
Recommendation for the Real Application Cluster Interconnect and Jumbo Frames

Configuration

In order to make Jumbo Frames work properly for a Cluster Interconnect network, careful configuration at the host, Network Interface Card and switch level is required:
The host's network adapter must be configured with a persistent MTU size of 9000 (one that survives reboots).
For example, ifconfig eth1 mtu 9000 (the interface name is environment specific), followed by ifconfig -a to confirm the setting, as in the sketch after this section.
Certain NICs require additional hardware configuration.
For example, some Intel NICs require special descriptors and buffers to be configured for Jumbo Frames to work properly.
The LAN switches must also be properly configured to increase the MTU for Jumbo Frame support. Ensure the changes are permanent (they survive a power cycle) and that every device uses the same "jumbo" size, 9000 being the recommendation (some switches do not support this size).

Because of the lack of standards with Jumbo Frames the interoperability between switches can be problematic and requires advanced networking skills to troubleshoot.
Remember that the smallest MTU used by any device in a given network path determines the maximum MTU (the MTU ceiling) for all traffic travelling along that path.
Failing to properly set these parameters in all nodes of the Cluster and Switches can result in unpredictable errors as well as a degradation in performance.
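On Linux, one common way to make the host-side MTU persistent is the interface configuration file; a minimal sketch, assuming the private interconnect NIC is eth1 and the address is illustrative:

# /etc/sysconfig/network-scripts/ifcfg-eth1  (RHEL/OEL style; device name assumed)
DEVICE=eth1
BOOTPROTO=static
IPADDR=10.10.10.1
NETMASK=255.255.255.0
ONBOOT=yes
MTU=9000

After restarting the interface (or the network service), ifconfig eth1 should report an MTU of 9000.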
Testing

Ask your network and system administrators, along with the vendors, to fully test the configuration using standard tools such as SPRAY or NETCAT and to show that there is an improvement, not a degradation, when using Jumbo Frames. Other basic ways to check it is configured correctly on Linux/Unix are:

Traceroute: notice that the 9000-byte packet goes through with no error while the 9001-byte one fails; this is a correct configuration that supports messages of up to 9000 bytes with no fragmentation:

[node01] $ traceroute -F node02-priv 9000
traceroute to node02-priv (10.10.10.2), 30 hops max, 9000 byte packets
1 node02-priv (10.10.10.2) 0.232 ms 0.176 ms 0.160 ms

[node01] $ traceroute -F node02-priv 9001
traceroute to node02-priv (10.10.10.2), 30 hops max, 9001 byte packets
traceroute: sendto: Message too long
1 traceroute: wrote node02-priv 9001 chars, ret=-1
* Note: Due to Oracle Bugzilla 7182 (must have logon privileges) -- also known as RedHat Bugzilla 464044 -- traceroute versions older than EL4.7 may not work correctly for this purpose.
* Note: Some versions of traceroute, e.g. traceroute 2.0.1 shipped with EL5, add the header size on top of what is specified with the -F flag (same as the ping behavior below). Newer versions of traceroute, such as 2.0.14 (shipped with OL6), have the old traceroute version 1 behavior (the packet size is exactly what is specified with the -F flag).

Ping: with ping we have to take into account an overhead of about 28 bytes per packet, so 8972 bytes go through with no errors while 8973 fail; this is a correct configuration that supports messages of up to 9000 bytes with no fragmentation:
[node01]$ ping -c 2 -M do -s 8972 node02-priv
PING node02-priv (10.10.10.2) 8972(9000) bytes of data.
8980 bytes from node02-priv (10.10.10.2): icmp_seq=0 ttl=64 time=0.220 ms
8980 bytes from node02-priv (10.10.10.2): icmp_seq=1 ttl=64 time=0.197 ms

[node01]$ ping -c 2 -M do -s 8973 node02-priv
From node02-priv (10.10.10.1) icmp_seq=0 Frag needed and DF set (mtu = 9000)
From node02-priv (10.10.10.1) icmp_seq=0 Frag needed and DF set (mtu = 9000)
--- node02-priv ping statistics ---
0 packets transmitted, 0 received, +2 errors
* Note: Ping reports fragmentation errors, due to exceeding the MTU size.
Performance

For RAC Interconnect traffic, devices correctly configured for Jumbo Frames improve performance by reducing the TCP, UDP, and Ethernet overhead that occurs when large messages have to be broken up into the smaller frames of standard Ethernet. Because one larger packet can be sent, the inter-packet latency between the various smaller packets is eliminated. The increase in performance is most noticeable in scenarios requiring high throughput and bandwidth and when systems are CPU bound.

When using Jumbo Frames, fewer buffer transfers are required, which is part of the reduction in fragmentation and reassembly in the IP stack, and thus helps reduce the latency of an Oracle block transfer.

As illustrated in the configuration section, any incorrect setup may prevent instances from starting up or can have a very negative effect on the performance.
Known Bugs

In some versions of Linux there are specific bugs in Intel’s Ethernet drivers and the UDP code path in conjunction with Jumbo Frames that could affect the performance. Check for and use the latest version of these drivers to be sure you are not running into these older bugs.

The following bugzilla bugs 162197, 125122 are limited to RHEL3.
Recommendation

There is some complexity involved in configuring Jumbo Frames, which is highly hardware and OS specific, and the lack of a formal standard means OS and hardware bugs may surface. Even with these considerations, Oracle recommends using Jumbo Frames for private Cluster Interconnects.

Since there is no official standard for Jumbo Frames, this configuration should be properly load tested by Customers. Any indication of packet loss, socket buffer or DMA overflows, TX and RX error in adapters should be noted and checked with the hardware and operating system vendors.

The recommendation in this Note is strictly for the Oracle private interconnect; it does not apply to NAS or iSCSI networks for which the vendor has tested and validated a Jumbo Frames configuration.

Oracle VM does not support Jumbo Frames. Refer to Oracle VM: Jumbo Frame on Oracle VM (Doc ID 1166925.1) for further information.

Configuring Temporary Tablespaces for RAC Databases for Optimal Performance

January 6, 2012 oracle, RAC No comments

Configuring Temporary Tablespaces for RAC Databases for Optimal Performance
Modified 04-AUG-2010 Type BULLETIN Status ARCHIVED
In this Document
Purpose
Scope and Application
Configuring Temporary Tablespaces for RAC Databases for Optimal Performance
References

Applies to:

Oracle Server – Enterprise Edition – Version: 9.2.0.1 to 11.1.0.6 – Release: 9.2 to 11.1
Information in this document applies to any platform.
***Checked for relevance on 04-Aug-2010***
Purpose

This document discusses issues involved in configuring temporary tablespaces for RAC databases and provides best practice recommendations for configuring them for optimal performance.
Scope and Application

Oracle RAC DBAs.
Configuring Temporary Tablespaces for RAC Databases for Optimal Performance

In any DW, OLTP or mixed-workload application that uses a lot of temp space for temporary tables, sort segments and so on, running low on temp space causes many sessions to start waiting on 'SS enqueue' and 'DFS lock handle', which can lead to severe performance problems. This best practice note explains how temporary tablespace management works in a RAC environment and offers recommendations.
Space allocated to one instance is managed in the SGA of that instance, and it is not visible to other instances.

Instances do not normally return temp space to the ‘common pool’.

If all the temp space is allocated to instances and there is no more temp space within an instance, a user request for temp space will cause a request to be sent to the other instances. The session requesting the space will take the 'SS enqueue' for the temporary tablespace and issue a cross-instance call (using a CI enqueue) to the other instances (waiting on 'DFS lock handle'). All inter-instance temp space requests serialize on this CI enqueue, and this can be very expensive.

A heavy query executing in one instance and using lots of temp space might cause all or most of the temp space to be allocated to this instance. This kind of imbalance will lead to increased contention for temp space.

As users on each instance request temp space, space will be allocated to the various instances. During this phase it is possible to get contention on the file space header blocks, and it is recommended to have at least as many temp files as there are instances in the RAC cluster. This normally shows up as ‘buffer busy’ waits and it is different from the ‘SS enqueue’/’DFS lock handle’ wait issue.

Temporary tablespace groups are designed to accommodate very large temp space requirements, beyond the current limits for a single temporary tablespace: 8TB (2k block size) to 128TB (32k block size).

One possible advantage of temporary tablespace groups is that it provides multiple SS enqueues (one per tablespace), but this only shifts the contention to the CI enqueue (only one system wide)

It is easier to share space within a single temporary tablespace, rather than within a temporary tablespace group. If a session starts allocating temp space from a temporary tablespace within a temporary tablespace group, additional space cannot be allocated from another temporary tablespace within the group. With a single temporary tablespace, a session can allocate space across tempfiles.

The following are the recommended best practices for managing temporary tablespaces in a RAC environment:
Make sure enough temp space is configured. Due to the way temp space is managed per instance in RAC, it can be useful to allocate a bit of extra space compared to a similar single-instance database.

Isolate heavy or variable temp space users to separate temporary tablespaces. Separating reporting users from OLTP users might be one option.

Monitor the temp space allocation to make sure each instance has enough temp space available and that the temp space is allocated evenly among the instances. The following SQL is used:


select inst_id, tablespace_name, segment_file, total_blocks,
used_blocks, free_blocks, max_used_blocks, max_sort_blocks
from gv$sort_segment;

select inst_id, tablespace_name, blocks_cached, blocks_used
from gv$temp_extent_pool;

select inst_id,tablespace_name, blocks_used, blocks_free
from gv$temp_space_header;

select inst_id,free_requests,freed_extents
from gv$sort_segment;

If temp space allocation between instances has become imbalanced, it might be necessary to manually drop temporary segments from an instance. The following command is used for this:

alter session set events 'immediate trace name drop_segments level <n>';

See Bug 4882834 for details.

For each temporary tablespace, allocate at least as many temp files as there are instances in the cluster.
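As a sketch of that last point, for a two-instance cluster something like the following ensures there are at least as many tempfiles as instances, spreading the file space header blocks and avoiding the 'buffer busy' contention mentioned above (tablespace name, disk group and size are assumptions):

$ sqlplus / as sysdba
SQL> alter tablespace TEMP add tempfile '+DATA' size 8g;
SQL> alter tablespace TEMP add tempfile '+DATA' size 8g;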
References

BUG:4882834 – EXCESSIVE SS AND TS CONTENTION ON NEW 10G CLUSTERS
BUG:6331283 – LONG WAITS ON ‘DFS LOCK HANDLE’

Summary of problems installing 10g RAC on Red Hat 5.5

December 11, 2011 oracle, RAC No comments

Environment: Red Hat 5.5 + Oracle 10g RAC + NetApp storage

The first problem was with device-mapper multipath (DMM): Oracle could not recognize the devices mapped from the NetApp storage by the multipath software and reported the following error:

the location /dev/mapper/ocr1, entered for the Oracle Cluster Registry (OCR), is not shared across all the nodes in the cluster…

At first we suspected the devices were not shared, so from each of the two nodes we ran dd read/write tests and confirmed that content written on one node could be read on the other, proving the disks were shared; the multipath configuration also looked fine.
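The check itself is simple, though it writes to the device, so it should only be done on a LUN that is not yet in use; a sketch of the kind of dd test we ran (device path as in our setup):

# on node 1: write a small marker into the first block of the candidate device
echo "shared-disk-test" | dd of=/dev/mapper/ocr1 bs=512 count=1 conv=notrunc
# on node 2: read the block back; seeing the marker confirms both nodes see the same LUN
dd if=/dev/mapper/ocr1 bs=512 count=1 2>/dev/null | strings | head -1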

A MetaLink search turned up the issue: Configuring raw devices (multipath) for Oracle Clusterware 10g Release 2 (10.2.0) on RHEL5/OEL5 [ID 564580.1]

During the installation of Oracle Clusterware 10g Release 2 (10.2.0), the Universal Installer (OUI) is unable to verify the sharedness of block devices, therefore requires the use of raw devices (whether to singlepath or multipath devices) to be specified for OCR and voting disks. As mentioned earlier, this is no longer the case from Oracle11g R1 (11.1.0) that can use multipathed block devices directly

In other words, in 10g the OCR and voting disks must be placed on raw device files; from 11g onwards, files mapped by device-mapper multipath are supported directly.

Bind the raw devices to the device-mapper multipath devices:

# raw /dev/raw/raw1 /dev/mapper/ocr1
/dev/raw/raw1: bound to major 253, minor 11
# raw /dev/raw/raw2 /dev/mapper/ocr2
/dev/raw/raw2: bound to major 253, minor 8
# raw /dev/raw/raw3 /dev/mapper/ocr3
/dev/raw/raw3: bound to major 253, minor 10
# raw /dev/raw/raw4 /dev/mapper/voting1
/dev/raw/raw4: bound to major 253, minor 5
# raw /dev/raw/raw5 /dev/mapper/voting2
/dev/raw/raw5: bound to major 253, minor 4
# raw /dev/raw/raw6 /dev/mapper/voting3
/dev/raw/raw6: bound to major 253, minor 7

Red Hat 5.5 re-enables the rawdevices service, so the commands above can be used directly.
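To make the bindings survive a reboot, the same mappings can go into /etc/sysconfig/rawdevices so the rawdevices service recreates them at boot; a sketch based on the bindings above (ownership of the /dev/raw/raw* files also has to be granted to the oracle user, typically via udev rules or an rc script):

# /etc/sysconfig/rawdevices
/dev/raw/raw1 /dev/mapper/ocr1
/dev/raw/raw2 /dev/mapper/ocr2
/dev/raw/raw3 /dev/mapper/ocr3
/dev/raw/raw4 /dev/mapper/voting1
/dev/raw/raw5 /dev/mapper/voting2
/dev/raw/raw6 /dev/mapper/voting3

# chkconfig rawdevices on
# service rawdevices restart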

The installation continued until the final step, where running the CRS root.sh script failed with:

PROT-1: Failed to initialize ocrconfig
Failed to upgrade Oracle Cluster Registry configuration

./clsfmt ocr /dev/raw/raw1
clsfmt: Received unexpected error 4 from skgfifi
skgfifi: Additional information: -2
Additional information: 1000718336

Changes

It has been found that the following changes can cause this problem to occur:

1. Using a Multiple Path (MP) disk configuration may hit this issue.
2. Using EMC devices (PowerPath**) may hit this issue.

But it is not confirmed that these are the only causes; the problem has also been seen on other hardware and configurations. The key factor is that if the disk size presented from the storage to the cluster nodes is not divisible by 4K, the problem can occur.

We were indeed using device-mapper multipath and hit this bug. After applying Patch 4679769 the installation continued, but running vipca on node 2 failed with:

[root@db-35 bin]# ./vipca
/home/oracle/product/10.2/crs/jdk/jre//bin/java: error while loading shared libraries: libpthread.so.0: cannot open shared object file: No such file or directory

Workaround

Run the following on both nodes.

Edit the vipca script:

vi vipca

Linux) LD_LIBRARY_PATH=$ORACLE_HOME/lib:$ORACLE_HOME/srvm/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH

#Remove this workaround when the bug 3937317 is fixed
arch=`uname -m`
if [ "$arch" = "i686" -o "$arch" = "ia64" -o "$arch" = "x86_64" ]
then
LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL
fi
#End workaround
Add after the fi: unset LD_ASSUME_KERNEL

Make the same change in srvctl:

LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL

Add: unset LD_ASSUME_KERNEL
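After the edit, the relevant section of vipca (and the matching block in srvctl) looks roughly like this:

#Remove this workaround when the bug 3937317 is fixed
arch=`uname -m`
if [ "$arch" = "i686" -o "$arch" = "ia64" -o "$arch" = "x86_64" ]
then
LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL
fi
#End workaround
unset LD_ASSUME_KERNEL   # added line: undo the workaround so vipca/srvctl run on the 2.6 kernel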

With that the CRS installation was complete. We installed the database software, downloaded p8202632_10205_Linux-x86-64.zip to upgrade both CRS and the database software to 10.2.0.5, and created the database with DBCA.

Setting diagwait:

Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware Node evictions [ID 559365.1]


Symptoms

Oracle Clusterware evicts the node from the cluster when

Node is not pinging via the network heartbeat
Node is not pinging the Voting disk
Node is hung/busy and is unable to perform either of the earlier tasks
In most cases when a node is evicted there is information written to the logs to analyze the cause of the eviction. In certain cases, however, this information may be missing; the steps documented in this note are for those cases where there is not enough (or no) information to diagnose the cause of the eviction, for Clusterware versions below 11gR2 (11.2.0.1).

Starting with 11.2.0.1, Customers do not need to set diagwait as the architecture has been changed.

Changes

None

Cause

When the node is evicted and is extremely busy in terms of CPU (or the lack of it), it is possible that the OS did not get time to flush the logs/traces to the file system. It may be useful to set the diagwait attribute to delay the node reboot and give the OS additional time to write the traces. This setting provides more time for diagnostic data to be collected safely and will NOT increase the probability of corruption. After setting diagwait, the Clusterware will wait an additional 10 seconds (diagwait - reboottime). Customers can unset diagwait, by following the steps documented below, after fixing their OS scheduling issues.

* Note: diagwait can be set on Windows, but it does not change the behaviour there as it does on UNIX/Linux platforms.

@ For internal Support Staff
The diagwait attribute was introduced in 10.2.0.3 and is included in 10.2.0.4, 11.1.0.6 and higher releases. It has also been backported to 10.1.0.5 on most platforms. This means it is possible to set diagwait on 10.1.0.5 (or higher), 10.2.0.3 (or higher) and 11.1.0.6 (or higher). If the command crsctl set/get css diagwait reports "unrecognized parameter diagwait specified", it can safely be assumed that the Clusterware version does not have the necessary fixes to implement diagwait. In that case the customer is advised to apply the latest available patch set before attempting to set diagwait.
Solution

It is important that the Clusterware stack be down on all nodes when changing diagwait. The following steps provide step-by-step instructions for setting diagwait.

Execute as root
#crsctl stop crs
#/bin/oprocd stop
Ensure that Clusterware stack is down on all nodes by executing
#ps -ef |egrep "crsd.bin|ocssd.bin|evmd.bin|oprocd"
This should return no processes. If there are clusterware processes running and you proceed to the next step, you will corrupt your OCR. Do not continue until the clusterware processes are down on all the nodes of the cluster.
From one node of the cluster, change the value of the "diagwait" parameter to 13 seconds by issuing the following command as root:
#crsctl set css diagwait 13 -force
Check whether diagwait was set successfully by executing the following command. The command should return 13. If diagwait is not set, the following message will be returned: "Configuration parameter diagwait is not defined".
#crsctl get css diagwait
Restart the Oracle Clusterware on all the nodes by executing:
#crsctl start crs
Validate that the node is running by executing:
#crsctl check crs
Unsetting/Removing diagwait

Customers should not unset diagwait without fixing the OS scheduling issues, as that can lead to node evictions via reboot. Diagwait delays the node eviction (and reconfiguration) by diagwait (13) seconds, so setting it does not affect most customers. If there is a need to remove diagwait, follow the steps above, replacing step 3 with the following command:
#crsctl unset css diagwait -f

(Note: the -f option must be used when unsetting diagwait since CRS will be down when doing so)

With that, this RAC installation was completed successfully.

[Repost] How to use cluvfy

December 9, 2011 oracle, RAC No comments

Before installing RAC you can validate the cluster environment with a tool Oracle provides called cluvfy (Cluster Verification Utility).

I. Where the CVU lives

OTN download: http://otn.oracle.com/RAC

Oracle DVD

--clusterware/cluvfy/runcluvfy.sh

--clusterware/rpm/cvuqdisk-1.0.1-1.rpm (Linux only)

CRS Home (if the CRS software is already installed)

--/bin/cluvfy

--/cv/rpm/cvuqdisk-1.0.1-1.rpm (Linux only)

Oracle Home (if the RAC database software is already installed)

--$ORACLE_HOME/bin/cluvfy

II. CVU characteristics

1. CVU lets you check the cluster at every stage: hardware setup, clusterware installation, database software installation, node addition, and so on.

2. It has an extensible architecture (platform independent, with checks covering storage, network devices, etc.).

3. CVU checks are non-destructive.

4. It is a command-line tool.

5. It makes no corrections, even when a check fails.

6. CVU does no performance tuning or monitoring.

7. It performs no cluster or RAC operations (for example, it will not start or stop CRS).

8. It does not check relationships inside the cluster database or cluster (such as the relationship between CRS and the database).

III. CVU command usage

[oracle@zhh1 ~]$ cluvfy

USAGE:

cluvfy [ -help ]

cluvfy stage { -list | -help }

cluvfy stage {-pre|-post}[-verbose]

cluvfy comp { -list | -help }

cluvfy comp[-verbose]

[oracle@zhh1 ~]$ which cluvfy

/usr/oracle/product/10.2.0/db_1/bin/cluvfy

cluvfy is installed only on the local node; at run time it deploys itself to the other nodes automatically as needed.

Main usage:

Stage verification
The deployment (installation) of 10g RAC can be divided logically into several parts, each of which is called a 'stage'.

For example: hardware and OS setup, CFS setup, CRS software installation, database software installation, database configuration, and so on.

Each stage consists of a series of operations and has two check modes: pre-install and post-install.

Component verification
Checks the availability, integrity and health of cluster components.

For example: shared storage availability, space availability, node reachability, CFS integrity, cluster integrity, CRS integrity, and so on.

1. stage

[oracle@zhh1 ~]$ cluvfy stage -list

USAGE:

cluvfy stage {-pre|-post}[-verbose]

Valid stage options and stage names are:

-post hwos : post-check for hardware and operating system

-pre cfs : pre-check for CFS setup

-post cfs : post-check for CFS setup

-pre crsinst : pre-check for CRS installation

-post crsinst : post-check for CRS installation

-pre dbinst : pre-check for database installation

-pre dbcfg : pre-check for database configuration

In other words: -post hwos checks the hardware and operating system after setup; -pre cfs and -post cfs check before and after CFS setup; -pre crsinst and -post crsinst check before and after the CRS installation; -pre dbinst checks before the database software installation; and -pre dbcfg checks before database configuration.

2. component

[oracle@zhh1 ~]$ cluvfy comp -list

USAGE:

cluvfy comp[-verbose]

Valid components are:

nodereach : checks reachability between nodes

nodecon : checks node connectivity

cfs : checks CFS integrity

ssa : checks shared storage accessibility

space : checks space availability

sys : checks minimum system requirements

clu : checks cluster integrity

clumgr : checks cluster manager integrity

ocr : checks OCR integrity

crs : checks CRS integrity

nodeapp : checks node applications existence

admprv : checks administrative privileges

peer : compares properties with peers

That is: nodereach checks reachability between nodes; nodecon checks node connectivity; cfs checks CFS integrity; ssa checks shared storage accessibility; space checks space availability; sys checks minimum system requirements; clu checks cluster integrity; clumgr checks cluster manager integrity; ocr checks OCR integrity; crs checks CRS integrity; nodeapp checks that the node applications exist; admprv checks administrative privileges; and peer compares properties across nodes.

3. Stage or component?
a. Stage mode: use CVU stage mode while installing CRS and RAC,
applying the pre-install and post-install checks at the right points.
For example:
Before installing the CRS software:
run the runcluvfy.sh from the DVD or downloaded from OTN
$cluvfy stage -pre crsinst -n zhh1,zhh2 -verbose | tee /tmp/cluvfy_preinst.log
Before installing the RAC software:
run cluvfy from the bin directory of the CRS Home
$ cluvfy stage -pre dbinst -n zhh1,zhh2 -verbose | tee /tmp/cluvfy_dbinst.log

b. Component mode: while CRS or the database is running, use the appropriate component mode to check a particular component or to isolate a cluster subsystem for diagnosis.
For example:
Diagnosing a network problem
$ cluvfy comp nodecon -n zhh1,zhh2 -verbose
checks the network interfaces (eth0, eth1, etc.), the public IPs and the private IPs
Diagnosing a shared disk problem
$ cluvfy comp ssa -n zhh1,zhh2 -s /dev/sdb1 -verbose
checks whether the listed storage device is shared across all nodes
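Once the CRS stack is installed and running, the matching post-install stage check can confirm that the stack is healthy; a sketch with the same node names:
$ cluvfy stage -post crsinst -n zhh1,zhh2 -verbose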

[oracle@zhh1 ~]$ cd /usr/oracle/soft/clusterware/
cluvfy/ install/ rootpre/ runInstaller upgrade/
doc/ response/ rpm/ stage/ welcome.html
[oracle@zhh1 ~]$ cd /usr/oracle/soft/clusterware/cluvfy/
[oracle@zhh1 cluvfy]$ ll
total 30996
-rwxr-xr-x 1 oracle dba 10326112 Oct 23 2005 cvupack.zip
-rwxr-xr-x 1 oracle dba 21356849 Oct 23 2005 jrepack.zip
-rwxr-xr-x 1 oracle dba 3107 Oct 23 2005 runcluvfy.sh
This is the runcluvfy.sh in question; the usage of the main commands is shown below.
[oracle@zhh1 cluvfy]$ ./runcluvfy.sh
USAGE:
cluvfy [ -help ]
cluvfy stage { -list | -help }
cluvfy stage {-pre|-post} [-verbose]
cluvfy comp { -list | -help }
cluvfy comp [-verbose]

[oracle@zhh1 ~]$ cluvfy stage -pre crsinst -n zhh1

Performing pre-checks for cluster services setup

Checking node reachability…
Node reachability check passed from node “zhh1”.

Checking user equivalence…
User equivalence check passed for user “oracle”.

Checking administrative privileges…
User existence check passed for “oracle”.
Group existence check failed for “oinstall”.
Check failed on nodes:
zhh1

Administrative privileges check failed.

Checking node connectivity…

Node connectivity check passed for subnet “192.168.5.0” with node(s) zhh1.
Node connectivity check passed for subnet “10.0.0.0” with node(s) zhh1.

Suitable interfaces for the private interconnect on subnet “192.168.5.0”:

zhh1 eth0:192.168.5.235 eth0:192.168.5.233

Suitable interfaces for the private interconnect on subnet “10.0.0.0”:

zhh1 eth1:10.0.0.100

ERROR:
Could not find a suitable set of interfaces for VIPs.

Node connectivity check failed.

Checking system requirements for ‘crs’…
Total memory check passed.
Free disk space check passed.
Swap space check passed.
System architecture check passed.
Kernel version check passed.
Package existence check passed for “binutils-2.15.92.0.2-13”.
Group existence check passed for “dba”.
Group existence check failed for “oinstall”.
Check failed on nodes:
zhh1
User existence check passed for “nobody”.

System requirement failed for ‘crs’

Pre-check for cluster services setup was unsuccessful on all the nodes.
From these check results you can make targeted fixes before continuing.

 

reference: http://hi.baidu.com/dba_hui/blog/item/99248378e3b2e51a29388ace.html