11g

Optimizer in Oracle 11g by wwf.

April 11, 2013 11g, advanced tech, oracle

By wwf from the eBay COC, about the Optimizer in Oracle 11g.

Cannot purge obsolete cursors in 11.2.0.3

November 28, 2012 11g, maintain, oracle

In 11.2.0.3 Oracle introduced the hidden parameter _cursor_obsolete_threshold, which defaults to 100: a parent cursor may have at most 100 child cursors. When a SQL statement needs a 101st child cursor, the parent cursor and its 100 children are marked obsolete and a new parent cursor is created. However, the obsoleted child cursors do not disappear from v$sql, and dbms_shared_pool.purge cannot flush them out either; Oracle is working on a fix for this bug.
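A quick way to see which parent cursors are approaching the threshold is to count children per parent. A minimal sketch (the cutoff of 50 is arbitrary; 100 is the default _cursor_obsolete_threshold):

-- count child cursors per parent; parents nearing 100 children
-- are candidates for obsoletion under the default threshold
select sql_id, count(*) as child_cursors
from v$sql
group by sql_id
having count(*) > 50
order by child_cursors desc;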

For example, on the following environment:

-bash-3.2$ ./opatch lsinventory
Oracle Interim Patch Installer version 11.2.0.3.0
Copyright (c) 2012, Oracle Corporation. All rights reserved.

Oracle Home : /u01/app/oracle/product/11.2.0/db_1
Central Inventory : /u01/app/oraInventory
from : /u01/app/oracle/product/11.2.0/db_1/oraInst.loc
OPatch version : 11.2.0.3.0
OUI version : 11.2.0.3.0
Log file location : /u01/app/oracle/product/11.2.0/db_1/cfgtoollogs/opatch/opatch2012-11-28_12-40-02PM_1.log

Lsinventory Output file location : /u01/app/oracle/product/11.2.0/db_1/cfgtoollogs/opatch/lsinv/lsinventory2012-11-28_12-40-02PM.txt

--------------------------------------------------------------------------------
Installed Top-level Products (1):

Oracle Database 11g 11.2.0.3.0
There are 1 products installed in this Oracle Home.

Interim patches (1) :

Patch 13923374 : applied on Thu Sep 20 14:10:21 CST 2012
Unique Patch ID: 14893407
Patch description: "Database Patch Set Update : 11.2.0.3.3 (13923374)"
Created on 14 Jun 2012, 23:16:11 hrs PST8PDT
Sub-patch 13696216; "Database Patch Set Update : 11.2.0.3.2 (13696216)"
Sub-patch 13343438; "Database Patch Set Update : 11.2.0.3.1 (13343438)"

v$sqlarea returns only one row:

SQL> select address,hash_value from v$sqlarea where sql_id='27svyp3s52cu0';

ADDRESS HASH_VALUE
—————- ———-
0700001DBB270900 4031853376

You can purge only this cursor:

SQL> exec dbms_shared_pool.purge('0700001DBB270900,4031853376','C');

PL/SQL procedure successfully completed.

But if you try the same for the other child cursors listed in v$sql:

SQL> select distinct address,hash_value from v$sql where sql_id='27svyp3s52cu0';

ADDRESS HASH_VALUE
—————- ———-
0700001DBE384CD8 4031853376
0700001DBD3119A0 4031853376
0700001DBC063368 4031853376
0700001DBDB39548 4031853376
0700001DBC993898 4031853376
0700001DBC75A2E8 4031853376
0700001DBCC34B20 4031853376
0700001DBB270900 4031853376
0700001DBCFFE730 4031853376
0700001DBCB15A70 4031853376
0700001DBFA741A8 4031853376
0700001DBC4FEA60 4031853376
0700001DBE3F49F8 4031853376
0700001DBB4D1938 4031853376

14 rows selected

SQL> exec dbms_shared_pool.purge('0700001DBE384CD8,4031853376','C');

begin dbms_shared_pool.purge('0700001DBE384CD8,4031853376','C'); end;

ERROR at line 1:
ORA-06570: shared pool object does not exist, cannot be pinned/purged
ORA-06512: at "SYS.DBMS_SHARED_POOL", line 48
ORA-06512: at "SYS.DBMS_SHARED_POOL", line 115
ORA-06512: at line 1

SQL> exec dbms_shared_pool.purge('0700001DBB4D1938,4031853376','C');
BEGIN dbms_shared_pool.purge('0700001DBB4D1938,4031853376','C'); END;

*
ERROR at line 1:
ORA-06570: shared pool object does not exist, cannot be pinned/purged
ORA-06512: at "SYS.DBMS_SHARED_POOL", line 48
ORA-06512: at "SYS.DBMS_SHARED_POOL", line 115
ORA-06512: at line 1

Extended statistics for this cursor:

Shared Cursors Statistics
Total Parses		29,918
Hard Parses		1,665
Child Cursors		1,478
Loaded Plans		1,478
Invalidations		1,473
Largest Cursor Size (KB)		292.43
All Cursor Size (KB)		399,376.98
First Load Time		Nov 23, 2012 4:26:13 AM GMT+08:00
Last Load Time		Nov 28, 2012 1:10:10 PM GMT+08:00
Execution Statistics

Total	Per Execution	Per Row
Executions	1,264,476	1	1.00
Elapsed Time (sec)	18,717.60	0.01	0.01
CPU Time (sec)	1,087.67	<0.01	<0.01
Buffer Gets	114,482,137	90.54	90.54
Disk Reads	2,241,885	1.77	1.77
Direct Writes	0	0.00	0.00
Rows	1,264,476	1.00	1
Fetches	0	0.00	0.00



Other Statistics
Executions that Fetched all Rows (%)	100.00
Average Persistent Mem (KB)	45.90
Average Runtime Mem (KB)	41.75
Serializable Aborts	0
Remote	No
Obsolete		Yes
Child Latch Number		0

A note on this case: the statement is a simple INSERT, but it carries more than 150 bind values, so with this design bind variable graduation is unavoidable (Oracle buckets VARCHAR2 binds by length at 32, 128, 2000 and 4000 bytes, and a bind that grows into a larger bucket forces a new child cursor). On top of that, each insert costs about 14 ms, making this the top-1 SQL in the system, and on 11.2.0.3 it also ran into this bug (so sad). All of which underlines how much table design matters: endlessly adding columns to a table to satisfy business needs is still a common habit here, but once the system approaches its bottleneck you discover how unwise that decision was.
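To confirm that bind graduation is what multiplies the children, v$sql_shared_cursor exposes one mismatch flag per child cursor. A sketch against the sql_id from the earlier listing (BIND_LENGTH_UPGRADEABLE is the 11.2 column name; older releases fold this into BIND_MISMATCH):

-- 'Y' means this child exists because a bind value grew past its
-- length bucket (32/128/2000/4000 bytes for VARCHAR2 binds)
select sql_id, child_number, bind_length_upgradeable
from v$sql_shared_cursor
where sql_id = '27svyp3s52cu0';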

Bug 14127231 dbms_shared_pool.purge raised ora-6570 on obsoleted child cursors
This note gives a brief overview of bug 14127231.
The content was last updated on: 06-JUL-2012
Affects:
Product (Component) Oracle Server (Rdbms)
Range of versions believed to be affected Versions >= 11.2 but BELOW 12.1
Versions confirmed as being affected => 11.2.0.3

Platforms affected Generic (all / most platforms affected)
Fixed:
This issue is fixed in 12.1 (Future Release)
11.2.0.4 (Future Patch Set)

Symptoms: Error May Occur / ora-6570
Related To: PL/SQL (DBMS Packages) / dbms_shared_pool

Description
dbms_shared_pool.purge() returns ora-6570 when trying to purge an obsolete object.

Please note: The above is a summary description only. Actual symptoms can vary. Matching to any symptoms here does not confirm that you are encountering this problem. For questions about this bug please consult Oracle Support.
References
Bug:14127231 (This link will only work for PUBLISHED bugs)
Note:245840.1 Information on the sections in this article

————–

SQL> alter system set "_cursor_obsolete_threshold"=3 scope=spfile;

System altered.

SQL> 
SQL> startup force;



SQL> set linesize 300
SQL> select sql_text,sql_id,address,hash_value,is_obsolete  from v$sql where sql_text like '%INSERT INTO t (n, v) VALUES%';

SQL_TEXT                                                               SQL_ID        ADDRESS          HASH_VALUE I
---------------------------------------------------------------------- ------------- ---------------- ---------- -
INSERT INTO t (n, v) VALUES (:n, :v)                                   6cvmu7dwnvxwj 000000011ECD0220 2035152785 Y
INSERT INTO t (n, v) VALUES (:n, :v)                                   6cvmu7dwnvxwj 000000011ECD0220 2035152785 Y
INSERT INTO t (n, v) VALUES (:n, :v)                                   6cvmu7dwnvxwj 000000011ECD0220 2035152785 Y
INSERT INTO t (n, v) VALUES (:n, :v)                                   6cvmu7dwnvxwj 000000011ECCE110 2035152785 N

SQL> 
SQL> 

As SYS:

SQL> exec dbms_shared_pool.purge('000000011ECD0220,2035152785','C');

PL/SQL procedure successfully completed.



SQL> 
SQL> select sql_text,sql_id,address,hash_value,is_obsolete  from v$sql where sql_text like '%INSERT INTO t (n, v) VALUES%';

SQL_TEXT                                                               SQL_ID        ADDRESS          HASH_VALUE I
---------------------------------------------------------------------- ------------- ---------------- ---------- -
INSERT INTO t (n, v) VALUES (:n, :v)                                   6cvmu7dwnvxwj 000000011ECCE110 2035152785 N

SQL> 
SQL> 
SQL> exit

Update: this bug is fixed in Patch 14127231. The patch initially targeted Solaris and Linux on 11.2.0.3.x, and an AIX 64-bit patch has recently been released as well.

Oracle Direct NFS Performance

November 23, 2012 11g, Architect, oracle, software

Oracle 11g introduces Direct NFS (Oracle Direct NFS, or DNFS), an NFS client built into the Oracle kernel that improves performance when database files live on NFS; DNFS also works in high-availability scenarios such as RAC.

DIRECT NFS CLIENT OVERVIEW
Standard NFS client software, provided by the operating system, is not optimized for Oracle Database file I/O access patterns. With Oracle Database 11g, you can configure Oracle Database to access NFS V3 NAS devices directly using Oracle Direct NFS Client, rather than using the operating system kernel NFS client. Oracle Database will access files stored on the NFS server directly through the integrated Direct NFS Client eliminating the overhead imposed by the operating system kernel NFS. These files are also accessible via the operating system kernel NFS client thereby allowing seamless administration.

Benefits of Direct NFS Client
Direct NFS Client overcomes many of the challenges associated with using NFS with the Oracle Database, and it outperforms traditional NFS clients while being simple to configure. (Quoted from the Oracle white paper "Oracle Database 11g – Direct NFS Client".)

My company recently decided to use DNFS as the storage for its test databases, which was a good opportunity to benchmark it.

The environment:

[root@DCA-SRV-0133 u01]# dmidecode -s system-product-name
PowerEdge R905
[root@DCA-SRV-0133 u01]# cat /proc/meminfo |grep  MemTotal
MemTotal:       132352732 kB
[root@DCA-SRV-0133 u01]# cat /proc/cpuinfo |grep processor |wc -l
24
[root@DCA-SRV-0133 u01]# cat /etc/issue
Oracle  Server release 5.8
Kernel \r on an \m

Storage is a standard NetApp NAS setup.

The test compares DNFS and NFS performance on database version 11.2.0.3.

Setting up DNFS

1. Mount the NFS volume

[root@DCA-SRV-0133 /]# showmount  -e nas
Export list for nas:
/vol/vol0        192.168.254.0/24
/vol/vol_user_db 192.168.254.0/24,10.0.254.0/24
[root@DCA-SRV-0133 /]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1             localhost.localdomain localhost
::1          localhost6.localdomain6 localhost6
10.0.254.1   nas
10.63.0.221   DCA-SRV-0133
[root@DCA-SRV-0133 /]# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1            101572540   3671028  92658704   4% /
/dev/sda3            101572572    192352  96137408   1% /home
tmpfs                 66176364         0  66176364   0% /dev/shm
/dev/sda4            878850916  14454200 819753688   2% /data
10.0.254.1:/vol/vol_user_db
                     1518128384   1821888 1516306496   1% /u01
[root@DCA-SRV-0133 /]# umount /u01/
[root@DCA-SRV-0133 /]# mount -t nfs 10.0.254.1:/vol/vol_user_db  /u01
[root@DCA-SRV-0133 /]# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1            101572540   3671056  92658676   4% /
/dev/sda3            101572572    192352  96137408   1% /home
tmpfs                 66176364         0  66176364   0% /dev/shm
/dev/sda4            878850916  14454200 819753688   2% /data
10.0.254.1:/vol/vol_user_db
                     1518128384   1821888 1516306496   1% /u01

2. Configure the database for DNFS

[oracle@DCA-SRV-0133 lib]$ cd $ORACLE_HOME/lib
[oracle@DCA-SRV-0133 lib]$ mv libodm11.so libodm11.so.old
[oracle@DCA-SRV-0133 lib]$ ln -s libnfsodm11.so libodm11.so
[oracle@DCA-SRV-0133 lib]$ ls -l libodm11.so
lrwxrwxrwx 1 oracle oinstall 14 Nov 22 16:03 libodm11.so -> libnfsodm11.so
[oracle@DCA-SRV-0133 lib]$ ls -l  libodm11.so.old
-rwxr-xr-x 1 oracle oinstall 60431 Nov 22 15:08 libodm11.so.old


[oracle@DCA-SRV-0133 lib]$ cat $ORACLE_HOME/dbs/oranfstab
server: nas
local: 10.0.254.21 -----> Note: NIC bonding is in use with two IPs separating the NFS network from the IDC network, so the local IP must be specified manually.
path: 10.0.254.1
path: 192.168.254.1 -------------------> second path (multipathing)
export: /vol/vol_user_db     mount: /u01
[oracle@DCA-SRV-0133 lib]$

3. Start the database

SQL> startup
ORACLE instance started.

Total System Global Area 4.0486E+10 bytes
Fixed Size                2237088 bytes
Variable Size              3892317536 bytes
Database Buffers       3.6507E+10 bytes
Redo Buffers                 84631552 bytes

.....




Starting ORACLE instance (normal)
****************** Large Pages Information *****************

Total Shared Global Region in Large Pages = 0 KB (0%)

Large Pages used by this instance: 0 (0 KB)
Large Pages unused system wide = 0 (0 KB) (alloc incr 128 MB)
Large Pages configured system wide = 0 (0 KB)
Large Page size = 2048 KB
..


Oracle instance running with ODM: Oracle Direct NFS ODM Library Version 3.0
Fri Nov 23 13:34:34 2012
PMON started with pid=2, OS id=18612
Fri Nov 23 13:34:34 2012
PSP0 started with pid=3, OS id=18616
Fri Nov 23 13:34:35 2012
VKTM started with pid=4, OS id=18620 at elevated priority
VKTM running at (1)millisec precision with DBRM quantum (100)ms

..
ORACLE_BASE from environment = /data/oracle
Fri Nov 23 13:34:36 2012
ALTER DATABASE   MOUNT
Direct NFS: channel id [0] path [10.0.254.1] to filer [nas] via local [10.0.254.21] is UP
Direct NFS: channel id [1] path [10.0.254.1] to filer [nas] via local [10.0.254.21] is UP
Successful mount of redo thread 1, with mount id 2930804780
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT
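Besides the "Oracle instance running with ODM" banner in the alert log, you can verify from inside the database that I/O really goes through Direct NFS. A sketch using the 11g DNFS views:

-- NFS servers the DNFS client is talking to
select id, svrname, dirname from v$dnfs_servers;

-- network channels opened to each server
select svrname, path from v$dnfs_channels;

-- database files currently served through DNFS
select filename, filesize from v$dnfs_files;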

4. Load 10 GB of data with Swingbench

4.1 Set the database SGA to 5G.


DNFS performance, 100 users:

AVG TPS 2222, AVG response time 40 ms

NFS performance, 100 users:

AVG TPS 1401, AVG response time 80 ms

4.2 Set the database SGA to 800M.

DNFS performance, 1000 users:

AVG TPS 578, AVG response time 1.5 s

NFS performance, 1000 users:

AVG TPS 400, AVG response time 2.0 s

Other DNFS best-practice performance references:

Incremental Statistics for Partitioned Tables in 11g

November 8, 2012 11g, oracle

This is an analysis written by our former leader wwf; it covers how incremental statistics behave for partitioned tables and points out several bugs. If you cannot view SlideShare, download Inc_Stat_For_PT.doc directly.

As a supplement to that document, here is how those bugs behave in 11gR2.

Issue 1: incremental stats are only sensitive to newly loaded data partitions

SQL> create table test_inc(a number, b date, c varchar2(30), d varchar2(100), e varchar2(100), partition_key number)
  2   partition by range(partition_key)
 (
    partition p00 values less than (1),
    partition p01 values less than (2),
    partition p02 values less than (3),
    partition p03 values less than (4),
    partition p04 values less than (5),
    partition p05 values less than (6),
    partition p06 values less than (7),
    partition p07 values less than (8),
    partition p08 values less than (9),
    partition p09 values less than (10),
    partition p10 values less than (11),
    partition p11 values less than (12),
    partition p12 values less than (13),
    partition p13 values less than (14),
    partition p14 values less than (15),
  3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19      partition p15 values less than (16),
    partition p16 values less than (17),
    partition p17 values less than (18),
    partition p18 values less than (19),
    partition p19 values less than (20),
    partition p20 values less than (21),
    partition p21 values less than (22),
    partition p22 values less than (23),
    partition p23 values less than (24),
    partition p24 values less than (25),
    partition p25 values less than (26),
    partition p26 values less than (27),
    partition p27 values less than (28),
    partition p28 values less than (29),
    partition p29 values less than (30),
    partition p30 values less than (31),
 20   21   22   23   24   25   26   27   28   29   30   31   32   33   34   35      partition pmax values less than(maxvalue)
 ) ;
 36  
Table created.

SQL> 
SQL> create table test_inc_bak(a number, b date, c varchar2(30), d varchar2(100), e varchar2(100), partition_key number)
  2     partition by range(partition_key)
   (
      partition p00 values less than (1),
      partition p01 values less than (2),
      partition p02 values less than (3),
      partition p03 values less than (4),
      partition p04 values less than (5),
      partition p05 values less than (6),
      partition p06 values less than (7),
      partition p07 values less than (8),
      partition p08 values less than (9),
      partition p09 values less than (10),
      partition p10 values less than (11),
      partition p11 values less than (12),
      partition p12 values less than (13),
      partition p13 values less than (14),
      partition p14 values less than (15),
  3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19        partition p15 values less than (16),
      partition p16 values less than (17),
      partition p17 values less than (18),
      partition p18 values less than (19),
      partition p19 values less than (20),
      partition p20 values less than (21),
      partition p21 values less than (22),
      partition p22 values less than (23),
      partition p23 values less than (24),
      partition p24 values less than (25),
      partition p25 values less than (26),
      partition p26 values less than (27),
      partition p27 values less than (28),
      partition p28 values less than (29),
      partition p29 values less than (30),
      partition p30 values less than (31),
 20   21   22   23   24   25   26   27   28        partition pmax values less than(maxvalue)
   );
 29   30   31   32   33   34   35   36  
Table created.

SQL>  select DBMS_STATS.get_PREFS('incremental', 'liu', 'test_inc') from dual;

DBMS_STATS.GET_PREFS('INCREMENTAL','LIU','TEST_INC')
--------------------------------------------------------------------------------
FALSE

SQL>  exec DBMS_STATS.SET_TABLE_PREFS(user,'test_inc','INCREMENTAL','TRUE');

PL/SQL procedure successfully completed.

SQL>  select DBMS_STATS.get_PREFS('incremental', 'liu', 'test_inc') from dual;

DBMS_STATS.GET_PREFS('INCREMENTAL','LIU','TEST_INC')
--------------------------------------------------------------------------------
TRUE

SQL>  select DBMS_STATS.get_PREFS('incremental', 'liu', 'test_inc_bak') from dual;

DBMS_STATS.GET_PREFS('INCREMENTAL','LIU','TEST_INC_BAK')
--------------------------------------------------------------------------------
FALSE

SQL> alter session set nls_date_format='YYYY-MM-DD hh24:mi:ss';

Session altered.

SQL> insert into test_inc select rownum, sysdate+mod(rownum, 30), 'wangweifengwangweifeng'||rownum, 
rpad('x', 50, 'x')||rownum, rpad('y', 50, 'z')||rownum, 0 from dual connect by rownum <= 300000;   2  

300000 rows created.

SQL> commit;

Commit complete.

SQL> insert into test_inc select rownum, sysdate+mod(rownum, 30), 'wangweifengwangweifeng'||rownum, 
rpad('x', 50, 'x')||rownum, rpad('y', 50, 'z')||rownum, 1 from dual connect by rownum <= 300000;  2  

300000 rows created.

SQL> insert into test_inc select rownum, sysdate+mod(rownum, 30), 'wangweifengwangweifeng'||rownum, 
rpad('x', 50, 'x')||rownum, rpad('y', 50, 'z')||rownum, 2 from dual connect by rownum <= 300000;    2  

300000 rows created.

SQL> commit;

Commit complete.

SQL> insert into test_inc select rownum, sysdate+mod(rownum, 30), 'wangweifengwangweifeng'||rownum, 
rpad('x', 50, 'x')||rownum, rpad('y', 50, 'z')||rownum, 3 from dual connect by rownum <= 300000;  2  

300000 rows created.

SQL> insert into test_inc select rownum, sysdate+mod(rownum, 30), 'wangweifengwangweifeng'||rownum, 
rpad('x', 50, 'x')||rownum, rpad('y', 50, 'z')||rownum, 4 from dual connect by rownum <= 300000;  2  

300000 rows created.

SQL> insert into test_inc select rownum, sysdate+mod(rownum, 30), 'wangweifengwangweifeng'||rownum, 
rpad('x', 50, 'x')||rownum, rpad('y', 50, 'z')||rownum, 5 from dual connect by rownum <= 300000;    2  

300000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(user, 'test_inc');

PL/SQL procedure successfully completed.

SQL>  select partition_name, blocks, num_rows, last_analyzed from user_tab_partitions where table_name = 'TEST_INC' and num_rows > 0;

PARTITION_NAME			   BLOCKS   NUM_ROWS LAST_ANALYZED
------------------------------ ---------- ---------- -------------------
P00				     6928     300000 2012-11-07 23:17:42
P01				     6928     300000 2012-11-07 23:17:42
P02				     6802     300000 2012-11-07 23:17:43
P03				     7054     300000 2012-11-07 23:21:51
P04				     7054     300000 2012-11-07 23:21:50
P05				     6928     300000 2012-11-07 23:21:50

6 rows selected.

SQL> delete from test_inc partition(p05) where rownum <= 100000;

100000 rows deleted.

SQL> alter table test_inc truncate partition p04 ;

Table truncated.

SQL> insert into test_inc select rownum, sysdate+mod(rownum, 30), 'wangweifengwangweifeng'||rownum, 
  rpad('x', 50, 'x')||rownum, rpad('y', 50, 'z')||rownum, 1 from dual connect by rownum <= 300000;  2  

300000 rows created.

SQL> exec dbms_stats.gather_table_stats(user, 'test_inc');

PL/SQL procedure successfully completed.

SQL> select partition_name, num_rows, blocks, last_analyzed from user_tab_partitions where table_name = 'TEST_INC' and num_rows > 0;

PARTITION_NAME			 NUM_ROWS     BLOCKS LAST_ANALYZED
------------------------------ ---------- ---------- -------------------
P00				   300000	6928 2012-11-07 23:17:42
P01				   600000      14056 2012-11-07 23:22:59
P02				   300000	6802 2012-11-07 23:17:43
P03				   300000	7054 2012-11-07 23:21:51
P05				   200000	6928 2012-11-07 23:22:58

Incremental stats correctly identified what changed (only the modified partitions were re-analyzed). Now insert the same data into test_inc_bak:

SQL> insert into test_inc_bak select rownum, sysdate+mod(rownum, 30), 'wangweifengwangweifeng'||rownum, 
  2  rpad('x', 50, 'x')||rownum, rpad('y', 50, 'z')||rownum, 0 from dual connect by rownum <= 300000;  


insert into test_inc_bak select rownum, sysdate+mod(rownum, 30), 'wangweifengwangweifeng'||rownum, 
rpad('x', 50, 'x')||rownum, rpad('y', 50, 'z')||rownum, 1 from dual connect by rownum <= 300000;  


insert into test_inc_bak select rownum, sysdate+mod(rownum, 30), 'wangweifengwangweifeng'||rownum, 
rpad('x', 50, 'x')||rownum, rpad('y', 50, 'z')||rownum, 2 from dual connect by rownum <= 300000;   



insert into test_inc_bak select rownum, sysdate+mod(rownum, 30), 'wangweifengwangweifeng'||rownum, 
rpad('x', 50, 'x')||rownum, rpad('y', 50, 'z')||rownum, 3 from dual connect by rownum <= 300000;   



insert into test_inc_bak select rownum, sysdate+mod(rownum, 30), 'wangweifengwangweifeng'||rownum, 
rpad('x', 50, 'x')||rownum, rpad('y', 50, 'z')||rownum, 4 from dual connect by rownum <= 300000;   


insert into test_inc_bak select rownum, sysdate+mod(rownum, 30), 'wangweifengwangweifeng'||rownum, 
rpad('x', 50, 'x')||rownum, rpad('y', 50, 'z')||rownum, 5 from dual connect by rownum <= 300000;     

300000 rows created.

SQL> SQL> SQL>   2  

300000 rows created.

SQL> SQL> SQL>   2  

300000 rows created.

SQL> SQL> SQL> SQL>   2  

300000 rows created.

SQL> SQL> SQL> SQL>   2  

300000 rows created.

SQL> SQL> SQL>   2  

300000 rows created.

SQL> SQL> SQL> SQL> SQL> SQL> 
SQL> 
SQL> 
SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(user, 'test_inc_bak');

PL/SQL procedure successfully completed.

SQL> select partition_name, num_rows, blocks, last_analyzed from user_tab_partitions where table_name = 'TEST_INC_BAK'and num_rows > 0;

PARTITION_NAME			 NUM_ROWS     BLOCKS LAST_ANALYZED
------------------------------ ---------- ---------- -------------------
P00				   300000	7054 2012-11-07 23:44:01
P01				   300000	7054 2012-11-07 23:44:01
P02				   300000	6928 2012-11-07 23:44:01
P03				   300000	7054 2012-11-07 23:44:02
P04				   300000	7054 2012-11-07 23:44:02
P05				   300000	7054 2012-11-07 23:44:02

6 rows selected.

SQL>  delete from test_inc_bak partition(p05) where rownum <= 100000;

100000 rows deleted.

SQL> alter table test_inc_bak truncate partition p04 ;

Table truncated.

SQL> 
SQL> insert into test_inc_bak select rownum, sysdate+mod(rownum, 30), 'wangweifengwangweifeng'||rownum, 
  rpad('x', 50, 'x')||rownum, rpad('y', 50, 'z')||rownum, 1 from dual connect by rownum <= 300000;  2  

300000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(user, 'test_inc_bak');

PL/SQL procedure successfully completed.

SQL> select partition_name, num_rows, blocks, last_analyzed from user_tab_partitions where table_name = 'TEST_INC_BAK'and num_rows > 0;

PARTITION_NAME			 NUM_ROWS     BLOCKS LAST_ANALYZED
------------------------------ ---------- ---------- -------------------
P00				   300000	7054 2012-11-07 23:45:20
P01				   600000      14194 2012-11-07 23:45:21
P02				   300000	6928 2012-11-07 23:45:21
P03				   300000	7054 2012-11-07 23:45:22
P05				   200000	7054 2012-11-07 23:45:22

You can see that every partition was re-analyzed (all LAST_ANALYZED timestamps moved).

Issue 2: what if I truncate the whole table? In wwf's original situation, even after truncating the whole table the statistics were still not refreshed.

SQL> truncate table test_inc; 

Table truncated.

SQL> exec dbms_stats.gather_table_stats(user, 'test_inc');

PL/SQL procedure successfully completed.

SQL> select partition_name, num_rows, blocks, last_analyzed from user_tab_partitions where table_name = 'TEST_INC' and num_rows > 0;

no rows selected

OK, in 11gR2 this issue has been fixed.

Issue 3: for composite-partitioned tables the results were totally wrong; querying user_tab_subpartitions and user_tab_partitions returned different results.

SQL> CREATE TABLE composite_rng_list (
  2     cust_id     NUMBER(10),
   cust_name   VARCHAR2(25),
   cust_state  VARCHAR2(2),
   time_id     DATE)
   PARTITION BY RANGE(time_id)
   SUBPARTITION BY LIST (cust_state)
   SUBPARTITION TEMPLATE(
   SUBPARTITION west VALUES ('OR', 'WA'),
   SUBPARTITION east VALUES ('NY', 'CT'),
   SUBPARTITION cent VALUES ('OK', 'TX')) (
   PARTITION per1 VALUES LESS THAN (TO_DATE('01/01/2008','DD/MM/YYYY')),
   PARTITION per2 VALUES LESS THAN (TO_DATE('01/01/2010','DD/MM/YYYY')),
   PARTITION per3 VALUES LESS THAN (TO_DATE('01/01/2012','DD/MM/YYYY')),
   PARTITION future VALUES LESS THAN(MAXVALUE));  3    4    5    6    7    8    9   10   11   12   13   14   15  

Table created.



SQL>  exec dbms_stats.set_table_prefs('liu', 'composite_rng_list', 'incremental', 'true')  ;

PL/SQL procedure successfully completed.

SQL> select dbms_stats.get_prefs('incremental', 'liu', 'composite_rng_list') from dual;

DBMS_STATS.GET_PREFS('INCREMENTAL','LIU','COMPOSITE_RNG_LIST')
--------------------------------------------------------------------------------
TRUE

SQL>  insert into composite_rng_list select rownum, 'customer'||rownum, 'OR', to_date('2007-01-01', 'yyyy-mm-dd') from dual connect by rownum <= 100000;

100000 rows created.

SQL> commit;

Commit complete.

SQL> select table_name, num_rows, blocks, last_analyzed from user_tables where table_name = 'COMPOSITE_RNG_LIST';

TABLE_NAME			 NUM_ROWS     BLOCKS LAST_ANALYZED
------------------------------ ---------- ---------- -------------------
COMPOSITE_RNG_LIST

SQL> exec dbms_stats.gather_table_stats(user, 'COMPOSITE_RNG_LIST', granularity=>'ALL')

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks, last_analyzed from user_tables where table_name = 'COMPOSITE_RNG_LIST';

TABLE_NAME			 NUM_ROWS     BLOCKS LAST_ANALYZED
------------------------------ ---------- ---------- -------------------
COMPOSITE_RNG_LIST		   100000	 494 2012-11-08 00:16:50

SQL> insert into composite_rng_list select rownum, 'customer'||rownum, 'WA', to_date('2007-01-01', 'yyyy-mm-dd') from dual connect by rownum <= 100000;

100000 rows created.

SQL> exec dbms_stats.gather_table_stats(user, 'COMPOSITE_RNG_LIST', granularity=>'ALL')

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks, last_analyzed from user_tables where table_name = 'COMPOSITE_RNG_LIST';

TABLE_NAME			 NUM_ROWS     BLOCKS LAST_ANALYZED
------------------------------ ---------- ---------- -------------------
COMPOSITE_RNG_LIST		   200000	1006 2012-11-08 00:17:24


SQL> select partition_name, SUBPARTITION_NAME, num_rows, blocks, last_analyzed from user_tab_subpartitions where table_name = 'COMPOSITE_RNG_LIST' and num_rows > 0;

PARTITION_NAME		       SUBPARTITION_NAME		NUM_ROWS     BLOCKS LAST_ANALYZED
------------------------------ ------------------------------ ---------- ---------- -------------------
PER1			       PER1_WEST			  200000       1006 2012-11-08 00:17:23

SQL>  select partition_name, num_rows, blocks, last_analyzed from user_tab_partitions where table_name = 'COMPOSITE_RNG_LIST' and num_rows > 0;

PARTITION_NAME			 NUM_ROWS     BLOCKS LAST_ANALYZED
------------------------------ ---------- ---------- -------------------
PER1				   200000	1006 2012-11-08 00:17:24

As you can see, in 11gR2 all of these issues have been resolved.
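As a final sanity check, the partition synopses that make incremental statistics possible can be counted directly. A sketch only: WRI$_OPTSTAT_SYNOPSIS_HEAD$ is an undocumented SYS table, so treat this as a read-only diagnostic that may change between releases:

-- roughly one synopsis head row per (partition, column);
-- rows present confirm synopses were actually stored
select o.name as table_name, count(*) as synopsis_entries
from sys.wri$_optstat_synopsis_head$ h, sys.obj$ o
where o.obj# = h.bo#
and o.name in ('TEST_INC', 'COMPOSITE_RNG_LIST')
group by o.name;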

11gR2 RAC install: ASM terminating the instance due to error 481

October 16, 2012 11g, oracle, RAC

A newly commissioned RAC system (RHEL 5.8, RAC 11.2.0.3) threw error 481 while running the GI root.sh during installation:

CRS-2672: Attempting to start 'ora.asm' on 'ptdb1'
CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 0 Serial number: 0
. For details refer to "(:CLSN00107:)" in "/g01/app/oracle/product/11.2.0/grid/log/agent/ohasd/oraagent_grid/oraagent_grid.log".
CRS-2674: Start of 'ora.asm' on 'racnode1' failed
..
Failed to start ASM at /software/app/11.2.0.3/crs/install/crsconfig_lib.pm line 1272

The following MOS note helps resolve this problem:


ASM on Non First Node (Second or Other Node) Fails to Come up With: PMON (ospid: nnnn): terminating the instance due to error 481

In this Document
Purpose
Details
Case1: link local IP (169.254.x.x) is being used by other adapter/network
Case2: firewall exists between nodes on private network (iptables etc)
Case3: HAIP is up on some nodes but not on all
Case4: HAIP is up on all nodes but some do not have route info
References
Applies to:

Oracle Server – Enterprise Edition – Version 11.2.0.1 and later
Information in this document applies to any platform.
Purpose

This note lists common causes of ASM start up failure with the following error on non-first node (second or others):

alert_<+ASMn>.log from the non-first node
lmon registered with NM – instance number 2 (internal mem no 1)
Tue Dec 06 06:16:15 2011
System state dump requested by (instance=2, osid=19095 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /g01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_diag_19138.trc
Tue Dec 06 06:16:15 2011
PMON (ospid: 19095): terminating the instance due to error 481
Dumping diagnostic data in directory=[cdmp_20111206061615], requested by (instance=2, osid=19095 (PMON)), summary=[abnormal instance termination].
Tue Dec 06 06:16:15 2011
ORA-1092 : opitsk aborting process

Note: the ASM instance terminates shortly after "lmon registered with NM".
If ASM on the non-first node was running previously, the following will likely be in its alert.log from when it originally failed:
..
IPC Send timeout detected. Sender: ospid 32231 [oracle@ftdcslsedw01b (PING)]
..
ORA-29740: evicted by instance number 1, group incarnation 10
..

diag trace from the non-first ASM instance (+ASMn_diag_<pid>.trc)
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE])

alert_<+ASMn>.log from the first node
LMON (ospid: 15986) detects hung instances during IMR reconfiguration
LMON (ospid: 15986) tries to kill the instance 2 in 37 seconds.
Please check instance 2's alert log and LMON trace file for more details.
..
Remote instance kill is issued with system inc 64
Remote instance kill map (size 1) : 2
LMON received an instance eviction notification from instance 1
The instance eviction reason is 0x20000000
The instance eviction map is 2
Reconfiguration started (old inc 64, new inc 66)

If the issue happens while running root script (root.sh or rootupgrade.sh) as part of Grid Infrastructure installation/upgrade process, the following symptoms will present:

root script screen output
Start of resource "ora.asm" failed

CRS-2672: Attempting to start 'ora.asm' on 'racnode1'
CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 0 Serial number: 0
. For details refer to "(:CLSN00107:)" in "/ocw/grid/log/racnode1/agent/ohasd/oraagent_grid/oraagent_grid.log".
CRS-2674: Start of 'ora.asm' on 'racnode1' failed
..
Failed to start ASM at /ispiris-qa/app/11.2.0.3/crs/install/crsconfig_lib.pm line 1272
$GRID_HOME/cfgtoollogs/crsconfig/rootcrs_<node>.log
2011-11-29 15:56:48: Executing cmd: /ispiris-qa/app/11.2.0.3/bin/crsctl start resource ora.asm -init
..
> CRS-2672: Attempting to start 'ora.asm' on 'racnode1'
> CRS-5017: The resource action "ora.asm start" encountered the following error:
> ORA-03113: end-of-file on communication channel
> Process ID: 0
> Session ID: 0 Serial number: 0
> . For details refer to "(:CLSN00107:)" in "/ispiris-qa/app/11.2.0.3/log/racnode1/agent/ohasd/oraagent_grid/oraagent_grid.log".
> CRS-2674: Start of 'ora.asm' on 'racnode1' failed
> CRS-2679: Attempting to clean 'ora.asm' on 'racnode1'
> CRS-2681: Clean of 'ora.asm' on 'racnode1' succeeded
..
> CRS-4000: Command Start failed, or completed with errors.
>End Command output
2011-11-29 15:59:00: Executing cmd: /ispiris-qa/app/11.2.0.3/bin/crsctl check resource ora.asm -init
2011-11-29 15:59:00: Executing cmd: /ispiris-qa/app/11.2.0.3/bin/crsctl status resource ora.asm -init
2011-11-29 15:59:01: Checking the status of ora.asm
..
2011-11-29 15:59:53: Start of resource "ora.asm" failed
Details

Case1: link local IP (169.254.x.x) is being used by other adapter/network

Symptoms:

$GRID_HOME/log/<node>/alert<node>.log
[/ocw/grid/bin/orarootagent.bin(4813)]CRS-5018:(:CLSN00037:) Removed unused HAIP route: 169.254.95.0 / 255.255.255.0 / 0.0.0.0 / usb0
OS messages (optional)
Dec 6 06:11:14 racnode1 dhclient: DHCPREQUEST on usb0 to 255.255.255.255 port 67
Dec 6 06:11:14 racnode1 dhclient: DHCPACK from 169.254.95.118
ifconfig -a
..
usb0 Link encap:Ethernet HWaddr E6:1F:13:AD:EE:D3
inet addr:169.254.95.120 Bcast:169.254.95.255 Mask:255.255.255.0
..

Note: it's usb0 in this case, but it can be any other adapter that uses a link-local address

Solution:

Link local IP must not be used by any other network on cluster nodes. In this case, a USB network device got IP 169.254.95.118 from a DHCP server, which disrupted HAIP routing; the solution is to blacklist the device in udev so it is not activated automatically.
Case2: firewall exists between nodes on private network (iptables etc)

No firewall is allowed on private network (cluster_interconnect) between nodes including software firewall like iptables, ipmon etc
Case3: HAIP is up on some nodes but not on all

Symptoms:

alert_<+ASMn>.log for some instances
Cluster communication is configured to use the following interface(s) for this instance
10.1.0.1
alert_<+ASMn>.log for other instances
Cluster communication is configured to use the following interface(s) for this instance
169.254.201.65

Note: some instances are using HAIP while others are not, so they cannot talk to each other
Solution:

The solution is to bring up HAIP on all nodes.

To find out HAIP status, execute the following on all nodes:

$GRID_HOME/bin/crsctl stat res ora.cluster_interconnect.haip -init

If it's offline, try to bring it up as root:

$GRID_HOME/bin/crsctl start res ora.cluster_interconnect.haip -init

If HAIP fails to start, refer to note 1210883.1 for known issues.

If the "up node" is not using HAIP and no outage is allowed, the workaround is to set the init.ora/spfile parameter cluster_interconnects to the private IP of each node so that ASM/DB can come up on the "down node". Once a maintenance window is planned, the parameter must be removed to allow HAIP to work.
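A sketch of that workaround from SQL*Plus (the addresses are examples; use each node's real private IP, and remove the parameter again at the next maintenance window):

-- pin the interconnect to the physical private IP instead of HAIP
alter system set cluster_interconnects = '10.1.0.1' scope=spfile sid='+ASM1';
alter system set cluster_interconnects = '10.1.0.2' scope=spfile sid='+ASM2';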

Case4: HAIP is up on all nodes but some do not have route info

Symptoms:

alert_<+ASMn>.log for all instances
Cluster communication is configured to use the following interface(s) for this instance
169.254.xxx.xxx
"netstat -rn" for some nodes (surviving nodes) is missing the HAIP route
netstat -rn
Destination Gateway Genmask Flags MSS Window irtt Iface
161.130.90.0 0.0.0.0 255.255.248.0 U 0 0 0 bond0
160.131.11.0 0.0.0.0 255.255.255.0 U 0 0 0 bond2
0.0.0.0 160.11.80.1 0.0.0.0 UG 0 0 0 bond0

The line for HAIP is missing, i.e.:

169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 bond2

Note: As the HAIP route info is missing on some nodes, HAIP is not pingable; usually a newly restarted node will have the HAIP route info
Solution:

The solution is to manually add the HAIP route on the nodes where it is missing:

4.1. Execute "netstat -rn" on any node that has the HAIP route info and locate the following:

169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 bond2

Note: the first field is HAIP subnet ID and will start with 169.254.xxx.xxx, the third field is HAIP subnet netmask and the last field is private network adapter name

4.2. Execute the following as root on the node that is missing the HAIP route:

# route add -net <subnet_id> netmask <netmask> dev <private_adapter>

i.e.

# route add -net 169.254.0.0 netmask 255.255.0.0 dev bond2

4.3. Start ora.crsd as root on the node that is partially up:

# $GRID_HOME/bin/crsctl start res ora.crsd -init

The other workaround is to restart GI on the node that is missing the HAIP route with "crsctl stop crs -f" and "crsctl start crs" as root.

In our case the error was indeed caused by usb0: the KVM console used DHCP to assign addresses dynamically, which gave both app servers IPs in the 169.254 subnet. After disabling DHCP and assigning IPs manually, the following errors were still thrown:

[cssd(22096)]CRS-1605:CSSD voting file is online: /dev/mapper/mpath3; details in /g01/app/oracle/product/11.2.0/grid/log/ptdb2/cssd/ocssd.log.
[cssd(22096)]CRS-1636:The CSS daemon was started in exclusive mode but found an active CSS daemon on node ptdb1 and is terminating; details at (:CSSNM00006:) in /g01/app/oracle/product/11.2.0/grid/log/ptdb2/cssd/ocssd.log
2012-10-15 10:07:33.421
[ohasd(20942)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'ptdb2'.

2012-10-15 10:11:27.669
[cssd(22425)]CRS-1662:Member kill requested by node ptdb1 for member number 1, group DB+ASM
2012-10-15 10:11:29.931
[/g01/app/oracle/product/11.2.0/grid/bin/oraagent.bin(22334)]CRS-5019:All OCR locations are on ASM disk groups [OCRVOT], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/g01/app/oracle/product/11.2.0/grid/log/ptdb2/agent/ohasd/oraagent_grid/oraagent_grid.log".
2012-10-15 10:11:31.079
[/g01/app/oracle/product/11.2.0/grid/bin/oraagent.bin(22334)]CRS-5019:All OCR locations are on ASM disk groups [OCRVOT], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/g01/app/oracle/product/11.2.0/grid/log/ptdb2/agent/ohasd/oraagent_grid/oraagent_grid.log".
2012-10-15 10:11:31.119
[/g01/app/oracle/product/11.2.0/grid/bin/oraagent.bin(22334)]CRS-5019:All OCR locations are on ASM disk groups [OCRVOT], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/g01/app/oracle/product/11.2.0/grid/log/ptdb2/agent/ohasd/oraagent_grid/oraagent_grid.log".
2012-10-15 10:11:31.197
[/g01/app/oracle/product/11.2.0/grid/bin/oraagent.bin(22334)]CRS-5019:All OCR locations are on ASM disk groups [OCRVOT], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/g01/app/oracle/product/11.2.0/grid/log/ptdb2/agent/ohasd/oraagent_grid/oraagent_grid.log".
2012-10-15 10:11:32.837
[/g01/app/oracle/product/11.2.0/grid/bin/orarootagent.bin(22496)]CRS-5016:Process "/g01/app/oracle/product/11.2.0/grid/bin/acfsload" spawned by agent "/g01/app/oracle/product/11.2.0/grid/bin/orarootagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/g01/app/oracle/product/11.2.0/grid/log/ptdb2/agent/ohasd/orarootagent_root/orarootagent_root.log"
2012-10-15 10:11:33.068
[/g01/app/oracle/product/11.2.0/grid/bin/oraagent.bin(22334)]CRS-5019:All OCR locations are on ASM disk groups [OCRVOT], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/g01/app/oracle/product/11.2.0/grid/log/ptdb2/agent/ohasd/oraagent_grid/oraagent_grid.log".

Checking orarootagent_root.log:

2012-10-15 10:08:40.295: [ USRTHRD][1115584832] {0:0:209} HAIP:  Updating member info HAIP1;172.168.0.0#0
2012-10-15 10:08:40.298: [ USRTHRD][1115584832] {0:0:209} InitializeHaIps[ 0]  infList 'inf eth1, ip 172.168.0.102, sub 172.168.0.0'
2012-10-15 10:08:40.299: [ USRTHRD][1115584832] {0:0:209} Error in getting Key SYSTEM.network.haip.group.cluster_interconnect.interface.valid in OCR
2012-10-15 10:08:40.303: [ CLSINET][1115584832] failed to open OLR HAIP subtype SYSTEM.network.haip.group.cluster_interconnect.interface.valid key, rc=4

This pinpointed the problem: although the HAIP resource was ONLINE on every node, manually pinging the peer's HAIP address failed, so node 2 could not obtain the resources it needed over HAIP.

Two causes can lead to this:

1. A faulty NIC driver, leaving the HAIP virtual address on the private NIC unable to communicate.

2. A switch configuration problem, such as the VLAN layout, or a switch bug.

So we can add one more case to the MOS note above:

CASE 5

HAIP is ONLINE on all nodes and "netstat -rn" shows the route entries everywhere, but you must still make sure the nodes' HAIP addresses can ping each other. In this case a manual "restart res haip" worked around the problem temporarily, yet it came back after a reboot; we finally fixed it by connecting the two nodes' private NICs with a crossover cable. With more than two nodes you would have to tackle the problem at the switch.
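A SQL-level check that complements ping: every ASM and database instance reports the interconnect address it is actually using. A sketch; with HAIP working, each node should list a mutually reachable 169.254.x.x address:

-- run on every instance and compare the output across nodes
select name, ip_address, is_public, source
from v$cluster_interconnects;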

For 11.2 CRS troubleshooting, the main logs are the following:
<GRID_HOME>/log/<host>/alert<node>.log: the CRS alert log; CRS start/stop, core process (crsd, ocssd etc.) start/stop, and abnormal resource checks are all recorded here.

<GRID_HOME>/log/<host>/crsd: the CRSD log, recording the start, stop and failures of resources monitored by CRSD, such as database instances, ASM, listeners, disk groups and VIPs.
<GRID_HOME>/log/<host>/cssd: the OCSSD log, recording inter-node communication; network heartbeat or voting disk problems, and abnormal OCSSD termination, show up here.
<GRID_HOME>/log/<host>/evmd: the event monitor daemon log.
<GRID_HOME>/log/<host>/ohasd: the OHASD log; when CRS will not start, this is often the log to check.
<GRID_HOME>/log/<host>/mdnsd: DNS-related logging.
<GRID_HOME>/log/<host>/gipcd: inter-process and inter-node communication logging.
<GRID_HOME>/log/<host>/gpnpd: the GPnP (Grid Plug & Play profile) log.
<GRID_HOME>/log/<host>/gnsd (optional): the Grid Naming Service log.
<GRID_HOME>/log/<host>/ctssd: the time synchronization service log.

<GRID_HOME>/log/<host>/agent/ohasd:
<GRID_HOME>/log/<host>/agent/ohasd/oraagent_grid: starts/stops/checks/cleans ora.asm, ora.evmd, ora.gipcd, ora.gpnpd, ora.mdnsd and similar resources.
<GRID_HOME>/log/<host>/agent/ohasd/orarootagent_root: starts/stops/checks/cleans ora.crsd, ora.ctssd, ora.diskmon, ora.drivers.acfs, ora.crf (11.2.0.2) and similar resources.
<GRID_HOME>/log/<host>/agent/ohasd/oracssdagent_root: starts/stops/checks the ocssd process.
<GRID_HOME>/log/<host>/agent/ohasd/oracssdmonitor_root: monitors the cssdagent process.

<GRID_HOME>/log/<host>/agent/crsd:
<GRID_HOME>/log/<host>/agent/crsd/oraagent_grid: starts/stops/checks/cleans ASM, ora.eons, ora.LISTENER.lsnr, SCAN listeners, ora.ons, disk groups and similar resources.
<GRID_HOME>/log/<host>/agent/crsd/oraagent_oracle: starts/stops/checks/cleans services and databases.
<GRID_HOME>/log/<host>/agent/crsd/orarootagent_root: starts/stops/checks/cleans GNS, VIP, SCAN VIP and network resources.
<GRID_HOME>/log/<host>/agent/crsd/scriptagent_grid: logs for custom application services.

11g Smart Flash Cache

August 30, 2012 11g, oracle, RAC

A write-up on 11g Smart Flash Cache, by vmcd.

Here is a demonstration using a 160 GB Fusion-io SLC card.

[oracle@db-41 ~]$ ls -l /dev/fioa 
brw-r----- 1 oracle oinstall 252, 0 Jul 17 14:26 /dev/fioa


[root@db-41 dev]# fio-status

Found 1 ioDrive in this system
Fusion-io driver version: 2.3.1 build 123

Adapter: ioDrive
	Fusion-io ioDrive 160GB, Product Number:VRG5T SN:479912
	External Power: NOT connected
	PCIE Power limit threshold: 24.75W
	Sufficient power available: Unknown
	Connected ioDimm module:
	  fct0:	Fusion-io ioDrive 160GB, Product Number:VRG5T SN:479912

fct0	Attached as 'fioa' (block device)
	Fusion-io ioDrive 160GB, Product Number:VRG5T SN:479912
	Located in slot 0 Upper of ioDrive SN:479912
	PCI:08:00.0, Slot Number:1
	Firmware v5.0.6, rev 101583
	160.00 GBytes block device size, 203 GBytes physical device size
	Sufficient power available: Unknown
	Internal temperature: 43.3 degC, max 44.8 degC
	Media status: Healthy; Reserves: 100.00%, warn at 10.00%
	


SQL*Plus: Release 11.2.0.3.0 Production on Thu Aug 30 11:13:32 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> show parameter sga;

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
lock_sga			     boolean	 FALSE
pre_page_sga			     boolean	 FALSE
sga_max_size			     big integer 68096M
sga_target			     big integer 68096M

SQL> alter system set db_flash_cache_size=150G scope=spfile;

System altered.

SQL> alter system set db_flash_cache_file='/dev/fioa' scope=spfile;

System altered.

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 7.1085E+10 bytes
Fixed Size		    2235608 bytes
Variable Size		 8053064488 bytes
Database Buffers	 6.2814E+10 bytes
Redo Buffers		  215855104 bytes
Database mounted.
ORA-24324: service handle not initialized
ORA-24323: value not allowed
ORA-03113: end-of-file on communication channel


SQL> startup
ORACLE instance started.

Total System Global Area 7.1085E+10 bytes
Fixed Size		    2235608 bytes
Variable Size		 8053064488 bytes
Database Buffers	 6.2814E+10 bytes
Redo Buffers		  215855104 bytes
Database mounted.
Database opened.
SQL> show parameter flash_cache

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
db_flash_cache_file		     string	 /dev/fioa
db_flash_cache_size		     big integer 149G
SQL> alter system set  db_flash_cache_size=150G scope=spfilel
  2  

SQL> alter system set  db_flash_cache_size=150G scope=spfile;

System altered.

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 7.1085E+10 bytes
Fixed Size		    2235608 bytes
Variable Size		 8053064488 bytes
Database Buffers	 6.2814E+10 bytes
Redo Buffers		  215855104 bytes
Database mounted.
ORA-24324: service handle not initialized
ORA-24323: value not allowed
ORA-03113: end-of-file on communication channel


ORA-27042: not enough space on raw partition to fullfill request
Additional information: 3


Modify db_flash_cache_size to 149G (the 160 GB card is roughly 149 GiB, so a 150G cache does not fit, hence the ORA-27042):


SQL*Plus: Release 11.2.0.3.0 Production on Thu Aug 30 11:04:10 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> create spfile from pfile;

File created.

SQL> startup
ORACLE instance started.

Total System Global Area 7.1085E+10 bytes
Fixed Size		    2235608 bytes
Variable Size		 8053064488 bytes
Database Buffers	 6.2814E+10 bytes
Redo Buffers		  215855104 bytes
Database mounted.
Database opened.
SQL> show parameter flash_cache

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
db_flash_cache_file		     string	 /dev/fioa
db_flash_cache_size		     big integer 149G

SQL> select 149/68 from dual;

    149/68
----------
2.19117647
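The 149/68 ratio is roughly 2.2: the flash cache here is about twice the SGA, in line with Oracle's guidance to size db_flash_cache_size at two to ten times the buffer cache. To watch whether the flash cache is actually absorbing I/O, the flash cache statistics can be sampled; a sketch (exact statistic names can vary with patch level):

-- cumulative flash cache activity since instance startup
select name, value
from v$sysstat
where name like 'flash cache%';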

11.2.0.2 EXPDP hits ORA-4031

July 16, 2012 11g, maintain, oracle

On an Oracle 11.2.0.2 data warehouse, an expdp export hung. After it was killed with kill -9, the database never released the latch. The log below shows that PMON had detected the process with ospid=15038 as dead but could not clean it up: while allocating memory for the "fixed allocation callback" it hit ORA-4031. On this database streams_pool_size was not set explicitly, and starting with 11g Data Pump uses Advanced Queuing to control its jobs; that is the root cause of this case.

We can work around the problem by setting streams_pool_size explicitly; note that the instance must be restarted for it to take effect.
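A minimal sketch of that fix (128M is an arbitrary example; size the pool to your Data Pump workload):

-- give the streams pool a fixed floor so Data Pump's AQ
-- allocations no longer depend on dynamic SGA resizing
alter system set streams_pool_size = 128M scope=spfile;
-- then restart the instance for the new value to take effect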

Three notes together give a complete picture of this problem:

DataPump Export (EXPDP) Fails With Errors UDE-31623 ORA-4031 [ID 1318528.1]
UDE-31623 Error With DataPump Export [ID 1080775.1]
Bug 9896536: MEMORY LEAKE OCCURS IN STREAM SPOOL WHEN EXPDP IS EXECUTED.

*** 2012-07-16 13:34:14.231
found process 0x1bd12952b0 pid=403 serial=16 ospid = 15038 dead

*** 2012-07-16 13:34:19.445
kssxdl: error deleting SO: 0x1bd98ba410 = transaction (55)
  ----------------------------------------
  SO: 0x1bd98ba410, type: 55, owner: 0x1b015a1100, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0x1bd12952b0, name=transaction, file=ktccts.h LINE:407, pg=0
ORA-04031: unable to allocate 56 bytes of shared memory ("streams pool","unknown object","streams pool","fixed allocation callback")

*** 2012-07-16 13:34:29.456
found process 0x1bd12952b0 pid=403 serial=16 ospid = 15038 dead

*** 2012-07-16 13:34:34.713
kssxdl: error deleting SO: 0x1bd98ba410 = transaction (55)
  ----------------------------------------
  SO: 0x1bd98ba410, type: 55, owner: 0x1b015a1100, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0x1bd12952b0, name=transaction, file=ktccts.h LINE:407, pg=0
ORA-04031: unable to allocate 56 bytes of shared memory ("streams pool","unknown object","streams pool","fixed allocation callback")

*** 2012-07-16 13:34:44.732
found process 0x1bd12952b0 pid=403 serial=16 ospid = 15038 dead
ORA-04031: unable to allocate 56 bytes of shared memory ("streams pool","unknown object","streams pool","fixed allocation callback")
Mon Jul 16 13:49:03 2012
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /data/oracle/diag/rdbms/std/edw1/trace/edw1_pmon_14903.trc:
ORA-04031: unable to allocate 56 bytes of shared memory ("streams pool","unknown object","streams pool","fixed allocation callback")
Mon Jul 16 13:49:18 2012
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /data/oracle/diag/rdbms/std/edw1/trace/edw1_pmon_14903.trc:
ORA-04031: unable to allocate 56 bytes of shared memory ("streams pool","unknown object","streams pool","fixed allocation callback")
Mon Jul 16 13:49:34 2012
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /data/oracle/diag/rdbms/std/edw1/trace/edw1_pmon_14903.trc:
ORA-04031: unable to allocate 56 bytes of shared memory ("streams pool","unknown object","streams pool","fixed allocation callback")

11g DEFAULT_MAINTENANCE_PLAN summary

June 18, 2012 11g, oracle

While going through the data warehouse's AWR report I found the following, which is also a chance to summarize a class of problems I have run into before:

CPU Time (s) Executions CPU per Exec (s) %Total Elapsed Time (s) %CPU %IO SQL Id SQL Module SQL Text
13,664.95 12 1,138.75 8.99 15,656.35 87.28 11.83 3a1wvmtvsdvqv PL/SQL Developer call pkg_rpt_union.pro_rpt_uni…
10,474.67 5 2,094.93 6.89 14,773.14 70.90 28.90 6mcpb06rctk0x DBMS_SCHEDULER call dbms_space.auto_space_adv…
9,215.02 5 1,843.00 6.06 17,031.19 54.11 45.65 b6usrg82hwsa3 DBMS_SCHEDULER call dbms_stats.gather_databas…
8,881.02 2,737 3.24 5.84 11,250.82 78.94 21.14 9q7k9nbpvk8pv DBMS_SCHEDULER SELECT NVL(SUM(TIME_WAITED/100…
8,555.57 96 89.12 5.63 22,080.53 38.75 4.02 7ctdtkwpbrj4k sqlplus@racdb02 (TNS V1-V3) BEGIN edw1_user.pkg_rpt_today….
7,474.88 65,717 0.11 4.92 7,538.15 99.16 0.08 6m4s9pjrydbxt oracle@tracker-db2 (TNS V1-V3)

dbms_stats.gather_database_stats and dbms_space.auto_space_advisor consumed a large amount of system I/O and CPU, which can trigger 'resmgr: cpu quantum' waits; many of you will have met this wait, and here it started at exactly 22:00. In 11g Oracle changed how automatic maintenance is scheduled:


The Wait Event: resmgr: cpu quantum

DBRM allocates CPU resources by maintaining an execution queue similar to the way the operating
system’s scheduler queues processes for their turn on the CPU. The time a session spends waiting in this
execution queue is assigned the wait event resmgr: cpu quantum. A CPU quantum is the unit of CPU
time (fraction of CPU) that Resource Manager uses for allocating CPU to consumer groups. This event
occurs when Resource Manager is enabled and is actively throttling CPU consumption. Increasing the
CPU allocation for a session’s consumer group will reduce the occurrence of this wait event and increase
the amount of CPU time allocated to all sessions in that group. For example, the CPU quantum wait
events may be reduced for the APPS resource group (currently 70% at level 1) by increasing the group’s
CPU allocation to 80%.

In Oracle 11 the automatic maintenance jobs (Space Advisor, Gather Stats, Tuning Advisor) are by default run using the Resource Manager. The default settings give these tasks up to 25% of the CPU during their scheduler windows.
If you prefer these tasks to always take a "back seat" compared to any other jobs running on the system, it may be worth slightly adjusting the resource allocation of the DEFAULT_MAINTENANCE_PLAN.

In the DEFAULT_MAINTENANCE_PLAN (the default version) we have:
Level 1: SYS_GROUP: 100%
Level 2: ORA$AUTOTASK_SUB_PLAN 25% / ORA$DIAGNOSTICS: 5% / OTHER_GROUPS 70%
Level 3+: none

This means that OTHER_GROUPS may only get 70% of the CPU (unless the other plans are not using up their quota, in which case some of the spare quota becomes available to OTHER_GROUPS). To ensure that the auto maintenance tasks only get resources when OTHER_GROUPS don't need them, you can simply change the plan directive for the DEFAULT_MAINTENANCE_PLAN.
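A sketch of such a change, shifting level-2 CPU away from the autotask subplan (the 5%/90% split is an arbitrary example; validate before submitting on a real system):

begin
  dbms_resource_manager.create_pending_area();
  -- shrink the autotask share at level 2 ...
  dbms_resource_manager.update_plan_directive(
    plan             => 'DEFAULT_MAINTENANCE_PLAN',
    group_or_subplan => 'ORA$AUTOTASK_SUB_PLAN',
    new_mgmt_p2      => 5);
  -- ... and hand the difference to everything else
  dbms_resource_manager.update_plan_directive(
    plan             => 'DEFAULT_MAINTENANCE_PLAN',
    group_or_subplan => 'OTHER_GROUPS',
    new_mgmt_p2      => 90);
  dbms_resource_manager.submit_pending_area();
end;
/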

There is also an interesting phenomenon: in databases upgraded to 11g from a lower release, WEEKNIGHT_WINDOW and WEEKEND_WINDOW appear as child windows of DEFAULT_MAINTENANCE_PLAN as well. Oracle 11g has seven predefined maintenance windows, five for the weekdays and two for the weekend (Saturday and Sunday), one named after each day of the week. So even with WEEKNIGHT_WINDOW and WEEKEND_WINDOW enabled by default, Oracle still runs through the seven predefined windows one day at a time.
We can confirm this on a database upgraded from 10g to 11g:

9i/10g->11g

SQL> select window_name,resource_plan,comments from dba_scheduler_windows ;

WINDOW_NAME RESOURCE_PLAN COMMENTS
—————————— —————————— ————————————————————
WEEKNIGHT_WINDOW DEFAULT_MAINTENANCE_PLAN Weeknight window for maintenance task
WEEKEND_WINDOW DEFAULT_MAINTENANCE_PLAN Weekend window for maintenance task
MONDAY_WINDOW DEFAULT_MAINTENANCE_PLAN Monday window for maintenance tasks
TUESDAY_WINDOW DEFAULT_MAINTENANCE_PLAN Tuesday window for maintenance tasks
WEDNESDAY_WINDOW DEFAULT_MAINTENANCE_PLAN Wednesday window for maintenance tasks
THURSDAY_WINDOW DEFAULT_MAINTENANCE_PLAN Thursday window for maintenance tasks
FRIDAY_WINDOW DEFAULT_MAINTENANCE_PLAN Friday window for maintenance tasks
SATURDAY_WINDOW DEFAULT_MAINTENANCE_PLAN Saturday window for maintenance tasks
SUNDAY_WINDOW DEFAULT_MAINTENANCE_PLAN Sunday window for maintenance tasks

SQL> SELECT LOG_ID, TO_CHAR(LOG_DATE, 'MM/DD/YYYY'), WINDOW_NAME, OPERATION
FROM DBA_SCHEDULER_WINDOW_LOG order by 2;

LOG_ID TO_CHAR(LO WINDOW_NAME OPERATION
———- ———- ———————————————————— ——————————
19763 06/05/2012 TUESDAY_WINDOW OPEN
19756 06/05/2012 MONDAY_WINDOW CLOSE
19798 06/06/2012 WEDNESDAY_WINDOW OPEN
19784 06/06/2012 TUESDAY_WINDOW CLOSE
19817 06/07/2012 WEDNESDAY_WINDOW CLOSE
19824 06/07/2012 THURSDAY_WINDOW OPEN
19845 06/08/2012 THURSDAY_WINDOW CLOSE
19850 06/08/2012 FRIDAY_WINDOW OPEN
19871 06/09/2012 FRIDAY_WINDOW CLOSE
19876 06/09/2012 SATURDAY_WINDOW OPEN
19903 06/10/2012 SUNDAY_WINDOW OPEN

LOG_ID TO_CHAR(LO WINDOW_NAME OPERATION
———- ———- ———————————————————— ——————————
19900 06/10/2012 SATURDAY_WINDOW CLOSE
19929 06/11/2012 MONDAY_WINDOW OPEN
19926 06/11/2012 SUNDAY_WINDOW CLOSE
19955 06/12/2012 TUESDAY_WINDOW OPEN
19948 06/12/2012 MONDAY_WINDOW CLOSE
19981 06/13/2012 WEDNESDAY_WINDOW OPEN
19976 06/13/2012 TUESDAY_WINDOW CLOSE
20007 06/14/2012 THURSDAY_WINDOW OPEN
20000 06/14/2012 WEDNESDAY_WINDOW CLOSE
20030 06/15/2012 THURSDAY_WINDOW CLOSE
20033 06/15/2012 FRIDAY_WINDOW OPEN

LOG_ID TO_CHAR(LO WINDOW_NAME OPERATION
———- ———- ———————————————————— ——————————
20052 06/16/2012 FRIDAY_WINDOW CLOSE
20059 06/16/2012 SATURDAY_WINDOW OPEN
20083 06/17/2012 SATURDAY_WINDOW CLOSE
20086 06/17/2012 SUNDAY_WINDOW OPEN

Oracle opens MONDAY_WINDOW through SUNDAY_WINDOW in turn, one each day. On a freshly installed 11g database, by contrast, the weeknight and weekend windows exist for compatibility only:

11g

SQL> select window_name,resource_plan,comments from dba_scheduler_windows ;

WINDOW_NAME RESOURCE_PLAN COMMENTS
—————————— —————————— ————————————————————
MONDAY_WINDOW DEFAULT_MAINTENANCE_PLAN Monday window for maintenance tasks
TUESDAY_WINDOW DEFAULT_MAINTENANCE_PLAN Tuesday window for maintenance tasks
WEDNESDAY_WINDOW DEFAULT_MAINTENANCE_PLAN Wednesday window for maintenance tasks
THURSDAY_WINDOW DEFAULT_MAINTENANCE_PLAN Thursday window for maintenance tasks
FRIDAY_WINDOW DEFAULT_MAINTENANCE_PLAN Friday window for maintenance tasks
SATURDAY_WINDOW DEFAULT_MAINTENANCE_PLAN Saturday window for maintenance tasks
SUNDAY_WINDOW DEFAULT_MAINTENANCE_PLAN Sunday window for maintenance tasks
WEEKNIGHT_WINDOW Weeknight window – for compatibility only
WEEKEND_WINDOW Weekend window – for compatibility only

When the system shows massive 'resmgr: cpu quantum' waits, we can take the following approaches.

One is to disable the Resource Manager plan; xifenfei has written up a troubleshooting case using this method, and there is a matching MOS note.

For DBRM, Oracle allocates system resources by invoking DEFAULT_MAINTENANCE_PLAN; generally, increasing a consumer group's maximum resources reduces contention during scheduling. We can equally avoid the problem by disabling the auto tasks.

[oracle@db-41 ~]$ ora params _resource_manager_always_on

NAME VALUE DESCRIPTION
——————————————— ——————– ———————————————————————-
_io_resource_manager_always_on FALSE io resource manager always on
_resource_manager_always_on TRUE enable the resource manager always

There is a more balanced approach to this problem: close WEEKNIGHT_WINDOW and WEEKEND_WINDOW and disable the Space Advisor and SQL Tuning Advisor, but leave statistics gathering enabled; for many systems, accurate statistics are what lets the CBO make the right decisions. An example:

BEGIN
dbms_auto_task_admin.disable(
client_name => 'sql tuning advisor',
operation => NULL,
window_name => NULL);
END;
/

BEGIN
dbms_auto_task_admin.disable(
client_name => 'auto space advisor',
operation => NULL,
window_name => NULL);
END;
/

begin
sys.dbms_scheduler.set_attribute(name => 'SYS.xxx_WINDOW', attribute => 'repeat_interval', value => 'Freq=daily;ByDay=TUE;ByHour=2;ByMinute=0;BySecond=0');
sys.dbms_scheduler.set_attribute(name => 'SYS.xxx_WINDOW', attribute => 'duration', value => '0 04:00:00');
end;
/

This reschedules statistics gathering to run from 02:00 to 06:00 and disables the SQL Tuning Advisor and Auto Space Advisor (xxx_WINDOW stands for each day's window name).

11g failed-login delay causing library cache lock

May 24, 2012 11g, oracle

A while ago I wrote about the 11g failed-login delay feature: when the wrong password is supplied, 11g does not drop the connection immediately but makes the client wait an increasing interval, which after repeated failures can reach a long ceiling (10 seconds). That can pile up a large backlog of sessions and make the problem much worse. Today we hit exactly that: the PMS team ran into trouble while rotating an encryption key, and the application kept hammering the database with the wrong password; the login delay kept the latches from being released.
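If the application cannot be fixed immediately, the failed-logon delay itself can be disabled with event 28401, the standard mitigation for this kind of library cache lock pile-up. A sketch (takes effect at the next restart):

-- turn off the incremental delay on failed login attempts
alter system set event = '28401 trace name context forever, level 1' scope=spfile;

Here is what the active sessions looked like at the time: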

[oracle@db-38 ~]$ ora active
17:15:10 up 153 days, 7:09, 4 users, load average: 4.05, 2.06, 1.51

SID USERNAME MACHINE EVENT PARAM W WT SQL ST LT LOGON_TIME
—— ———- ———————- —————————— ——————– —- —- —————————- — —— ———-
13 PMS_DATA JDBC Thin Client library cache lock 70041699512/69922707 0 1 /05uqdabhzncdc A 712 712
592/5177346

1518 JDBC Thin Client library cache lock 70041699512/69873934 0 23 / A 3326 3325
872/5177347

47 JDBC Thin Client library cache lock 70041699512/69871194 0 512 / A 3265 3264
728/5177347

66 PMS_DATA JDBC Thin Client library cache lock 70041699512/69368085 0 4 /05uqdabhzncdc A 712 712
936/5177346

81 JDBC Thin Client library cache lock 70041699512/69462971 0 65 / A 3265 3264
488/5177347

112 JDBC Thin Client library cache lock 70041699512/69462926 0 287 / A 3265 3264
384/5177347

143 PMS_DATA JDBC Thin Client library cache lock 70041699512/69465605 0 10 /05uqdabhzncdc A 353 353
904/5177346

175 PMS_DATA JDBC Thin Client library cache lock 70041699512/69879432 0 4 /05uqdabhzncdc A 353 353
952/5177346

176 JDBC Thin Client library cache lock 70041699512/69871148 0 35 / A 3265 3264
456/5177347

205 PMS_DATA JDBC Thin Client library cache lock 70041699512/69741301 0 4 /05uqdabhzncdc A 334 334
968/5177346

206 JDBC Thin Client library cache lock 70041699512/69334255 0 8 / A 3203 3203
792/5177347

230 PMS_DATA JDBC Thin Client library cache lock 70041699512/69873526 0 4 /05uqdabhzncdc A 334 334
240/5177346

239 JDBC Thin Client library cache lock 70041699512/69982781 0 899 / A 3203 3203
552/5177347

269 JDBC Thin Client library cache lock 70041699512/69581692 0 8 / A 3203 3203
904/5177347

273 JDBC Thin Client library cache lock 70041699512/69975729 0 #### / A 3203 3203
320/5177347

274 PMS_DATA JDBC Thin Client library cache lock 70041699512/69463830 0 4 /05uqdabhzncdc A 334 334
936/5177346

306 JDBC Thin Client library cache lock 70041699512/69462865 0 662 / A 3203 3203
160/5177347

328 JDBC Thin Client library cache lock 70041699512/69873315 0 299 / A 3265 3264
720/5177347

335 PMS_DATA JDBC Thin Client library cache lock 70041699512/69919546 0 4 /05uqdabhzncdc A 334 334
800/5177346

337 JDBC Thin Client library cache lock 70041699512/69334200 0 #### / A 3203 3203
328/5177347

370 JDBC Thin Client library cache lock 70041699512/69462821 0 341 / A 3203 3203
472/5177347

393 PMS_DATA JDBC Thin Client library cache lock 70041699512/69876983 0 1 /05uqdabhzncdc A 334 334
216/5177346

421 PMS_DATA JDBC Thin Client library cache lock 70041699512/69334639 0 4 /05uqdabhzncdc A 334 334
728/5177346

496 PMS_DATA JDBC Thin Client library cache lock 70041699512/69739562 0 4 /05uqdabhzncdc A 334 334
568/5177346

518 PMS_DATA JDBC Thin Client library cache lock 70041699512/69463187 0 4 /05uqdabhzncdc A 334 334
296/5177346

558 PMS_DATA JDBC Thin Client library cache lock 70041699512/69339321 0 16 /05uqdabhzncdc A 334 333
544/5177346

589 PMS_DATA JDBC Thin Client library cache lock 70041699512/69337381 0 10 /05uqdabhzncdc A 334 333
088/5177346

614 JDBC Thin Client library cache lock 70041699512/70146651 0 917 / A 3203 3203
208/5177347

686 JDBC Thin Client library cache lock 70041699512/70146811 0 209 / A 3326 3325
256/5177347

742 JDBC Thin Client library cache lock 70041699512/69738879 0 509 / A 3203 3203
880/5177347

810 JDBC Thin Client library cache lock 70041699512/70146851 #### #### / A 3326 3326
312/5177347

877 JDBC Thin Client library cache lock 70041699512/69736715 0 830 / A 3326 3325
880/5177347

911 PMS_DATA JDBC Thin Client library cache lock 70041699512/69463812 0 4 /05uqdabhzncdc A 733 733
480/5177346

969 PMS_DATA JDBC Thin Client library cache lock 70041699512/70147770 0 4 /05uqdabhzncdc A 733 733
208/5177346

998 PMS_DATA JDBC Thin Client library cache lock 70041699512/69764944 0 16 /05uqdabhzncdc A 733 733
880/5177346

1026 JDBC Thin Client library cache lock 70041699512/69464853 0 353 / A 3326 3326
984/5177347

1028 PMS_DATA JDBC Thin Client library cache lock 70041699512/69747146 0 16 /05uqdabhzncdc A 353 353
824/5177346

1061 JDBC Thin Client library cache lock 70041699512/69871277 0 2 / A 3265 3264
816/5177347

1089 JDBC Thin Client library cache lock 70041699512/69335003 0 248 / A 3203 3203
432/5177347

14 JDBC Thin Client library cache lock 70041699512/70146744 0 563 / A 3265 3264
072/5177347

1103 JDBC Thin Client library cache lock 70041699512/69871239 0 308 / A 3265 3264
832/5177347

1133 JDBC Thin Client library cache lock 70041699512/69463104 0 131 / A 3448 3448
696/5177347

1163 PMS_DATA JDBC Thin Client library cache lock 70041699512/69339111 0 4 /05uqdabhzncdc A 712 712
248/5177346

1199 PMS_DATA JDBC Thin Client library cache lock 70041699512/69580040 0 1 /05uqdabhzncdc A 712 712
392/5177346

1200 JDBC Thin Client library cache lock 70041699512/69463060 0 275 / A 3448 3448
496/5177347

1261 PMS_DATA JDBC Thin Client library cache lock 70041699512/70147768 0 16 /05uqdabhzncdc A 712 712
072/5177346

1357 PMS_DATA JDBC Thin Client library cache lock 70041699512/69883770 0 1 /05uqdabhzncdc A 712 712
240/5177346

1358 JDBC Thin Client library cache lock 70041699512/69334294 0 221 / A 3326 3325
504/5177347

1383 PMS_DATA JDBC Thin Client library cache lock 70041699512/69335098 0 10 /05uqdabhzncdc A 712 712
912/5177346

1390 JDBC Thin Client library cache lock 70041699512/69990749 0 395 / A 3326 3325
464/5177347

1419 PMS_DATA JDBC Thin Client library cache lock 70041699512/69463259 0 16 /05uqdabhzncdc A 712 712
200/5177346

1423 JDBC Thin Client library cache lock 70041699512/69579445 0 230 / A 3326 3325
616/5177347

1443 PMS_DATA JDBC Thin Client library cache lock 70041699512/69986146 0 4 /05uqdabhzncdc A 712 712
008/5177346

1456 JDBC Thin Client library cache lock 70041699512/69966048 0 23 / A 3326 3325
672/5177347

1487 JDBC Thin Client library cache lock 70041699512/70146756 0 #### / A 3326 3325
696/5177347

1511 PMS_DATA JDBC Thin Client library cache lock 70041699512/69463648 0 4 /05uqdabhzncdc A 712 712
872/5177346

1097 WMS_SH6 plsqldev.exe db file sequential read 14/933621/1 0 0 0dpbfbpz8sp8a/0dpbfbpz8sp8a A 189 14688

The offending machine was identified as xen-staging08-vm13; the session information captured by a logon trigger matched xen-staging08-vm13 as well. Killing all sessions from that machine relieved the pressure for the time being, and after talking with the application team we found that the wrong password had been packaged into today's jar build. The trigger is as follows:

Trigger to record failed user logons:
CREATE OR REPLACE TRIGGER logon_denied_to_alert
  AFTER servererror ON DATABASE
DECLARE
  message   VARCHAR2(168);
  ip        VARCHAR2(15);
  v_os_user VARCHAR2(80);
  v_module  VARCHAR2(50);
  v_action  VARCHAR2(50);
  v_pid     VARCHAR2(10);
  v_sid     NUMBER;
  v_program VARCHAR2(48);
BEGIN
  IF (ora_is_servererror(1017)) THEN

    -- get ip for remote connections:
    IF upper(sys_context('userenv', 'network_protocol')) = 'TCP' THEN
      ip := sys_context('userenv', 'ip_address');
    END IF;

    SELECT sid INTO v_sid FROM sys.v_$mystat WHERE rownum < 2;

    SELECT p.spid, v.program
      INTO v_pid, v_program
      FROM v$process p, v$session v
     WHERE p.addr = v.paddr
       AND v.sid = v_sid;

    v_os_user := sys_context('userenv', 'os_user');
    dbms_application_info.read_module(v_module, v_action);

    message := to_char(SYSDATE, 'YYYYMMDD HH24MISS') ||
               ' logon denied from ' || nvl(ip, 'localhost') || ' ' ||
               v_pid || ' ' || v_os_user || ' with ' || v_program ||
               ' - ' || v_module || ' ' || v_action;

    sys.dbms_system.ksdwrt(2, message);

  END IF;
END;
/

The trigger output then shows up in the alert log:

[oracle@db-38 ~]$ tailalert
Thu May 24 17:19:24 2012
20120524 171924 logon denied from 10.63.0.118 19195 deploy with JDBC Thin Client ? JDBC Thin Client


The following information can also be found via the Oracle Net diagnostics in ADR:

adrci> show alert

Choose the alert log from the following homes to view:

1: diag/tnslsnr/db-38/listener
2: diag/rdbms/wmssh6/wmssh6
Q: to quit

Please select option: 1
Output the results to file: /tmp/alert_8079_47814_listener_5.ado

2012-05-24 17:03:18.734000 +08:00
24-MAY-2012 17:03:18 * (CONNECT_DATA=(SID=wmssh6)(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=deploy))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.63.0.118)(PORT=54578)) * establish * wmssh6 * 0
24-MAY-2012 17:03:18 * (CONNECT_DATA=(SID=wmssh6)(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=deploy))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.63.0.118)(PORT=54579)) * establish * wmssh6 * 0
24-MAY-2012 17:03:18 * (CONNECT_DATA=(SID=wmssh6)(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=deploy))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.63.0.118)(PORT=54577)) * establish * wmssh6 * 0
24-MAY-2012 17:03:18 * (CONNECT_DATA=(SID=wmssh6)(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=deploy))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.63.0.118)(PORT=54582)) * establish * wmssh6 * 0
24-MAY-2012 17:03:18 * (CONNECT_DATA=(SID=wmssh6)(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=deploy))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.63.0.118)(PORT=54586)) * establish * wmssh6 * 0
24-MAY-2012 17:03:18 * (CONNECT_DATA=(SID=wmssh6)(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=deploy))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.63.0.118)(PORT=54581)) * establish * wmssh6 * 0
24-MAY-2012 17:03:18 * (CONNECT_DATA=(SID=wmssh6)(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=deploy))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.63.0.118)(PORT=54585)) * establish * wmssh6 * 0
24-MAY-2012 17:03:18 * (CONNECT_DATA=(SID=wmssh6)(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=deploy))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.63.0.118)(PORT=54584)) * establish * wmssh6 * 0
24-MAY-2012 17:03:18 * (CONNECT_DATA=(SID=wmssh6)(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=deploy))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.63.0.118)(PORT=54583)) * establish * wmssh6 * 0

A large number of logon attempts is visible here; combined with the trigger output, this makes the problem much easier to diagnose.


reference:

Using and Disabling the Automatic Diagnostic Repository (ADR) with Oracle Net for 11g [ID 454927.1]

11G New feature: Failed Logon Delays

April 27, 2012 11g, oracle 4 comments

I came across a good thread on pub about the oracle 11g password delay.

Some notes on this feature follow.

Failed Logon Delays

A hacker may attempt a brute force hack to break into your Oracle Database. This is where they try constant logons to the database using some form of a word list. To try to make brute force hacking more difficult, Oracle 11g includes a logon delay that takes effect after the third failed password entry attempt. After the third failed logon attempt, Oracle will incrementally delay subsequent logon or password prompts up to a maximum of 10 seconds. No delay will occur if the logon is successful.
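When the delay itself is what hurts, as in the library cache lock incident above, it can reportedly be switched off with event 28401 (the workaround associated with Bug 7715339); a sketch, to be used only after weighing the brute force exposure it reopens:

-- Disable the failed-logon delay (workaround tracked under Bug 7715339).
-- EVENT is a static parameter: scope=spfile plus a restart is required.
ALTER SYSTEM SET EVENT = '28401 TRACE NAME CONTEXT FOREVER, LEVEL 1'
  SCOPE = SPFILE;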

Oracle 11g has added other features that improve security. You can configure the action to take when a bad packet is received from a client, the tracing that should occur when that happens, and the maximum number of failed logon attempts a client can make before its connection is dropped. Finally, you can opt to enable or disable communication of the release banner to a client when it connects. These are all controlled by the following new parameters:

sec_protocol_error_further_action – This parameter defines the action that should take place in the event that a bad packet is received from a remote system. This parameter can be set to the following options:

CONTINUE – Do not disconnect the client session. This is the default setting.

DROP – Drop the client connection after a specific number of bad packets. This parameter takes an integer argument that defines the number of bad packets that are acceptable.

DELAY – Delay accepting client requests after a bad packet is received. This parameter takes an integer argument that defines the delay time in seconds.
An example of setting this parameter is seen in this code sample. Note that the parameter sec_protocol_error_further_action is not a dynamic parameter, so you need to use the scope=spfile setting to properly set this parameter:

ALTER SYSTEM SET sec_protocol_error_further_action='DROP' scope=spfile;


sec_protocol_error_trace_action – This parameter defines the level of tracing that should occur when bad packets are received. This parameter can be set to the following values:

NONE – No logging occurs.

TRACE – A trace file is generated when bad packets are received. This is the default setting.

LOG – A small logging message is entered in the database alert log.

ALERT – An alert message is sent to the DBA via OEM.
An example of setting this parameter is seen in the following code example. Note that the sec_protocol_error_trace_action parameter is dynamic:

ALTER SYSTEM SET sec_protocol_error_trace_action='LOG';

sec_max_failed_login_attempts – This parameter controls the number of authentication attempts that a given client connection can make on the server before the client process is dropped. The default value is 10. The sec_max_failed_login_attempts parameter is not dynamic; an example of its use is shown in this code sample:

ALTER SYSTEM SET sec_max_failed_login_attempts=5 scope=spfile;

sec_return_server_release_banner – This parameter indicates whether the server banner will be returned to a client connection. Not returning the banner makes hacking the database more difficult, since the user will not know which version of the database they are trying to hack. The default value is TRUE and sec_return_server_release_banner is a dynamic parameter. An example of its use:

ALTER SYSTEM SET sec_return_server_release_banner=FALSE;