internal

关于Partial write for all MySQL and Oracle

September 2, 2014 Internals, MYSQL, oracle No comments

很久之前看过这个帖子,当时没有细究,这几天看了下mysql相关的internal 突然想起这个问题,跟三四个朋友讨论了下 想把这个问题展开讲一讲。

源帖子内容见:https://community.oracle.com/thread/1087650?start=0&tstart=0

这个帖子的内容为一个老外问oracle如何避免partial write的,也就是大家所说的半个IO问题,IO 在写到一半的时候发生断裂,(瞬间断线,瞬间crash瞬间存储或者机器故障等) 虽然这个情况极少发生,但是在非常规恢复的时候也是经常遇到的,例如某个block内部出现不一致,也就是我们所谓的逻辑坏块。
我仔细的又过了一遍帖子,发现下面的几个回答完全不准确,或者有点张冠李戴的意思。

首先我们看看什么是mysql 的double write:

—–引自 mysqlperformace blog

为什么要用doublewrite?
目的是为了保证出现部分写失效(partial page write)–即数据页写到一半时就出现故障–时的数据安全性。Innodb并不在日志中记录整个数据页,而是使用一种称之为“生理”日志的技术,即日志项中只包含页号、对数据进行的操作(如更新一行记录)和日志序列号等信息。这一日志结构的优点是能够减少写入到日志的数据量,但这也要求要保持页内部的一致性。页的版本号是没关系的,数据页可以是当前版本(这时Innodb(故障恢复时)会跳过对页的更新操作)若是早期版本(这时Innodb将执行更新操作)。但如果页内部数据不一致,恢复将不能进行。

部分写失效
什么是部分写失效,为什么会出现这一现象呢?部分写失效指的是提交给操作系统的写数据页操作只完成了一部分。例如一个16K的Innodb数据页中只有4K被更新了,其余部分数据还是旧的。大部分的部分写失效都在断电时产生,在操作系统崩溃时也可能出现,这是由于操作系统可能将一个写16K数据的操作分割成多个写操作(这可能由文件碎片导致),而在执行这多个写操作的过程中发出的故障。当使用软件RAID技术时,数据页可能恰好跨越分片(stripe)的边界,这时也需要执行多个IO操作,因此也可能导致部分写失效。当使用硬件RAID又没有使用带电池的缓存时,断电时出现部分写失效也是可能的。当发送到磁盘本身只有一个写操作时,理论上硬件可以保证写操作即使在断电时也可以完成,因为驱动器内部应该积累有足够的电量来完成这一操作。但实话说我并不知道事实是不是这样,这很难检测,而且这也不是部分写失效的唯一原因。我所知道的只是部分写失效可能会出现,当Innodb实现 doublewrite功能前,我遇到过很多由于这一原因而导致数据被破坏。

doublewrite如何工作?
你可以将doublewrite看作是在Innodb表空间内部分配的一个短期的日志文件,这一日志文件包含100个数据页。Innodb在写出缓冲区中的数据页时采用的是一次写多个页的方式,这样多个页就可以先顺序写入到doublewrite缓冲区并调用fsync()保证这些数据被写出到磁盘,然后数据页才被定出到它们实际的存储位置并再次调用fsync()。故障恢复时Innodb检查doublewrite缓冲区与数据页原存储位置的内容,若数据页在doublewrite缓冲区中处于不一致状态将被简单的丢弃,若在原存储位置中不一致则从doublewrite缓冲区中还原。

doublewrite缓冲区对MySQL有何影响?
虽然doublewrite要求每个数据页都要被写二次,其性能开销远远小于二倍。写出到doublewrite缓冲区时是顺序写,因此开销很小。 doublewrite同时还可以降低Innodb执行的fsync()操作,即不需要写每个页时都调用一下fsync(),而可以提交多个写操作最后再调用一次fsync()操作,这使得操作系统可以优化写操作的执行顺序及并行使用多个存储设备。但在不使用doublewrite技术时也可以用这些优化,事实上这些优化是与doublewrite同时实现的。因此总体来说,我预计使用doublewrite技术带来的性能开销不会超过5%到10%。

能否禁用doublewrite?
如果你不关心数据一致性(比如使用了RAID0)或文件系统可以保证不会出现部分写失效,你可以通过将innodb_doublewrite参数设置为0还禁用doublewrite。但通常这可能带来更大的麻烦。

这里暂且不讨论为何mysql不开启double write会容易出现断裂的写操作. 在mysql中数据写的单元是以page为单位 1page=16KB 而在oracle中是以block为单位 block可以指定大小。但是写入OS的时候都是以OS block为单位,也就是说如果写入OS block时发生partial write 同样会出现逻辑问题。

这里我们看一个老外的回答:

It’s an interesting question. I think the clue is in the link you provided: “Such logging structure is geat as it require less data to be written to the log, however it requires pages to be internally consistent.”

What that’s saying (I think!) is that the contents of the innodb transaction log can only be replayed to datafile pages which are ‘clean’ -and that’s true for Oracle, too. You can’t apply Oracle redo to an Oracle database block that is internally corrupted because some of its consituent “os pages” were written at a different time from others. When such partial writes happen, you get what’s called a “fractured block”, warnings in the alert log …and the data file is regarded as corrupt from that point on.

Oracle’s fix to this potential problem, however, is also hinted at in the article you linked to: “Innodb does not log full pages to the log files”. That’s an interesting sentence because , you see, Oracle does write full pages to the logs! I should immediately qualify that: it only does so when you take a “hot backup” using O/S copy commands -because it’s only then that you have to worry about the problem. In other words, you only have to worry about the fact that you can only apply redo to an internally consistent database block if you’re actually in the business of applying redo… and you’re only doing that in the event of a recovery. And complete recoveries in Oracle (as opposed to mere crash recoveries) require you to have restored something from backup. So, it’s only during the backup process that you only have to worry about the problem of fractured blocks -and so it’s only then that Oracle says, ‘if you have put the tablespace into hot backup mode (alter tablespace X begin backup), then the first time a block of data is changed, the entire block should be written in a consistent state into the redo (transaction) logs. Then, if the datafile copy of the block in the backup turns out to be fractured, we’ve got a known good copy in the redo we can restore in its place. And once you have a clean block as a starting point, you can continue to apply redo from that point on’.

Oracle has an alternative (and more up to date) mechanism for achieving this “I know your data block is clean” starting state, though. It’s called RMAN -the Oracle backup and recovery tool. Unlike your OS copy command, it’s an Oracle utility… so it understands the concept of Oracle blocks, and it can therefore check that a block that’s been copied has been copied consistently, with all its constituent OS ‘pages’ written coherently to disk in the same state. It knows how to compare input and output in a way no OS command could ever hope to do. So when RMAN copies a data block hot, it reads the copy, compares it with the original -and if it sees the copy is fractured, it just has another go copying the block again. Repeat until the copy is indeed verified as a good copy of the original. No need to write the block into the transaction log at all, because you know that the backup file itself contains the necessary clean block copy.

So, putting that into practice. Let’s say your server corrupts data on the disk for whatever reason and, in the process, your Oracle instance dies. You try and restart Oracle, but you get told that recovery is needed (you might get a message that file 16, for example, can’t be read). So you restore file 16 from your hot backup taken with OS commands. In that backup, one of the blocks is fractured, because only part of the Oracle block had hit disk at the point the backup was performed. So you restore a fractured block. But that’s not a problem, because as redo is replayed, you’ll find the clean copy of the block in the redo stream, and restore that over the top of the fractured block. The rest of the redo can then be replayed without a problem. Or, you restore file 16 using RMAN… and what it restores cannot be fractured, because it checks for that before it reports the original backup a success. Therefore, you restore a clean copy of file 16, and can apply redo to it without drama. Either way, you get your database recovered.

So, the article you linked to nails the important matter: “It does not matter which page version it is – it could be “current” version in which case Innodb will skip page upate operation or “former” in which case Innodb will perform update. If page is inconsistent recovery can’t proceed.” Absolutely true of Oracle, too. But Oracle has two alternatives for ensuring that a clean version of the block is always available: write a whole block into redo if it’s changed whilst the database is being backed up with OS commands, or make sure you only write clean blocks into the backup if you’re using RMAN -and you achieve that by multiple reads of the block, as many as are necessary to ensure the output is clean.

Oracle’s solutions in these regards are, I think, a lot more efficient than double-writing every block all the time, because the only time you have to worry that what’s on disk isn’t consistent is, as your linked article again points out, when ‘power failure’ or ‘os crash’ happens. That is, during some sort of failure. And the response to failure that involves corruption is always to restore something from backup… so, it’s really only that backup that needs to worry about ‘clean pages’. Instead of writing everything twice to disk during normal running (which sounds like a potentially enormous overhead to me!), therefore, Oracle only has to employ protective measures during the backup process itself (which should, ordinarily, be a mere fraction of ‘normal running’ time). The overhead is therefore only encountered sparingly and not something you need worry about as a potentially-constant performance problem.

In closing, I’ll second Aman’s observation that it is generally and usually the case that any variation away from the default 8K block size is a bad idea. Not always, and there may be justification for it in extremis… but you will certainly be at risk of encountering more and weirder bugs than if you stick to the defaults.

这一段回答可谓一点儿也没讲到重点 主要回答oracle 采用避免partial write的几个方法,注意 这里是在特殊场景下 如何避免partial write.而不是数据库机理. 我们看一下mysql怎么同样避免这个问题 —> ( oracle hot backup/RMAN backup)
传统Oracle的热备,备份读取与DBWR写文件并行执行,因此可能读取到Fractured Block(不一致的块),解决办法是对于备份的文件,DBWR写的Dirty Page同时写入Redo Log,用于Fractured Block的恢复。RMAN备份,会检查每个读取页面的一致性,不一致就重新读取。Percona的XtraBackup,采用类似于RMAN的方案。
如何检查备份读取的页面是否是一致的,其实方法很简单:无论是Oracle/InnoDB的页面,都会在页面的页头与页尾存储相同的SCN /CheckSum。当页头/页尾的SCN/CheckSum相同,说明页面处于一致状态,否则就说明页面损坏(Fractured Block),备份重新读取损坏页面即可。
所以这一段可以理解为当你需要备份的时候 如何避免partial write 因为在这个时候最容易发生断裂的块或者页。而此前别人提问的是oracle是否有类似double write的手法避免常规的partial write.

我们继续看下面的回答:

user646745 wrote:
Thanks HJR for detailed analysis.

But the double write mechanism works without restore from a backup an apply redo: before write the dirty buffer blocks, innodb flush the blocks in the double write buffer to disk which is a sequential disk area (so it’s fast),
————————-
before write the dirty buffer blocks, logwr flush the blocks in the redo buffer to disk which is a sequential disk area (so it’s fast),

so even if partial write happen, the the blocks in double write buffer already persistent in disk,
————————-
so even if partial write happen, the the blocks in redo buffer already persistent in disk,

and when mysql restart, innodb can compare the blocks flushed from double write buffer with the datafile blocks,
————————-
and when mysql restart, smon can compare control file scn with the datafile blocks scn,

if they matched (means no partial write), then everything is fine, if not matched (partial write happen), just overwrite the datafile blocks with the blocks from double write buffer.
————————-
if they matched (means no partial write), then everything is fine, if not matched (partial write happen), just apply the redo from the redo logs.

So no media recover is required.
————————-
sounds like media recovery to me

Based on your anaysis, oracle needs media recover.
————————-
Based on your analysis, so does mysql. It just applies it in a very slightly different fashion, and calls it all something else.

这里的回答又有误点. 他说道 “so even if partial write happen, the the blocks in double write redo buffer already persistent in disk” 这句话存在明显的误导,首先partial write发生的时候,redo是无法恢复
一个块内不一致的块的,redo只能负责recover作用,但这不是针对块内部的恢复 而是trasaction或者media的.

oracle recovery 分为 instance/crash recovery and media recovery 其本质区别在于instance recovery 需要online redo logfile 即apply 增量检查点之后的redolog. 而media recovery又分为 datafile /block media
recovery 其本质为恢复一个旧的datafile或者block 可能需要用到归档日志。注意这里的前提为需要有备份。restore datafile /block 只能在备份中恢复,而之后的recover是要用到archive log。 这里的media recover可以
对应为解决partial write的方法。

对于一个内部不一致的page mysql无法通过redo去恢复 :redo有效应用的前提是data的一致性,当data flush到磁盘时发生故障,比如16K只写了4K并且redo条目是change vector形式,属于逻辑记录,那么在InnoDB recovery时,redo就不知道从哪开始跑起。

对于oracle来说 内部不一致的block.仅仅有 redo也是不够的.我们可以通过dbf检查到逻辑问题的block,去repair这个block,前提是需要有备份。这里用到的方法是media recovery里的block recovery.
我们可以使用bbed 或者一些OS的工具来模拟这些逻辑坏块,并且往往发现数据库是可以正常open的,只有在访问到这些block的时候才会出现逻辑问题。所以说备份重于一切,oracle在处理这些”半个IO”或者一些静默丢失(storage),在没有备份的情况下也是很乏力的。

How to control CF calculation

May 14, 2013 Internals, oracle No comments

根据Richard Foote的文章,做了如下验证:

RefClustering Factor Calculation Improvement and provide some control over DBMS_STATS index clustering factor computation

[oracle@db-42 ~]$ unzip p13262857_112030_Generic.zip 
[oracle@db-42 ~]$ cd 13262857/
[oracle@db-42 13262857]$ ora si 

SQL*Plus: Release 11.2.0.3.0 Production on Tue May 14 10:44:32 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options


[oracle@db-42 13262857]$ /data/app1/oracle/product/11.2.0/db_1/OPatch/opatch apply
Invoking OPatch 11.2.0.1.7

Oracle Interim Patch Installer version 11.2.0.1.7
Copyright (c) 2011, Oracle Corporation.  All rights reserved.


Oracle Home       : /data/app1/oracle/product/11.2.0/db_1
Central Inventory : /data/app/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.7
OUI version       : 11.2.0.3.0
Log file location : /data/app1/oracle/product/11.2.0/db_1/cfgtoollogs/opatch/opatch2013-05-14_10-46-13AM.log

Applying interim patch '13262857' to OH '/data/app1/oracle/product/11.2.0/db_1'
Verifying environment and performing prerequisite checks...

Do you want to proceed? [y|n]
yes
User Responded with: Y
All checks passed.
Backing up files...

Patching component oracle.rdbms.dbscripts, 11.2.0.3.0...
Patch 13262857 successfully applied
Log file location: /data/app1/oracle/product/11.2.0/db_1/cfgtoollogs/opatch/opatch2013-05-14_10-46-13AM.log

OPatch succeeded.

[oracle@db-42 13262857]$ ora si 

SQL*Plus: Release 11.2.0.3.0 Production on Tue May 14 10:46:25 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup upgrade;
ORACLE instance started.

Total System Global Area 8017100800 bytes
Fixed Size		    2244192 bytes
Variable Size		 1828716960 bytes
Database Buffers	 6157238272 bytes
Redo Buffers		   28901376 bytes
Database mounted.
Database opened.
SQL> 
SQL> @?/rdbms/admin/dbmsstat.sql

Package created.

No errors.

Synonym created.


Grant succeeded.


Role created.


Grant succeeded.


Grant succeeded.


Library created.

SQL> 
SQL> @?/rdbms/admin/prvtstat.plb

Package body created.

No errors.
SQL> 
SQL> @?/rdbms/admin/execstat.sql

PL/SQL procedure successfully completed.

SQL> 
SQL> 
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> 
SQL> 
SQL> startup
ORACLE instance started.

Total System Global Area 8017100800 bytes
Fixed Size		    2244192 bytes
Variable Size		 1828716960 bytes
Database Buffers	 6157238272 bytes
Redo Buffers		   28901376 bytes
Database mounted.
Database opened.

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE',pname=>'TABLE_CACHED_BLOCKS', pvalue=>42);

PL/SQL procedure successfully completed.


SQL> EXEC dbms_stats.gather_index_stats(ownname=>user, indname=>'BOWIE_ID_I', estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor FROM user_tables t, user_indexes i
  WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ID_I';  2  


TABLE_NAME		       INDEX_NAME			  BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------------------------ ------------------------------ ---------- ---------- -----------------
BOWIE			       BOWIE_ID_I			    1126     300000		 1035

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE',pname=>'TABLE_CACHED_BLOCKS', pvalue=>255);

PL/SQL procedure successfully completed.


SQL> EXEC dbms_stats.gather_index_stats(ownname=>user, indname=>'BOWIE_ID_I', estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor FROM user_tables t, user_indexes i
  WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ID_I';  2  

TABLE_NAME		       INDEX_NAME			  BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------------------------ ------------------------------ ---------- ---------- -----------------
BOWIE			       BOWIE_ID_I			    1126     300000		 1035

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE',pname=>'TABLE_CACHED_BLOCKS', pvalue=>1);

PL/SQL procedure successfully completed.

SQL> EXEC dbms_stats.gather_index_stats(ownname=>user, indname=>'BOWIE_ID_I', estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor FROM user_tables t, user_indexes i
  WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ID_I';  2  

TABLE_NAME		       INDEX_NAME			  BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------------------------ ------------------------------ ---------- ---------- -----------------
BOWIE			       BOWIE_ID_I			    1126     300000	       193228

eg: 重复一遍这个过程

SQL> truncate table bowie;

Table truncated.

SQL> exec bowie_proc 

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE',pname=>'TABLE_CACHED_BLOCKS', pvalue=>1);

PL/SQL procedure successfully completed.

SQL> EXEC dbms_stats.gather_index_stats(ownname=>user, indname=>'BOWIE_ID_I', estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> set autot off;
SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor FROM user_tables t, user_indexes i
  WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ID_I';  2  

TABLE_NAME		       INDEX_NAME			  BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------------------------ ------------------------------ ---------- ---------- -----------------
BOWIE			       BOWIE_ID_I			    1126     300000	       357685


SQL> set autot trace ;

SQL> select * from bowie where id between 42 and 429;

388 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id  | Operation	  | Name  | Rows  | Bytes | Cost (%CPU)| Time	  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |	  |   389 |  7780 |   310   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| BOWIE |   389 |  7780 |   310   (1)| 00:00:04 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ID"<=429 AND "ID">=42)


Statistics
----------------------------------------------------------
	  9  recursive calls
	  1  db block gets
 1532  consistent gets
	  0  physical reads
	  0  redo size
 8659  bytes sent via SQL*Net to client
	799  bytes received via SQL*Net from client
	 27  SQL*Net roundtrips to/from client
	  4  sorts (memory)
	  0  sorts (disk)
	388  rows processed

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE',pname=>'TABLE_CACHED_BLOCKS', pvalue=>255);

PL/SQL procedure successfully completed.

SQL> EXEC dbms_stats.gather_index_stats(ownname=>user, indname=>'BOWIE_ID_I', estimate_percent=> null);

PL/SQL procedure successfully completed.


SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor FROM user_tables t, user_indexes i
  WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ID_I';  2  

TABLE_NAME		       INDEX_NAME			  BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------------------------ ------------------------------ ---------- ---------- -----------------
BOWIE			       BOWIE_ID_I			    1126     300000		 1396

SQL> set autot trace ;
SQL> select * from bowie where id between 42 and 429;

Execution Plan
----------------------------------------------------------
Plan hash value: 3472402785

------------------------------------------------------------------------------------------
| Id  | Operation		    | Name	 | Rows  | Bytes | Cost (%CPU)| Time	 |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	    |		 |   389 |  7780 |     7   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE	 |   389 |  7780 |     7   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN	    | BOWIE_ID_I |   519 |	 |     5   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

Statistics
----------------------------------------------------------
	  1  recursive calls
	  0  db block gets
	 57  consistent gets
	  0  physical reads
	  0  redo size
14532  bytes sent via SQL*Net to client
	799  bytes received via SQL*Net from client
	 27  SQL*Net roundtrips to/from client
	  0  sorts (memory)
	  0  sorts (disk)
	388  rows processed

Active dataguard tips

January 23, 2013 oracle, replication No comments

11g ADG 给了大家更多的灾备选择,包括之前美国飓风,给了ADG大放光彩的机会,但是ADG并没有想象中的那么完美,以下是最近的一个CASE,具体过程不详写了,目前这个case还在跟进中,记录之:

[oracle@dcb-srv-0174 trace]$ cat itemstd02_ora_22156.trc |grep "LibraryObjectLock" |awk '{print $3 "-- obj handle get"}' | uniq -c |awk  '{ if ($1>100) print $0}'
    107 Handle=0x2efee6a060-- obj handle get
    107 Handle=0x2eff322578-- obj handle get
    113 Handle=0x2eff54acf0-- obj handle get
    118 Handle=0x2f1f0b89c8-- obj handle get
    118 Handle=0x2f1fc813c0-- obj handle get
    125 Handle=0x2f3e772c58-- obj handle get
    127 Handle=0x2f1fccd750-- obj handle get
    128 Handle=0x2f7efaa2a0-- obj handle get
    129 Handle=0x2f7f831ac0-- obj handle get
    130 Handle=0x2f3e171640-- obj handle get
    130 Handle=0x2f7ee7a100-- obj handle get
    131 Handle=0x2f7ee7a268-- obj handle get
    132 Handle=0x2f7f855570-- obj handle get
    147 Handle=0x2f1a8f0bf8-- obj handle get
    147 Handle=0x2f3edd63d0-- obj handle get
    150 Handle=0x2f3edd6620-- obj handle get
    150 Handle=0x2f5ef05050-- obj handle get
    150 Handle=0x2f5f213530-- obj handle get
    154 Handle=0x2f3e81b988-- obj handle get
    159 Handle=0x2f3e926498-- obj handle get
    182 Handle=0x2eff03c3c0-- obj handle get
    190 Handle=0x2f1f5a2858-- obj handle get
    192 Handle=0x2f1f859808-- obj handle get
    195 Handle=0x2fbf7f46f8-- obj handle get
    195 Handle=0x2fbfe9eb80-- obj handle get
    197 Handle=0x2f5ec64840-- obj handle get
    200 Handle=0x2eff4e7698-- obj handle get
    200 Handle=0x2f5f54e510-- obj handle get
    202 Handle=0x2f1fee7520-- obj handle get
    203 Handle=0x2eff975f98-- obj handle get
    204 Handle=0x2eff0480e0-- obj handle get
    204 Handle=0x2efff29160-- obj handle get
    207 Handle=0x2f7eedaae0-- obj handle get
    208 Handle=0x2f3fd32ef0-- obj handle get
    208 Handle=0x2f3fd33510-- obj handle get
    209 Handle=0x2f3fd33ef0-- obj handle get
    209 Handle=0x2f3fd34140-- obj handle get
    211 Handle=0x2f1f9ed050-- obj handle get
    211 Handle=0x2f1fee7770-- obj handle get
    212 Handle=0x2f1fc7ee38-- obj handle get
    212 Handle=0x2f9fa5a770-- obj handle get
    213 Handle=0x2efff297e0-- obj handle get
    213 Handle=0x2f7fa54a28-- obj handle get
    218 Handle=0x2f9fce2748-- obj handle get
    231 Handle=0x2eff04a950-- obj handle get
    247 Handle=0x2ef65d8a88-- obj handle get
    264 Handle=0x2effd1c938-- obj handle get
    268 Handle=0x2f3ee0e1a8-- obj handle get
    268 Handle=0x2f3f6469a0-- obj handle get
    270 Handle=0x2f1f8a6120-- obj handle get
    270 Handle=0x2f1fcda868-- obj handle get
    290 Handle=0x2f7f85ade0-- obj handle get
    378 Handle=0x2f7f85b030-- obj handle get
    468 Handle=0x2f3f28ec08-- obj handle get
    697 Handle=0x2f3fd326e0-- obj handle get
    800 Handle=0x2efca0bdd0-- obj handle get
    847 Handle=0x2f9ff0e448-- obj handle get
[oracle@dcb-srv-0174 trace]$ cat temp.log |grep "library cache load lock" |wc -l && cat temp.log |grep  "cursor: pin S" |wc -l
619
158

..........


ADDR	00002B4642274E78
INDX	51844
INST_ID	1
KGLHDADR	0000002F3F28EC08
KGLHDPAR	0000002F3F28EC08
KGLHDCLT	65657
KGLNAOWN	SYS
KGLNAOBJ	TRG_SERVER_ERRORS
KGLFNOBJ	<CLOB>
KGLNADLK	
KGLNAHSH	102957177
KGLNAHSV	e230e86a06763591475191a006230079
KGLNATIM	27-10月-12
KGLNAPTM	
KGLHDNSP	3
KGLHDNSD	TRIGGER
KGLHDLMD	0
KGLHDPMD	0
KGLHDFLG	10241
KGLHDOBJ	0000002C9F3EA470
KGLHDLDC	1
KGLHDIVC	0
KGLHDEXC	0
KGLHDLKC	0
KGLHDKMK	0
KGLHDDMK	67
KGLHDAMK	0
KGLOBFLG	4
KGLOBSTA	1
KGLOBTYP	12
KGLOBTYD	TRIGGER
KGLOBHMK	0
KGLOBHS0	4712
KGLOBHS1	2480
KGLOBHS2	0
KGLOBHS3	0
KGLOBHS4	0
KGLOBHS5	0
KGLOBHS6	1032
KGLOBHS7	0
KGLOBHD0	0000002F3EE199D8
KGLOBHD1	0000002C9F3EABA0
KGLOBHD2	00
KGLOBHD3	00
KGLOBHD4	00
KGLOBHD5	00
KGLOBHD6	0000002C9F3EAC28
KGLOBHD7	00
KGLOBPC0	0
KGLOBPC6	0
KGLOBTP0	00
KGLOBT00	0
KGLOBT01	0
KGLOBT02	0
KGLOBT03	
KGLOBT04	0
KGLOBT05	0
KGLOBT35	0
KGLOBT06	0
KGLOBT07	0
KGLOBT08	0
KGLOBT09	0
KGLOBT10	0
KGLOBT11	0
KGLOBT12	0
KGLOBT13	0
KGLOBT14	0
KGLOBT15	0
KGLOBT16	0
KGLOBT17	0
KGLOBT18	0
KGLOBT19	0
KGLOBT20	0
KGLOBT21	0
KGLOBT22	0
KGLOBT23	2382
KGLOBT24	1795
KGLOBT25	0
KGLOBT26	0
KGLOBT28	0
KGLOBT29	0
KGLOBT30	0
KGLOBT31	0
KGLOBT27	0
KGLOBT32	0
KGLOBT33	0
KGLOBWAP	0
KGLOBWCC	0
KGLOBWCL	0
KGLOBWUI	0
KGLOBWDW	0
KGLOBT42	0
KGLOBT43	0
KGLOBT44	0
KGLOBT45	0
KGLOBT46	0
KGLOBT47	0
KGLOBT49	0
KGLOBT50	0
KGLOBT52	0
KGLOBT53	0
KGLOBTL0	0
KGLOBTL1	0
KGLOBTS0	
KGLOBTS1	
KGLOBTN0	
KGLOBTN1	
KGLOBTN2	
KGLOBTN3	
KGLOBTN4	
KGLOBTN5	
KGLOBTS2	
KGLOBTS3	
KGLOBTS5	
KGLOBTT0	
KGLOBCCE	
KGLOBCCEH	0
KGLOBCLA	
KGLOBCLC	0
KGLOBCCC	0
KGLOBTS4	
KGLOBCBCA	
KGLOBT48	0
KGLOBDSO	0
KGLOBDEX	0
KGLOBDPX	0
KGLOBDLD	0
KGLOBDIV	0
KGLOBDPS	0
KGLOBDDR	0
KGLOBDDW	0
KGLOBDBF	0
KGLOBDRO	0
KGLOBDCP	0
KGLOBDEL	0
KGLOBDFT	0
KGLOBDEF	0
KGLOBDUI	0
KGLOBDCL	0
KGLOBDAP	0
KGLOBDCC	0
KGLOBDPL	0
KGLOBDJV	0
KGLOBACS	0
KGLOBTS6	
KGLOBTS7	
KGLOBT54	0
KGLOBT55	0
KGLOBT56	0
KGLOBT57	0
KGLOBDCO	0
KGLOBDCI	0
KGLOBDRR	0
KGLOBDRB	0
KGLOBDWR	0
KGLOBDWB	0
KGLOBT58	0
KGLOBDOR	0
KGLHDMTX	0000002F3F28ED38
KGLHDMVL	0
KGLHDMSP	558
KGLHDMGT	667484
KGLHDDMTX	0000002F3F28ECB8
KGLHDDVL	0
KGLHDDSP	0
KGLHDDGT	0
KGLHDBID	65657
KGLHDBMTX	0000002E3546D210
KGLHDBVL	0
KGLHDBSP	3
KGLHDBGT	561277
KGLOBT59	0
KGLOBDCU	0
KGLOBPROP

问题SQL就不贴出来了,涉及到3张表的FTS,并发上来之后导致ADG 短暂的hang 死,当时的状态包括了很多类library cache lock(library cache load lock 与 cursor pin S) 的交互等待 。

对于这里涉及到的机制:

The connection between log file switches and the media recovery checkpoints is just a little messier than I’ve described, and 11g has introduced two new parameters, _defer_log_boundary_ckpt (defer media recovery checkpoint at log boundary) and _defer_log_count (number of log boundaries media recovery checkpoint lags behind), defining this behavior. Effectively these parameters allow Oracle to be lazy about copying buffers to disk on media recovery checkpoints until a few extra log files switches have taken place. This type of mechanism has been in place since 9i—but with no visible controls—presumably in the hope that a little extra delay at peak processing periods would allow incremental checkpointing to spread the write load that an immediate media recovery checkpoint would cause.

Mandatory checkpoint at every log boundary

Recovery slaves wait on log boundary checkpoint

No redo apply during log boundary checkpoint

Incremental recovery checkpoints keep log boundary checkpoint duration smaller Checkpoint performed prior to stopping MRP

Parameters “_defer_log_boundary_ckpt” and “_defer_log_count” define the behavior (not fully implemented in 11g)

Recovery until consistent required after crash of ADG

关于另外几个参数的说明:

_row_cache_cursors:参考 Steve Adams的Tuning the _row_cache_cursors

session_cached_cursor : 设置此参数用以缓解 library cache load lock.

_row_cr : SET FALSE 用以缓解 row CR 对于缓解dc_rollback_segments效果不明

_defer_log_count 针对 ADG target 端的checkpoint

_log_committime_block_cleanout 目前没有具体影响范围. (When it is set to TRUE then we will try to perform block cleanout when a transaction commits rather than applying delayed logging cleanout to the affected blocks.The commit causes us to walk the associated list of blocks state objects owned by the transaction and perform the relevant form of cleanout upon each block depending upon the parameter’s setting)

delayed logging behavior by JL:

When you commit, the commit cleanout updates some of the changed data blocks by stamping them with a couple of SCNs. But the commit still takes only 60 bytes of redo – the changes to the data blocks are not logged i.e. they are delayed.When the block is subsequently updated by another transaction, the ‘full’ cleanout of the ITL and its row locks takes place. However, if you examine the redo generated at this point, you will find that it has been engineered to look as if the new update started from a fully cleaned out block. In this respect, the logging of the commit cleanout has not been delayed, it has been totally bypassed.

Also by Steve : Read Consistency

_db_lost_write_checking and db_lost_write_protect can also change “_log_committime_block_cleanout”

Set “_log_committime_block_cleanout” to TRUE on primary if possible, redo overhead is less than 5%. (by paypal)

对于这个Case,目前采用TNS HA连接方式轮询多个可用的ADG,配合APP的自动failover.目前ADG 对于大并发垃圾SQL的支持是有一些问题,包括一些row cache latch 的竞争十分的激烈.可以考虑采用logical standby 或者ogg 来满足一些异步大SQL的查询。

Capture binds always actually ?

August 28, 2012 Internals, oracle No comments

我们可以通过v$sql_bind_capture 捕获到 bind values 这篇文章主要阐述v$sql_bind_capture与v$sql_plan的一些联系

SQL*Plus: Release 11.2.0.3.0 Production on Tue Aug 28 14:07:13 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ORACLE instance started.

Total System Global Area 4275781632 bytes
Fixed Size		    2235208 bytes
Variable Size		  889193656 bytes
Database Buffers	 3372220416 bytes
Redo Buffers		   12132352 bytes
Database mounted.
Database opened.
SQL> drop table liu;

Table dropped.

SQL> conn liu/liu
Connected.
SQL> select * from liu;

       ID1	  ID2	     ID3
---------- ---------- ----------
	13	   14	      15
	 7	    8	       9
	10	   11	      12
	 1	    2	       3
	 4	    5	       6

SQL> variable N1 number;
variable N2 number;
variable N3 number;
begin
 :N1 := 1; 
 :N2 := 2;
 :N3 := 3;
 END;
 /
print N1;
print N2;
print N3; SQL> SQL> SQL>   2    3    4    5    6  
PL/SQL procedure successfully completed.

SQL> 
	N1
----------
	 1

SQL> 
	N2
----------
	 2

SQL> 

	N3
----------
	 3

SQL> SELECT * FROM LIU WHERE ID1= :N1 AND ID2= :N2 AND ID3= :N3;

       ID1	  ID2	     ID3
---------- ---------- ----------
	 1	    2	       3

SQL> @find_sql
Enter value for sql_text: %FROM LIU WHERE ID1%
old   6: where upper(sql_text) like upper(nvl('&sql_text',sql_text))
new   6: where upper(sql_text) like upper(nvl('%FROM LIU WHERE ID1%',sql_text))
Enter value for sql_id: 
old   8: and sql_id like nvl('&sql_id',sql_id)
new   8: and sql_id like nvl('',sql_id)

SQL_ID					CHILD_NUMBER  PLAN_HASH      EXECS  AVG_ETIME	 AVG_LIO SQL_TEXT
--------------------------------------- ------------ ---------- ---------- ---------- ---------- --------------------------------------------------------------------------------
2y32qkr40b82a					   0 3882242470 	 1    .012357	      47 SELECT * FROM LIU WHERE ID1= :N1 AND ID2= :N2 AND ID3= :N3


SQL> select * from table(dbms_xplan.display_cursor('2y32qkr40b82a',null,'typical +peeked_binds'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID	2y32qkr40b82a, child number 0
-------------------------------------
SELECT * FROM LIU WHERE ID1= :N1 AND ID2= :N2 AND ID3= :N3

Plan hash value: 3882242470

--------------------------------------------------------------------------
| Id  | Operation	  | Name | Rows  | Bytes | Cost (%CPU)| Time	 |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |	 |	 |	 |     3 (100)| 	 |
|*  1 |  TABLE ACCESS FULL| LIU  |     1 |    39 |     3   (0)| 00:00:01 |

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
--------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - :N1 (NUMBER): 1
   2 - :N2 (NUMBER): 2
   3 - :N3 (NUMBER): 3

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

   1 - filter(("ID1"=:N1 AND "ID2"=:N2 AND "ID3"=:N3))

Note
-----
   - dynamic sampling used for this statement (level=2)


29 rows selected.

SQL> select value_string from v$sql_bind_capture where sql_id='2y32qkr40b82a' 
  2  ;

VALUE_STRING
--------------------------------------------------------------------------------
1
2
3

继续改变bind values :

SQL> variable N1 number;
variable N2 number;
variable N3 number;
begin
 :N1 := 4; 
 :N2 := 5;
 :N3 := 6;
 END;
 /
print N1;
print N2;
print N3;SQL> SQL> SQL>   2    3    4    5    6  
PL/SQL procedure successfully completed.

SQL> 
	N1
----------
	 4

SQL> 
	N2
----------
	 5

SQL> 

	N3
----------
	 6

SQL> SELECT * FROM LIU WHERE ID1= :N1 AND ID2= :N2 AND ID3= :N3;

       ID1	  ID2	     ID3
---------- ---------- ----------
	 4	    5	       6
	 
SQL> select * from table(dbms_xplan.display_cursor('2y32qkr40b82a',null,'typical +peeked_binds'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID	2y32qkr40b82a, child number 0
-------------------------------------
SELECT * FROM LIU WHERE ID1= :N1 AND ID2= :N2 AND ID3= :N3

Plan hash value: 3882242470

--------------------------------------------------------------------------
| Id  | Operation	  | Name | Rows  | Bytes | Cost (%CPU)| Time	 |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |	 |	 |	 |     3 (100)| 	 |
|*  1 |  TABLE ACCESS FULL| LIU  |     1 |    39 |     3   (0)| 00:00:01 |

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
--------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - :N1 (NUMBER): 1
   2 - :N2 (NUMBER): 2
   3 - :N3 (NUMBER): 3

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

   1 - filter(("ID1"=:N1 AND "ID2"=:N2 AND "ID3"=:N3))

Note
-----
   - dynamic sampling used for this statement (level=2)


29 rows selected.


SQL>  select value_string from v$sql_bind_capture where sql_id='2y32qkr40b82a' ;

VALUE_STRING
--------------------------------------------------------------------------------
1
2
3

没有变化,继续强制产生一个child cursor

SQL> insert into liu values (128000,128000,128000);

1 row created.

SQL> variable N1 number;
variable N2 number;
variable N3 number;
begin
 :N1 := 128000; 
 :N2 := 128000;
 :N3 := 128000;
 END;
 /SQL> SQL> SQL>   2    3    4    5    6  

PL/SQL procedure successfully completed.

SQL> alter session set  optimizer_mode=first_rows_100;

Session altered.

SQL> SELECT * FROM LIU WHERE ID1= :N1 AND ID2= :N2 AND ID3= :N3;

       ID1	  ID2	     ID3
---------- ---------- ----------
    128000     128000	  128000
    

Enter value for sql_text: %FROM LIU WHERE ID1%
old   6: where upper(sql_text) like upper(nvl('&sql_text',sql_text))
new   6: where upper(sql_text) like upper(nvl('%FROM LIU WHERE ID1%',sql_text))
Enter value for sql_id: 
old   8: and sql_id like nvl('&sql_id',sql_id)
new   8: and sql_id like nvl('',sql_id)

SQL_ID					CHILD_NUMBER  PLAN_HASH      EXECS  AVG_ETIME	 AVG_LIO SQL_TEXT
--------------------------------------- ------------ ---------- ---------- ---------- ---------- --------------------------------------------------------------------------------
2y32qkr40b82a					   0 3882242470 	 1    .002613	      23 SELECT * FROM LIU WHERE ID1= :N1 AND ID2= :N2 AND ID3= :N3
2y32qkr40b82a					   1 3882242470 	 2    .006787	    65.5 SELECT * FROM LIU WHERE ID1= :N1 AND ID2= :N2 AND ID3= :N3


SQL> select * from table(dbms_xplan.display_cursor('2y32qkr40b82a',null,'typical +peeked_binds'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID	2y32qkr40b82a, child number 0
-------------------------------------
SELECT * FROM LIU WHERE ID1= :N1 AND ID2= :N2 AND ID3= :N3

Plan hash value: 3882242470

--------------------------------------------------------------------------
| Id  | Operation	  | Name | Rows  | Bytes | Cost (%CPU)| Time	 |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |	 |	 |	 |     3 (100)| 	 |
|*  1 |  TABLE ACCESS FULL| LIU  |     1 |    39 |     3   (0)| 00:00:01 |

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
--------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - :N1 (NUMBER): 128000
   2 - :N2 (NUMBER): 128000
   3 - :N3 (NUMBER): 128000

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

   1 - filter(("ID1"=:N1 AND "ID2"=:N2 AND "ID3"=:N3))

Note
-----
   - dynamic sampling used for this statement (level=2)

SQL_ID	2y32qkr40b82a, child number 1
-------------------------------------
SELECT * FROM LIU WHERE ID1= :N1 AND ID2= :N2 AND ID3= :N3


PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3882242470

--------------------------------------------------------------------------
| Id  | Operation	  | Name | Rows  | Bytes | Cost (%CPU)| Time	 |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |	 |	 |	 |     3 (100)| 	 |
|*  1 |  TABLE ACCESS FULL| LIU  |     1 |    39 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

   1 - :N1 (NUMBER): 1
   2 - :N2 (NUMBER): 2
   3 - :N3 (NUMBER): 3

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("ID1"=:N1 AND "ID2"=:N2 AND "ID3"=:N3))

Note

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
-----
   - dynamic sampling used for this statement (level=2)


58 rows selected.


SQL> select value_string from v$sql_bind_capture where sql_id='2y32qkr40b82a'

VALUE_STRING
--------------------------------------------------------------------------------
1
2
3
128000
128000
128000

6 rows selected.

在产生子游标的情况下同时捕获到,我们来看下other_xml的值

select other_xml from v$sql_plan where sql_id=’2y32qkr40b82a’ and other_xml is not null

<other_xml><info type="db_version">11.2.0.3</info>
<info type="parse_schema"><![CDATA["LIU"]]></info><info type="dynamic_sampling">2</info>
<info type="plan_hash">3882242470</info><info type="plan_hash_2">281948136</info>
<peeked_binds><bind nam=":N1" pos="1" dty="2" pre="0" scl="0" mxl="22">c30d51</bind>
<bind nam=":N2" pos="2" dty="2" pre="0" scl="0" mxl="22">c30d51</bind>
<bind nam=":N3" pos="3" dty="2" pre="0" scl="0" mxl="22">c30d51</bind></peeked_binds>
<outline_data><hint><![CDATA[IGNORE_OPTIM_EMBEDDED_HINTS]]></hint>
<hint><![CDATA[OPTIMIZER_FEATURES_ENABLE('11.2.0.3')]]></hint>
<hint><![CDATA[DB_VERSION('11.2.0.3')]]></hint>
<hint><![CDATA[FIRST_ROWS(100)]]></hint>
<hint><![CDATA[OUTLINE_LEAF(@"SEL$1")]]></hint>
<hint><![CDATA[FULL(@"SEL$1" "LIU"@"SEL$1")]]></hint>
</outline_data></other_xml>
SQL> select display_raw('c30d51','NUMBER') from dual;

DISPLAY_RAW('C30D51','NUMBER')
--------------------------------------------------------------------------------
128000

确实在other_xml中捕获到了这个值,这里要注意如果关闭”_optim_peek_user_binds” other_xml将无法捕获这些values

继续 我们开启两个窗口做loop循环:

session 1:

SQL>  declare 
N1  number;
N2  number;
N3  number;
v_count number;
begin
for i in 1..1000000000 loop
N1 := 4; 
N2 := 5;
N3 := 6;
SELECT count(*) into v_count FROM LIU WHERE ID1= N1 AND ID2= N2 AND ID3= N3;
end loop;
end;  2    3    4    5    6    7    8    9   10   11   12   13  
 14  /
 
 
session 2:

SQL>  declare 
N1  number;
N2  number;
N3  number;
v_count number;
begin
for i in 1..1000000000 loop
N1 := 7; 
N2 := 8;
N3 := 9;
SELECT count(*) into v_count FROM LIU WHERE ID1= N1 AND ID2= N2 AND ID3= N3;
end loop;
end;  2    3    4    5    6    7    8    9   10   11   12   13  
 14  /
 
session 3

SQL> @find_sql
Enter value for sql_text: %LIU%
old   6: where upper(sql_text) like upper(nvl('&sql_text',sql_text))
new   6: where upper(sql_text) like upper(nvl('%LIU%',sql_text))
Enter value for sql_id: 
old   8: and sql_id like nvl('&sql_id',sql_id)
new   8: and sql_id like nvl('',sql_id)

SQL_ID					CHILD_NUMBER  PLAN_HASH      EXECS  AVG_ETIME	 AVG_LIO SQL_TEXT
--------------------------------------- ------------ ---------- ---------- ---------- ---------- --------------------------------------------------------------------------------
33xdcqmh004pw					   0 4104423748     889189 .000024552 7.00001574 SELECT COUNT(*) FROM LIU WHERE ID1= :B3 AND ID2= :B2 AND ID3= :B1
3s3f5r30shvz4					   0 4104423748 	 1    .003918	       7 SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB
												 ) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB)
												 NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM (SELECT /*+ IGNORE_WHERE_CLAU
												 SE NO_PARALLEL("LIU") FULL("LIU") NO_PARALLEL_INDEX("LIU") */ 1 AS C1, CASE WHEN
												  "LIU"."ID1"=:B1 AND "LIU"."ID2"=:B2 AND "LIU"."ID3"=:B3 THEN 1 ELSE 0 END AS C2
												  FROM "LIU"."LIU" "LIU") SAMPLESUB

g8h776pgzzxym					   0	      0 	 0   41.31562	 6161555  declare N1  number; N2  number; N3  number; v_count number; begin for i in 1..1
												 000000000 loop N1 := 4; N2 := 5; N3 := 6; SELECT count(*) into v_count FROM LIU
												 WHERE ID1= N1 AND ID2= N2 AND ID3= N3; end loop; end;

SQL_ID					CHILD_NUMBER  PLAN_HASH      EXECS  AVG_ETIME	 AVG_LIO SQL_TEXT
--------------------------------------- ------------ ---------- ---------- ---------- ---------- --------------------------------------------------------------------------------


SQL> alter system set "_cursor_bind_capture_interval"=1;

System altered.

15:19:38 SQL> /

VALUE_STRING
--------------------------------------------------------------------------------
4
5
6

Elapsed: 00:00:00.00
15:19:40 SQL> /

VALUE_STRING
--------------------------------------------------------------------------------
7
8
9

Elapsed: 00:00:00.01
15:19:41 SQL> /

VALUE_STRING
--------------------------------------------------------------------------------
7
8
9

Elapsed: 00:00:00.00
15:20:58 SQL> /

VALUE_STRING
--------------------------------------------------------------------------------
7
8
9

Elapsed: 00:00:00.00
15:20:59 SQL> /

VALUE_STRING
--------------------------------------------------------------------------------
7
8
9

Elapsed: 00:00:00.00
15:20:59 SQL> /

VALUE_STRING
--------------------------------------------------------------------------------
4
5
6

可以看到Capture binds 在不停的变化,而此时通过dbms_xplan peeking binds 则不会产生变化,这也验证了JL的这篇文章。
这篇文章的主旨为 “Capture binds always actually?”.通过实验已经证明了 – No 对于这个问题衍生出了peeked binds,Dion也曾总结过这个问题 可以总结为

– Peeked binds value of dbms_xplan.display_cursor is extracted from v$sql_plan.other_xml

– Captured and Peeked binds will refresh when generating child cursors

– Captured bind value is extraced from X$KQLFBC view.

– Peeked binds and Captured bind are not same thing

另外还可以通过一些手段去tracking bind values :10046,logminer等 下面给个示范:

SQL> oradebug setorapid 41;
Oracle pid: 41, Unix process pid: 30705, image: oracle@db-2-15 (TNS V1-V3)
SQL> 
SQL> 
SQL> oradebug dump errorstack 4;
Statement processed.
SQL> oradebug dump errorstack 4;
Statement processed.
SQL> oradebug tracefile_name;
/home/oracle/diag/rdbms/yhdpri/yhd/trace/yhd_ora_30705.trc

—– Bind Info (kkscoacd) —–
Bind#0
oacdty=02 mxl=22(21) mxlc=00 mal=00 scl=00 pre=00
oacflg=13 fl2=206001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=7f3c6bfd2b78 bln=22 avl=03 flg=09
value=1231
Bind#1
oacdty=02 mxl=22(21) mxlc=00 mal=00 scl=00 pre=00
oacflg=13 fl2=206001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=7f3c6bfd2ba8 bln=22 avl=04 flg=09
value=21231
Bind#2
oacdty=02 mxl=22(21) mxlc=00 mal=00 scl=00 pre=00
oacflg=13 fl2=206001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=7f3c6bfd2bd8 bln=22 avl=03 flg=09
value=3455
Frames pfr 0x7f3c6bff26b0 siz=4400 efr 0x7f3c6bff25f8 siz=4384
Cursor frame dump
enxt: 5.0×00000010 enxt: 4.0×00000048 enxt: 3.0×00000078 enxt: 2.0×00000060
enxt: 1.0×00001000
pnxt: 1.0×00000010
kxscphp=0x7f3c6bff0898 siz=984 inu=864 nps=544
kxscdfhp=0x7f3c6bff0778 siz=984 inu=88 nps=0
kxscbhp=0x7f3c6bff0c90 siz=984 inu=184 nps=0
kxscwhp=0x7f3c6bff0db0 siz=18680 inu=8448 nps=0
kxscefhp=0x7f3c6bff0ed0 siz=4056 inu=40 nps=40
Starting SQL statement dump
SQL Information
user_id=0 user_name=SYS module=sqlplus@db-2-15 (TNS V1-V3) action=
sql_id=33xdcqmh004pw plan_hash_value=-190543548 problem_type=4
—– Current SQL Statement for this session (sql_id=33xdcqmh004pw) —–
SELECT COUNT(*) FROM LIU WHERE ID1= :B3 AND ID2= :B2 AND ID3= :B1

Refhttp://kerryosborne.oracle-guy.com/2009/07/creating-test-scripts-with-bind-variables/

some change in cache buffer handle

June 7, 2012 Internals, oracle No comments

今天开发因为连接数不够 强烈要求加大processes值,突然想到了processes对_cursor_db_buffers_pinned的影响,,小记之:

当会话需要pin住buffer header时它首先要获去buffer handle,实例所存在的buffer handle总数由(each process can cache buffer handles*processes决定),也就是说在10g中可以认为 total buffer handle (_db_handles)=5*processes(默认_db_handles_cached为5),而在11g中这个值发生了变化(in 11g _db_handles_cached =8) 而一个process在同一时间内所能pin住的最大handle数由(_cursor_db_buffers_pinned)所决定,我们可以演示一下 _cursor_db_buffers_pinned是怎么算出来的:

SQL> select * from v$version;

BANNER
——————————————————————————–
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 – 64bit Production
PL/SQL Release 11.2.0.3.0 – Production
CORE 11.2.0.3.0 Production
TNS for Linux: Version 11.2.0.3.0 – Production
NLSRTL Version 11.2.0.3.0 – Production

SQL>

COMPONENT CURRENT_SIZE
————————————————– ———————–
shared pool 8791261184
large pool 16777216
java pool 16777216
streams pool 33554432
DEFAULT buffer cache 59827552256

SQL> show parameter process;

NAME TYPE VALUE
———————————— ———– ——————————
aq_tm_processes integer 0
db_writer_processes integer 6
gcs_server_processes integer 12
job_queue_processes integer 10
log_archive_max_processes integer 2
processes integer 2100

59827552256/8192/2100=3477.699047619048

NAME VALUE DESCRIPTION
——————————————— ——————– ———————————————————————-
_cursor_db_buffers_pinned 3505 additional number of buffers a cursor can pin at once

可以看到 3478-2 ≈ 3505

_db_handles_cached – default 5
_db_handles – derived: possibly processes * _db_handles_cached
_cursor_db_buffers_pinned – derived: possibly (db_block_buffers / processes) – 2
_session_kept_cursor_pins – may do for cursors what _db_handles does for buffer headers (10g only)

关于这几个参数
_db_handles_cached in 11g -default 8
_session_kept_cursor_pins 在11gR2已经没有了这个参数
11g:
[oracle@testdb ~]$ ora params _session_kept_cursor_pins

no rows selected

10g:
[oracle@testdb2 ~]$ ora params _session_kept_cursor_pins

NAME VALUE DESCRIPTION
——————————————— ——————– ———————————————————————-
_session_kept_cursor_pins 0 Number of cursors pins to keep in a session

关于这块内容刘兄已经解释的很清楚:

“当会话需要pin住buffer header时它首先要获取buffer handle,得到buffer handle的过程中首先要抢占cache buffer handles栓,为了避免对于cache buffer handles栓的过度争用,每个会话被允许cache一小撮buffer handles,也叫保留集(reserved set)。该保留集的上限由隐式参数_db_handles_cached(默认为5)所控制,在此基础上会话在执行不是十分复杂的SQL时不必反复申请栓。
同时存在一些十分复杂的语句,需要同时pin住大量的缓存,此时若存在空闲的handle,会话被允许索要手头5个cached buffer handles以外的handle。也为了限制单个会话的资源占用量,Oracle通过内部算法总是尽量让每个进程能够pin住”合理份额“的buffer,由此单个会话同一时间所能持有的总handle数等于db_block_buffers/processes,隐式参数_cursor_db_buffers_pinned指出了该值。另cache buffer handles并没有子栓,仅有父栓本身,所以如果系统中有大量复杂SQL,会话所能cache的buffer handles远不足以满足需求时,该栓就成为整个系统性能的瓶颈,甚至可能出现潜在的死锁/饿死问题。”




可以参见这篇文章:Know more about Cache Buffer Handle

Oracle latch internals

June 6, 2012 Internals, oracle 4 comments

Oracle latch internals


Server latch implementation called KSL, e.g. every table with name starting with x$ksl… is latch
related and every SGA has ksl… structure as well. In order to get complete list of x$ksl table,
connect with sysdba permissions to oracle and run query:

SELECT kqftanam
FROM x$kqfta
WHERE kqftanam LIKE ‘X$KSL%’

eg:

[oracle@testdb ~]$ ora si

SQL*Plus: Release 11.2.0.3.0 Production on Thu Jun 6 15:37:53 2013

Copyright (c) 1982, 2011, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 – 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> SELECT kqftanam
FROM x$kqfta
WHERE kqftanam LIKE ‘X$KSL%’ 2 3 ;

KQFTANAM
——————————
X$KSLLTR
X$KSLHOT
X$KSLLCLASS
X$KSLECLASS
X$KSLEMAP
X$KSLLD
X$KSLED
X$KSLCS
X$KSLSCS
X$KSLES
X$KSLSESHIST

KQFTANAM
——————————
X$KSLEI
X$KSLLW
X$KSLPO
X$KSLWSC
X$KSLWH
X$KSLWT

17 rows selected.

There are two structures associated with a latch in KSL – ksllt and kslla.

latch instance structure
struct ksllt {
union ksllt.kslltlat kslltlat;
eword kslltlvl; /* latch level, to enforce hierarchy */
ub2 kslltnum; /* latch number, to index descriptors */
uword kslltwhr; /* the context from where the latch is held */
ubig_ora kslltwhy; /* why the latch is being acquired */
ub4 kslltwgt; /* count of times gotten wait */
ub4 kslltngt; /* count of times gotten nowait */
eword kslltefd; /* error frame depth when gotten, this is used to */
ub4 kslltwff; /* count of wait gets that failed first try */
ub4 kslltnfa; /* count of times failed nowait get */
ub4 kslltntg; /* count of total gets of latch */
ub4 kslltwts; /* number of waiters */
ub4 ksllttmr; /* down/hold time timer */
kstat kslltwtt; /* wait time for this latch */
kstat ksllthdt; /* hold time for this latch */
kstat kslltdnt; /* down time for this latch */
kstat kslltwtw; /* wait time for associated wait list latch */
ub4 kslltwxs; /* count of X process waiting because of S holders */
ub4 kslltwsx; /* count of S process waiting because of X holder */
ub4 kslltwsw; /* count of S process waiting because of X waiter */
ub4 kslltmxs; /* X misses owing to S holders */
ub4 kslltmsx; /* S misses owing to X holders */
ub4 kslltmxw; /* S misses owing to X waiters */
ub4 kslltwsl; /* count of times slept when failed to get latch */
struct ksllt * kslltchg; /* last child gotten as part of get-parent op */
ub4 kslltwkc; /* count of wakeups that have been done */
ub4 kslltwth; /* count of sleeps that were holding another latch */
ub4 yields_ksllt; /* count of yields */
BitMask for “struct ksllt.yields_ksllt”
KSLHSTMAX BIT 4 // if you change this, must change x$ table

ub4 ksllthst[4];
ub4 gets_wl_ksllt; /* number of gets of wait list latch */
ub4 misses_wl_ksllt; /* number of misses on wait list latch */
ub4 sleeps_wl_ksllt; /* number of sleeps on wait list latch */
ub4 yields_wl_ksllt; /* number of yields on wait list latch */
struct kslla * waitproc_ksllt; /* log change to wait list */
kgglk waiters_ksllt; /* head of waiter list */
kgglk kslltlnk; /* linked list of children */
void * kslltrec; /* recovery info for this latch */
b1 kslltsp1[3]; /* 1 spare 1 byte field */
uword kslltcnm; /* child number */
uword class_ksllt; /* latch class */
skgslt wl_ksllt;
}

struct kslla {
ksllt * ksllalat[( 8 + 7 )]; /* latches owned at each level */
ksllt * ksllalaq; /* latch being acquired */
ksllt * wl_flux_kslla; /* latch whose wait list I’m acquiring */
ksllt * ksllaxbs; /* EXWAIT bit set in shared latch */
ksllt * ksllawat; /* latch being waited for: not protected by any latch, */
ubig_ora ksllawhy; /* context for the latch req */
uword ksllawere; /* location from where the req is being made */
word ksllaevt; /* event waiting for */
uword ksllalow; /* bit array of latches owned, for levels [0..9] */
ubig_ora ksllaevx; /* extension to the event id ksllaevt */
kgglk waiters_kslla; /* link on waiter list */
struct ksllt * ksllawtr; /* latch waiter list this process is on. */
struct ksllt * ksllaspn; /* latch this process is spinning on */
uword ksllawst; /* process wait state (asleep/awake) */
ub4 ksllamd[( 8 + 7 )]; /* For r/w latches, mode in which latch is held */
uword ksllalpe; /* latch post expected */
kgglk ksllapwl; /* Link in post/wait queue the process is on */
kslpwq * ksllapwq; /* post/wait queue the process is on */
sword ksllapwi; /* Index of p/w queue the process was last on */
krmid ksllapid; /* the res ID for the last post received */
krmid ksllasid; /* the res ID for the last post sent */
uword ksllaprv; /* loc ID for the last post received */
uword ksllapsn; /* loc ID (where) for the last post sent */
ub4 ksllapsc; /* count of # of posts sent by the process */
ub4 ksllaprc; /* count of # of posts received by the process */
struct ksupr * ksllapos; /* the last process to post me */
struct ksupr * ksllalpo; /* the last process posted by me */
kgslfx ksllafac; /* context for wait/post facilities */
uword ksllawrk; /* is there work for a process about to be posted */
ub2 ksllasfa; /* service managed wait facility */
BitMask for “struct kslla.ksllasfa”
KSLLASFA_IO BIT 0x0001 // ksfd I/O pending
KSLLASFA_IPC BIT 0x0002 // ksxp I/O pending

}

struct kslld {
text * kslldnam; /* human-readable latch name */
void (*kslldcln); /* _ sword action, ksllt *lp _ cleanup proc*/
void (*kslldgcl); /* _ kgsmp *gp, sword action, struct kgslt *lp _ */
size_t kslldrsz; /* recovery structure size */
b1 kslldlvl; /* latch level */
b1 kslldpnt; /* TRUE if parent of a class of latches */
ub1 kslldlng; /* TRUE if latch can be held for long periods */
b1 kslldg2c; /* parent allowing wait get of 2 children */
uword kslldshl; /* TRUE if latch is shareable */
uword kslldcls; /* latch class */
}

struct ksllt is a definition of the latch itself and all these structures (latches itself and pointers to
hold latches v processes) are visible on instance level via fixed SGA, while kslla is a process
related structure and is visible as part of X$KSUPR(Kernel Services User Process) structure

All available kslla structures can be identified by using the following query:

SELECT c.kqfconam field_name, c.kqfcooff offset , kqfcotyp,
DECODE (kqfcotyp ,
0, ‘hex string’,
1, ‘varchar2’ ,
2, ‘number’ ,
11, ‘word’,
12, ‘datetime’ ,
23, ‘raw’ ,
‘other’
) typ,
kqfcosiz sz
FROM x$kqfco c , x$kqfta t
WHERE t.indx = c.kqfcotab
AND kqftanam = ‘X$KSUPR’
AND kqfconam LIKE ‘KSLLA%’
AND c.kqfcooff > 0
ORDER BY offset ASC

For example, if you are using Oracle 9.2.0.5 you will see the following results

KSLLALAQ ,296 ,0,hex string ,8
KSLLAWAT ,320 ,0,hex string ,8
KSLLAWHY ,328 ,11,word,8
KSLLAWER ,336 ,11,word,4
KSLLALOW ,344 ,0,hex string ,4
KSLLASPN ,384 ,0,hex string ,8
KSLLAPRV ,544 ,11,word,4
KSLLAPSN ,548 ,11,word,4
KSLLAPSC ,552 ,11,word,4
KSLLAPRC ,556 ,11,word,4

Only ksllawat and ksllaspn are actually available via V$PROCESS; ksllawat represents latch waits and ksllaspn represents latch spins. Other fields can be used mostly for debugging purposes such as halt analysis, etc. For example, ksllalaq has pointer ksllt* to acquired latch set by KSLBEGIN, pointing to the head of linked list used in latch wait posting implementation (KSLBEGIN is a begin macro in the defined in the Oracle ksl.h code). Once a process acquires a latch at a certain level, it cannot acquire anymore latch at level that is equal or less than that level (unless it acquires nowait).

To support this issue, another nondocumented column ksllalow from X$KSUPR is used.
To see all the latches in the fixed SGA, the following query can be used:

select k.ksmfsadr , ksmfsnam, ksmfstyp, ksmfssiz, kslldnam, kslldlvl
from x$ksmfsv k , x$kslld a
where k .ksmfstyp like ‘%ksllt%’ and k.ksmfsadr = a.kslldadr
order by ksmfsnam

X$KSLLD : [K]ernel [S]ervice [L]atch [L]ock [D]escriptor
KSLLDLVL : [K]ernel [S]ervice [L]atch [L]ock [D]escriptor [L]atch Le[V]e[L]

eg:

SQL> select * from v$version;

BANNER
——————————————————————————–
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 – 64bit Production
PL/SQL Release 11.2.0.3.0 – Production
CORE 11.2.0.3.0 Production
TNS for Linux: Version 11.2.0.3.0 – Production
NLSRTL Version 11.2.0.3.0 – Production

SQL> select distinct kslldlvl from x$kslld order by 1;

KSLLDLVL
———-
0
1
2
3
4
5
6
7
8
9
10

KSLLDLVL
———-
11
14
15
16

15 rows selected.

SQL> select * from v$version;

BANNER
—————————————————————-
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 – 64bi
PL/SQL Release 10.2.0.5.0 – Production
CORE 10.2.0.5.0 Production
TNS for Linux: Version 10.2.0.5.0 – Production
NLSRTL Version 10.2.0.5.0 – Production

SQL> select distinct kslldlvl from x$kslld order by 1;

KSLLDLVL
———-
0
1
2
3
4
5
6
7
8
9
11

KSLLDLVL
———-
12
13

13 rows selected.

11g got more levels than 10g but Most latches have a level between 0 and 8

Let’s see how we can lock / unlock enqueues latch ksqeql_ via oradebug. This latch is used to protect operation on KSE (enqueue) related structures in memory. Assum e , value of k.ks mf sadr of this latch th at received from the previous query was 000000038000A0C0 (0x000000038000A0C0 = 15032426688)

In order to lock the latch, we can use function

word kslgetl(ksllt *latch_to_lock , word wait )
sqlplus “/ as sysdba”
oradebug setmypid
SQL> oradebug call kslgetl 15032426688 1

Function returned the value 1, meaning that we locked the latch.

Now let’s try to connect to Oracle. You can see that your session was halt because you are holding enqueue latch, therefore Oracle is unable to update even its own X$ table entries. Let’s return to oradebug and will release (free) the enqueue latch.

SQL> oradebug call kslfre 15032426688Function returned 8000A0C0

This time yours another session continued immediately.
Now let’s check the enqueu latch by queriynig v$latch:

select wait_time
from v$latch
where name = ‘enqueues’

The wait time returned is extremely big (in exam ple it was 619714031, e.g. 10 m i nutes).
List of all latch related tables as following:

GV$LATCH
GV$LATCH_CHILDREN
GV$LATCH_PARENT
GV$DLM_LATCH

*Note: list excludes x$kslld which is process internal structure and not represented in the SGA at
whole.

GV$DLM_LATCH is a special fixed table, used in the OPS/RAC environment only to monitor distributed lock manager latch. This fixed view exists only for historical reasons, since DLM did not used KSL ksllt and kslla latch related structures prior to version 8.1.5. DLM had own kjsllt and kjslla structures. DLM uses standard KSL structures from version 8.1.5 and up, therefore DLM latches can be monitored via V$LATCH fixed views.
GV$LATCHHOLDER is a process, e.g. X$KSUPR, fixed array of process structures,
GV$LATCH_MISSES is table which is non-directly points to the latch structures.

The next interesting question is where in the Oracle code each latch is used. It is possible to see the latch contention on some latch. However its name would be meaningless so at least you can identify where in oracle code (up to function name) either this latch has been locked. Such identification can be done, in normal case by running the following query:

select parent_name, location, nwfail_count, sleep_count from v$latch_misses;

where column location divided to 1 – 3 parts divided by ‘:’
1) Oracle kernel function.
2) Optional kslbegin (macro to lock latch) or operation name.
3) Optional description or comment if single function has several locks/unlocks for the same
latch.

For example, let’s look at the function “kcbgtcr” used to “get cache buffer for consistent read” or in other words it reads buffer from disk to buffer cache and then roll it back up to first SCN of the query, caused CR read.

In Oracle 9.2.0.5 this function has 4 different places where “cache buffers lru chain” latch to which the appropriate related block can be locked. Execute the following query:

SELECT t1 .ksllasnam “parent_name” , t2 .ksllwnam “location” ,
t2.ksllwlbl “unit to guard”
FROM x$ksllw t2, x$kslwsc t1
WHERE t2 .indx = t1 .indx
AND t2 .ksllwnam LIKE ‘kcbgtcr%’
AND ksllasnam = ‘cache buffers lru chain’

Result of query above will be something like:

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — –
parent_name location unit to guard
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — –
cache buffers lru chain kc bgtcr:CR Scan:KCBRSTOP buffer header
cache buffers lru chain kc bgtcr:CR Scan:KCBRSAVE buffer header
cache buffers lru chain kc bgtcr:CR Scan:KCBRSKIP buffer header
cache buffers lru chain kc bgtcr:CR Scan:best buffer header
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —


*Note: In next topic about buffer cache and buffer cache related latches in the future discussion we will talk about CR blocks and related parameters.

PMON invokes a clean up function for releasing latches if the holding process dies.PMON will initiate the clean up function.

for 10g:

NAME VALUE DESCRIPTION
——————————————— ——————– ———————————————————————-
_max_exponential_sleep 0 max sleep during exponential backoff
_max_sleep_holding_latch 4 max time to sleep while holding a latch
_spin_count 2000 Amount to spin waiting for a latch

The sleep time on latches approximately doubles with each wait until upper limit is reached. This is an exponential backoff.

If Oracle instance is halt and you have some reason to think that is caused by latching problem, then you can use oradebug to dump latch statistics:

1. connect as sysdba
2. In order to dump latch states, perform the following operations:

? oradebug setmypid
? oradebug dump latches 2
? oradebug tracefile_name
Trace file will be generated with name, for example such as /oravl01/oracle/adm/bigsun/udump/bigsun_ora_21039.trc

Open this trace file to see the latch with high and constantly increasing between dumps sleeps count in the case of willing-to-wait latch and failed count in the case of no-wait latch.

For example, if you performed 2 dumps with insignificant interval between and have seen in the first dump for some child redo allocation latch sleeps count of 3. In the second dump sleeps count specific cache buffers lru chain latch is guard for specific part of buffer lru chain and block we are going to use has entry in this hash chain of 13 with “failed first” increased to the same number as sleeps count and “gotten” counts remains the same. All this means that some (and at least one) process is waited constantly for this latch all
the time between two dumps, e.g. another process hold this latch.

Example:


dump (1):

396670178 Child redo allocation level=5 child#=1
Location from where latch is held : kcrfwi : more space: strand #
Context saved from call : 0
state=free
gotten 7776125 times wait, failed first 355 sleeps 3
gotten 0 times nowait, failed : 0


dump (2):

396670178 Child redo allocation level=5 child#=1
Location from where latch is held : kcrfwi : more space: strand #
Context saved from call : 0
state=free
gotten 7776125 times wait, failed first 365 sleeps 13
gotten 0 times nowait, failed : 0

Let’s try and emulate such “bug” example. To do that, we will need two sessions namely sess_1 and sess_2. Our child redo allocation latch address is 0x396670178 that can be converted into 15408234872. sess_1 will be connected as sysdba and sess_2 as oracle dba:

step 1. sess_1: oradebug setmypid
step 2. sess_1: oradebug dump latches 2
step 3. sess_1: oradebug call kslgetl 15408234872 1
step 4. sess_2: create table test1 as select * from dba_objects;
step 5. wait 1 minute
step 6. sess_1: oradebug call kslfre 15408234872
step 7. sess_1: oradebug dump latches 2

On the step 4 session sess_2 was halted. If you will try performing dump latches while some latch is locked, you will have a good chance to receive well known ORA-03113 (end-of-file on communication channel). However this is relates to latch and platform dependent.

Due to the error stacks it is looks like this is Oracle bug since latch state represented in the latch dump. However during all tests we have performed, we never seen in the dump files states other then “state=free”. It’s seems that Oracle waits several seconds until each latch will be freed and then dumps its state. If latch has not been freed during several seconds, ORA-03113 may occur…

396670178 Child redo allocation level=5 child#=1
Location from where latch is held : kcrfwi : more space: strand #
Context saved from call : 0
state=free
gotten 7784997 times wait, failed first 362 sleeps 10
gotten 0 times nowait, failed : 0


396670178 Child redo allocation level=5 child#=1
Location from where latch is held : kcrfwi: before write: strand #
Context saved from call : 0
state =free
gotten 7786065 times wait, failed first 367 sleeps 14
gotten 0 times nowait, failed: 0

In this case “gotten” increased significantly since to perform following statement:

create table test1 as select * from dba_objects;

more then 1000 times redo the space allocations has been performed.

In the general case, even if your instance is not halt but you want to see latch statistics in this format, you can dump latches statistics by running

ALTER SESSION SET EVENTS ‘immediate trace name latches level 2’;

Dump latches have two available levels:
? level 1 when dump just basic latch information (without statistics)
? level 2 when dump just latch information with statistics

and you can also user event 10005 before 9i

eg:

event = “10005 trace name context forever, level 1”