
About partial writes in MySQL and Oracle

September 2, 2014 Internals, MYSQL, oracle

I read this thread a long time ago and did not dig into it at the time. Going through some MySQL internals these past few days, the question came back to me; after discussing it with three or four friends, I want to walk through it properly.

The original thread: https://community.oracle.com/thread/1087650?start=0&tstart=0

In the thread, someone asks how Oracle avoids partial writes, the so-called "half I/O" problem: an I/O torn partway through (a momentary disconnect, a crash, a storage or host failure). Rare as that is, it comes up regularly in unconventional recovery work, for instance when a block turns out to be internally inconsistent, what we call a logical corrupt block.
I went through the thread again carefully and found that several of the answers are inaccurate, or answer a rather different question than the one asked.

First, let's look at what MySQL's doublewrite is:

—– quoted from the MySQL Performance Blog

Why use doublewrite?
The goal is data safety in the event of a partial page write, a failure that hits while a data page is only half written. InnoDB does not log whole data pages; it uses what is called "physiological" logging: a log record carries only the page number, the operation applied to the page (such as updating a row), and the log sequence number. This log structure has the advantage of reducing the amount of data written to the log, but it requires pages to be internally consistent. The page version does not matter: the page may be the current version (in which case InnoDB skips the update during crash recovery) or an earlier one (in which case InnoDB applies the update). But if the page is internally inconsistent, recovery cannot proceed.

Partial page writes
What is a partial page write, and why does it happen? A partial page write means a page write handed to the operating system completes only partially: for example, only 4K of a 16K InnoDB page is updated while the rest still holds old data. Most partial writes happen on power failure; they can also happen on an OS crash, because the OS may split one 16K write into several smaller writes (file fragmentation can cause this) and the failure lands between them. With software RAID, a page may straddle a stripe boundary, again requiring multiple I/Os, so partial writes are possible there too. With hardware RAID and no battery-backed cache, a partial write on power loss is also possible. When a single write is sent to the disk itself, the hardware can in theory complete it even on power loss, since the drive should hold enough residual charge to finish; honestly I do not know whether that is true in practice. It is hard to verify, and it is not the only cause of partial writes anyway. All I know is that partial writes do happen: before InnoDB implemented doublewrite, I saw plenty of data corrupted for exactly this reason.

How does doublewrite work?
Think of the doublewrite buffer as a short-term log file allocated inside the InnoDB system tablespace, holding 100 data pages. When InnoDB flushes pages from the buffer pool it writes several at a time: the pages are first written sequentially into the doublewrite buffer and fsync()'d to disk, and only then written to their real locations and fsync()'d again. During crash recovery InnoDB compares the contents of the doublewrite buffer with the pages at their original locations: a page that is inconsistent in the doublewrite buffer is simply discarded, and a page that is inconsistent at its original location is restored from the doublewrite buffer.

What does the doublewrite buffer cost MySQL?
Although doublewrite means every page is written twice, the overhead is far less than 2x. Writes into the doublewrite buffer are sequential, so they are cheap. Doublewrite also reduces the number of fsync() calls InnoDB makes: instead of an fsync() per page, several writes can be submitted and a single fsync() issued at the end, which lets the OS optimize write ordering and drive multiple storage devices in parallel. These optimizations could be had without doublewrite as well; in fact they were implemented together with it. Overall, I would expect the performance overhead of doublewrite to be no more than 5 to 10%.

Can doublewrite be disabled?
If you don't care about data consistency (say, on RAID0), or your filesystem can guarantee that partial writes never happen, you can disable doublewrite by setting innodb_doublewrite to 0. But this usually buys you bigger trouble than it saves.
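
For reference, a minimal way to check the setting and its counters, and (if you really must) to turn it off; innodb_doublewrite is the actual server variable, though whether it can be changed without a restart depends on the MySQL version:

mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_doublewrite';
mysql> SHOW GLOBAL STATUS LIKE 'Innodb_dblwr%';   -- pages and writes that went through the doublewrite buffer

# my.cnf: disable doublewrite (only sane if the filesystem itself prevents torn writes)
[mysqld]
innodb_doublewrite = 0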

Setting aside why MySQL is prone to torn writes when doublewrite is off: MySQL writes in units of pages (1 page = 16KB by default), while Oracle writes in units of blocks, whose size is configurable. Either way, the write reaches the OS in units of OS blocks, so a partial write at the OS-block level produces the same logical inconsistency.
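
A quick way to see the respective write units (a sketch; the device name and the outputs are examples):

mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_page_size';   -- 16384 by default
SQL> show parameter db_block_size                       -- Oracle block size, commonly 8192
$ blockdev --getbsz /dev/sda1                           # I/O block size as the OS sees it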

Here is one of the answers from the thread:

It’s an interesting question. I think the clue is in the link you provided: “Such logging structure is geat as it require less data to be written to the log, however it requires pages to be internally consistent.”

What that’s saying (I think!) is that the contents of the innodb transaction log can only be replayed to datafile pages which are ‘clean’ -and that’s true for Oracle, too. You can’t apply Oracle redo to an Oracle database block that is internally corrupted because some of its consituent “os pages” were written at a different time from others. When such partial writes happen, you get what’s called a “fractured block”, warnings in the alert log …and the data file is regarded as corrupt from that point on.

Oracle’s fix to this potential problem, however, is also hinted at in the article you linked to: “Innodb does not log full pages to the log files”. That’s an interesting sentence because , you see, Oracle does write full pages to the logs! I should immediately qualify that: it only does so when you take a “hot backup” using O/S copy commands -because it’s only then that you have to worry about the problem. In other words, you only have to worry about the fact that you can only apply redo to an internally consistent database block if you’re actually in the business of applying redo… and you’re only doing that in the event of a recovery. And complete recoveries in Oracle (as opposed to mere crash recoveries) require you to have restored something from backup. So, it’s only during the backup process that you only have to worry about the problem of fractured blocks -and so it’s only then that Oracle says, ‘if you have put the tablespace into hot backup mode (alter tablespace X begin backup), then the first time a block of data is changed, the entire block should be written in a consistent state into the redo (transaction) logs. Then, if the datafile copy of the block in the backup turns out to be fractured, we’ve got a known good copy in the redo we can restore in its place. And once you have a clean block as a starting point, you can continue to apply redo from that point on’.

Oracle has an alternative (and more up to date) mechanism for achieving this “I know your data block is clean” starting state, though. It’s called RMAN -the Oracle backup and recovery tool. Unlike your OS copy command, it’s an Oracle utility… so it understands the concept of Oracle blocks, and it can therefore check that a block that’s been copied has been copied consistently, with all its constituent OS ‘pages’ written coherently to disk in the same state. It knows how to compare input and output in a way no OS command could ever hope to do. So when RMAN copies a data block hot, it reads the copy, compares it with the original -and if it sees the copy is fractured, it just has another go copying the block again. Repeat until the copy is indeed verified as a good copy of the original. No need to write the block into the transaction log at all, because you know that the backup file itself contains the necessary clean block copy.

So, putting that into practice. Let’s say your server corrupts data on the disk for whatever reason and, in the process, your Oracle instance dies. You try and restart Oracle, but you get told that recovery is needed (you might get a message that file 16, for example, can’t be read). So you restore file 16 from your hot backup taken with OS commands. In that backup, one of the blocks is fractured, because only part of the Oracle block had hit disk at the point the backup was performed. So you restore a fractured block. But that’s not a problem, because as redo is replayed, you’ll find the clean copy of the block in the redo stream, and restore that over the top of the fractured block. The rest of the redo can then be replayed without a problem. Or, you restore file 16 using RMAN… and what it restores cannot be fractured, because it checks for that before it reports the original backup a success. Therefore, you restore a clean copy of file 16, and can apply redo to it without drama. Either way, you get your database recovered.

So, the article you linked to nails the important matter: “It does not matter which page version it is – it could be “current” version in which case Innodb will skip page upate operation or “former” in which case Innodb will perform update. If page is inconsistent recovery can’t proceed.” Absolutely true of Oracle, too. But Oracle has two alternatives for ensuring that a clean version of the block is always available: write a whole block into redo if it’s changed whilst the database is being backed up with OS commands, or make sure you only write clean blocks into the backup if you’re using RMAN -and you achieve that by multiple reads of the block, as many as are necessary to ensure the output is clean.

Oracle’s solutions in these regards are, I think, a lot more efficient than double-writing every block all the time, because the only time you have to worry that what’s on disk isn’t consistent is, as your linked article again points out, when ‘power failure’ or ‘os crash’ happens. That is, during some sort of failure. And the response to failure that involves corruption is always to restore something from backup… so, it’s really only that backup that needs to worry about ‘clean pages’. Instead of writing everything twice to disk during normal running (which sounds like a potentially enormous overhead to me!), therefore, Oracle only has to employ protective measures during the backup process itself (which should, ordinarily, be a mere fraction of ‘normal running’ time). The overhead is therefore only encountered sparingly and not something you need worry about as a potentially-constant performance problem.

In closing, I’ll second Aman’s observation that it is generally and usually the case that any variation away from the default 8K block size is a bad idea. Not always, and there may be justification for it in extremis… but you will certainly be at risk of encountering more and weirder bugs than if you stick to the defaults.

This answer never quite reaches the point. It mainly covers the methods Oracle uses to avoid partial writes in one special scenario, backup, rather than a general database mechanism. Let's look at how the same problem is avoided on both sides —> (Oracle hot backup / RMAN backup)
With a traditional Oracle hot backup, the backup reads datafiles while DBWR writes them in parallel, so the backup may read fractured (inconsistent) blocks. The fix: for files being backed up, the dirty pages DBWR writes are also logged as full block images in the redo log, which are then used to repair fractured blocks. An RMAN backup instead checks the consistency of every page it reads and rereads any inconsistent one. Percona's XtraBackup takes an approach similar to RMAN's.
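
As a sketch, the two Oracle approaches just described look like this (tablespace, paths and file names are examples):

-- traditional hot backup: while in backup mode, changed blocks get full images logged into redo
SQL> ALTER TABLESPACE users BEGIN BACKUP;
$ cp /u01/oradata/db/users01.dbf /backup/users01.dbf    # an OS-level copy may read fractured blocks
SQL> ALTER TABLESPACE users END BACKUP;

-- RMAN reads blocks itself and simply rereads any fractured block, so no backup mode is needed
RMAN> BACKUP TABLESPACE users;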
How does a backup check whether the page it just read is consistent? The method is simple: both Oracle and InnoDB pages store a matching SCN/checksum in the page header and the page trailer. If header and trailer agree, the page is consistent; if not, it is a fractured block, and the backup just rereads that page.
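
Both camps ship offline tools that run exactly this header/trailer check, dbv (DBVERIFY) on the Oracle side and innochecksum on the MySQL side (paths and block size below are examples):

$ dbv file=/u01/oradata/db/users01.dbf blocksize=8192
$ innochecksum /var/lib/mysql/test/t1.ibd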
So this part of the answer is about avoiding partial writes while taking a backup, which is when fractured blocks or pages are most likely to be caught, whereas the original question was whether Oracle has a doublewrite-like mechanism against the ordinary, everyday partial write.

Let's continue with the next exchange:

user646745 wrote:
Thanks HJR for detailed analysis.

But the double write mechanism works without restore from a backup an apply redo: before write the dirty buffer blocks, innodb flush the blocks in the double write buffer to disk which is a sequential disk area (so it’s fast),
————————-
before write the dirty buffer blocks, logwr flush the blocks in the redo buffer to disk which is a sequential disk area (so it’s fast),

so even if partial write happen, the the blocks in double write buffer already persistent in disk,
————————-
so even if partial write happen, the the blocks in redo buffer already persistent in disk,

and when mysql restart, innodb can compare the blocks flushed from double write buffer with the datafile blocks,
————————-
and when mysql restart, smon can compare control file scn with the datafile blocks scn,

if they matched (means no partial write), then everything is fine, if not matched (partial write happen), just overwrite the datafile blocks with the blocks from double write buffer.
————————-
if they matched (means no partial write), then everything is fine, if not matched (partial write happen), just apply the redo from the redo logs.

So no media recover is required.
————————-
sounds like media recovery to me

Based on your anaysis, oracle needs media recover.
————————-
Based on your analysis, so does mysql. It just applies it in a very slightly different fashion, and calls it all something else.

This answer has its own problems. It says "so even if partial write happen, the blocks in redo buffer already persistent in disk", which is plainly misleading: when a partial write occurs, redo cannot repair a block that is inconsistent inside. Redo performs recovery at the transaction or media level; it does not reconstruct the interior of a torn block.

Oracle recovery splits into instance/crash recovery and media recovery. The essential difference: instance recovery needs only the online redo logs (it applies redo from the incremental checkpoint onward), while media recovery, which further divides into datafile and block media recovery, restores an older datafile or block and may need archived logs. Note the prerequisite: a backup must exist. You can only RESTORE a datafile or block from a backup; the RECOVER that follows applies archived logs. It is media recovery that corresponds to the cure for a partial write.
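
In RMAN terms, datafile media recovery is the familiar restore-then-recover pair; file number 16 below just follows the example from the quoted answer:

RMAN> SQL 'ALTER DATABASE DATAFILE 16 OFFLINE';
RMAN> RESTORE DATAFILE 16;    -- pull a clean copy back from the backup
RMAN> RECOVER DATAFILE 16;    -- apply archived and online redo up to the current SCN
RMAN> SQL 'ALTER DATABASE DATAFILE 16 ONLINE';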

For an internally inconsistent page, MySQL cannot recover via redo alone: applying redo presupposes consistent data. If a failure strikes while a page is being flushed, say only 4K of the 16K reached disk, and the redo records are change vectors (logical records), then at InnoDB recovery time redo has no consistent starting point to run from.

For Oracle, redo alone is likewise not enough for an internally inconsistent block. We can detect the logically corrupt block in the datafile and repair it, provided a backup exists, using block media recovery, a form of media recovery.
You can simulate such logical corruption with bbed or OS tools, and you will often find the database still opens normally; the logical error only surfaces when the block is actually touched. Which is to say, backups matter above all else: against these "half I/Os" or silent corruption from the storage layer, Oracle too is fairly helpless without one.
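
Block media recovery in command form, again assuming a usable backup; the datafile and block numbers are purely illustrative:

RMAN> RECOVER DATAFILE 6 BLOCK 403;          -- 11g syntax
RMAN> BLOCKRECOVER DATAFILE 6 BLOCK 403;     -- the 10g equivalent
RMAN> RECOVER CORRUPTION LIST;               -- repair everything listed in V$DATABASE_BLOCK_CORRUPTION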

Migrating from midrange Unix boxes to x86: the same old story

July 17, 2014 migration, oracle, system, unix

You can view the presentation by clicking here: x86_architect
Note: open the file in Chrome, otherwise it may render as garbage.

[SHOUG] Exadata at Yihaodian (1号店)

April 3, 2014 Architect, Exadata, hardware, oracle

Download PDF: [SHOUG.LOUISLIU] Exadata in e-commerce practice

Oracle DGHA architecture

November 1, 2013 advanced tech, Architect, network, oracle, replication



Download this PDF

[Repost] Autotrace with bind variables

September 27, 2013 Internals, oracle

A piece JL has just written on autotrace behavior with bind variables: Autotrace trick
We ran into this issue before as well; the SR is here: something seems confused

A serious bug triggered by CTAS

September 25, 2013 Internals, oracle

A case from a large domestic company: with the system under heavy load, massive CTAS activity left the data dictionary inconsistent, and the problem persisted after upgrading from 10.2.0.4 to 10.2.0.5.
Symptoms:
1. Attempting to drop the inconsistent tablespaces

15:00:18 SQL> @dropts_2012.0319.1500.sql
15:00:18 SQL> drop tablespace ODSCP1003 including contents;
drop tablespace ODSCP1003 including contents
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [ktssdrp1], [11], [0], [0], [], [],
[], []

Elapsed: 00:01:52.44
15:02:11 SQL> drop tablespace ODSCP1004 including contents;
drop tablespace ODSCP1004 including contents
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [ktssdrp1], [12], [0], [0], [], [],
[], []
Elapsed: 00:00:03.51

2. Inconsistent TAB$ / SEG$ data (hcheck findings)

3. The database cannot be shut down

BUG: Various ORA-600 errors / dictionary inconsistency from CTAS / DROP

The current plan: the database is close to 30TB, so identify the problem tablespaces, move their data into new tablespaces, transport the healthy tablespaces (TTS) into a new database, and then upgrade that database to 11g. Since Oracle no longer supports 10g (extended support must be purchased), any 10g databases with lingering issues should get an upgrade on the schedule as soon as possible. Also, for this database's ODS workload a 32K block size is not recommended; 32K block sizes still hit plenty of problems on 11gR2:

Bug 7605618
Bug 11799496
Bug 12680822
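
For reference, the TTS move of the healthy tablespaces boils down to something like the following (all names are placeholders; a tablespace must be read-only while it is transported):

SQL> ALTER TABLESPACE odscp_good READ ONLY;
$ expdp system DIRECTORY=dp_dir DUMPFILE=tts.dmp TRANSPORT_TABLESPACES=ODSCP_GOOD TRANSPORT_FULL_CHECK=y
$ # copy the datafiles to the target host, then plug them in:
$ impdp system DIRECTORY=dp_dir DUMPFILE=tts.dmp TRANSPORT_DATAFILES='/u01/oradata/new/odscp_good01.dbf'
SQL> ALTER TABLESPACE odscp_good READ WRITE;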

Think of bind value

September 17, 2013 Internals, oracle

A curious case of slow "death"

September 13, 2013 Internals, oracle

A strange case from this afternoon: the system "died" in a puzzling way. The current verdict is that the sqluldr2 tool triggered it (a colleague ran sqluldr2_linux64_10204.bin -help).
The errors thrown (/var/log/messages):

Sep 11 09:00:54 db18 setroubleshoot: SELinux is preventing sqluldr2_linux6 from loading /home/oracle/product/10205/db1/lib/libnnz10.so which requires text relocation. For complete SELinux messages. run sealert -l 012919a7-837b-42af-9509-367d9d743808
Sep 11 09:07:08 db18 last message repeated 2 times
Sep 11 09:08:47 db18 setroubleshoot: SELinux is preventing sqluldr2_linux6 from loading /home/oracle/product/10205/db1/lib/libnnz10.so which requires text relocation. For complete SELinux messages. run sealert -l 012919a7-837b-42af-9509-367d9d743808
Sep 11 09:09:54 db18 setroubleshoot: SELinux is preventing sqluldr2_linux6 from loading /home/oracle/product/10205/db1/lib/libnnz10.so which requires text relocation. For complete SELinux messages. run sealert -l 012919a7-837b-42af-9509-367d9d743808
Sep 11 09:10:07 db18 setroubleshoot: SELinux is preventing sqluldr2_linux6 from loading /home/oracle/product/10205/db1/lib/libnnz10.so which requires text relocation. For complete SELinux messages. run sealert -l 012919a7-837b-42af-9509-367d9d743808
Sep 11 09:10:46 db18 kernel: tnslsnr[19787]: segfault at 0000000000000000 rip 00002b92c7d5bebe rsp 00007fff0a1fea90 error 4
Sep 11 10:49:59 db18 auditd[8397]: Audit daemon rotating log files
Sep 11 10:53:41 db18 setroubleshoot: SELinux is preventing sqluldr2_linux6 from loading /home/oracle/product/10205/db1/lib/libnnz10.so which requires text relocation. For complete SELinux messages. run sealert -l 012919a7-837b-42af-9509-367d9d743808
Sep 11 10:54:23 db18 setroubleshoot: SELinux is preventing sqluldr2_linux6 from loading /home/oracle/product/10205/db1/lib/libnnz10.so which requires text relocation. For complete SELinux messages. run sealert -l 012919a7-837b-42af-9509-367d9d743808
Sep 11 12:50:44 db18 setroubleshoot: SELinux is preventing sqluldr2_linux6 from loading /home/oracle/product/10205/db1/lib/libnnz10.so which requires text relocation. For complete SELinux messages. run sealert -l 012919a7-837b-42af-9509-367d9d743808
Sep 11 12:54:37 db18 setroubleshoot: SELinux is preventing sqluldr2_linux6 from loading /home/oracle/product/10205/db1/lib/libnnz10.so which requires text relocation. For complete SELinux messages. run sealert -l 012919a7-837b-42af-9509-367d9d743808
Sep 11 13:56:56 db18 nrpe[15359]: Host 10.4.0.238 is not allowed to talk to us!
Sep 11 13:57:08 db18 nrpe[15425]: Host 10.4.0.238 is not allowed to talk to us!
Sep 11 14:17:49 db18 setroubleshoot: SELinux is preventing sqluldr2_linux6 from loading /home/oracle/product/10205/db1/lib/libnnz10.so which requires text relocation. For complete SELinux messages. run sealert -l 012919a7-837b-42af-9509-367d9d743808


SELinux was enabled on the host and blocked sqluldr2's access to the Oracle library files. After several attempts, at 09:10:46 the system misbehaved: at first only the listener died abnormally, and from then on the system began its slow "death".
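
The SELinux message points to the usual remedies: relabel the library so text relocation is allowed, or relax SELinux. A sketch (the library path is the one from the log above):

$ getenforce                   # check the current mode
$ chcon -t textrel_shlib_t /home/oracle/product/10205/db1/lib/libnnz10.so    # the targeted fix
$ setenforce 0                 # or go permissive globally until reboot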

Last login: Tue Sep 10 14:59:41 2013 from 10.4.0.232
[oracle@db18 ~]$ lsnrctl status

LSNRCTL for Linux: Version 10.2.0.5.0 - Production on 11-SEP-2013 09:18:47

Copyright (c) 1991, 2010, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=0.0.0.0)(PORT=1522)))
TNS-12541: TNS:no listener
TNS-12560: TNS:protocol adapter error
  TNS-00511: No listener
   Linux Error: 111: Connection refused
[oracle@db18 ~]$ 
[oracle@db18 ~]$ 
[oracle@db18 ~]$ ora active
09:18:54 up 702 days, 14:57,  2 users,  load average: 3.06, 2.08, 0.93

no rows selected

[oracle@db18 ~]$ lsnrctl start

LSNRCTL for Linux: Version 10.2.0.5.0 - Production on 11-SEP-2013 09:19:16

Copyright (c) 1991, 2010, Oracle.  All rights reserved.

Starting /home/oracle/product/10205/db1/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 10.2.0.5.0 - Production
System parameter file is /home/oracle/product/10205/db1/network/admin/listener.ora
Log messages written to /home/oracle/product/10205/db1/network/log/listener.log
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=0.0.0.0)(PORT=1522)))

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=0.0.0.0)(PORT=1522)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 10.2.0.5.0 - Production
Start Date                11-SEP-2013 09:19:16
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /home/oracle/product/10205/db1/network/admin/listener.ora
Listener Log File         /home/oracle/product/10205/db1/network/log/listener.log
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=0.0.0.0)(PORT=1522)))
Services Summary...
Service "misc" has 1 instance(s).
  Instance "misc", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
[oracle@db18 ~]$ 
[oracle@db18 ~]$ ora active   
 09:19:22 up 702 days, 14:57,  2 users,  load average: 3.09, 2.18, 1.00

The problem surfaced on the afternoon of the 12th: many processes, background processes included, were burning large amounts of CPU; several processes sat in a running-but-dead state; sys time ate most of the system.


The database alert log showed:

Errors in file /data/oracle/admin/misc/udump/misc_ora_28401.trc:
ORA-27300: OS system dependent operation:invalid_process_id failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgpalive1

Digging in and checking the state of the Oracle background processes turned up the following anomaly:

[oracle@db18 arch]$ ps -ef | grep -i ora_
oracle   24019     1  0 May18 ?        01:49:30 ora_pmon_misc
oracle   24023     1  0 May18 ?        00:33:02 ora_psp0_misc
oracle   24027     1  0 May18 ?        00:34:32 ora_mman_misc
oracle   24031     1  0 May18 ?        00:59:23 ora_dbw0_misc
oracle   24033     1  0 May18 ?        01:02:13 ora_dbw1_misc
oracle   24035     1  0 May18 ?        00:58:57 ora_dbw2_misc
oracle   24037     1  0 May18 ?        00:59:24 ora_dbw3_misc
oracle   24039     1  0 May18 ?        00:59:06 ora_dbw4_misc
oracle   24043     1  0 May18 ?        00:59:35 ora_dbw5_misc
oracle   24047     1  0 May18 ?        04:28:36 ora_lgwr_misc
oracle   24049     1  0 May18 ?        09:41:15 ora_ckpt_misc
oracle   24053     1  0 May18 ?        00:36:44 ora_smon_misc
oracle   24057     1  0 May18 ?        00:28:39 ora_reco_misc
oracle   24061     1  1 May18 ?        1-06:42:09 ora_mmon_misc
oracle   24065     1  0 May18 ?        22:12:24 ora_mmnl_misc
oracle   24347     1  0 May18 ?        00:37:10 ora_arc0_misc
oracle   24351     1  0 May18 ?        00:43:10 ora_arc1_misc
oracle   24355     1  0 May18 ?        01:42:59 ora_lns1_misc
oracle   24543     1  0 May18 ?        00:28:31 ora_qmnc_misc
oracle   24780     1  0 May18 ?        00:10:05 ora_q000_misc
oracle   24969     1  0 May18 ?        00:09:25 ora_q001_misc
oracle   26077 25306  0 15:31 pts/8    00:00:00 tail -f misc_ora_21763.trc
oracle   27634 22109  0 15:44 pts/6    00:00:00 grep -i ora_
oracle   27867 24023 98 Sep11 ?        1-06:03:47 ora_psp0_misc   ------------ started on Sep 11 (respawned)
oracle   27868 27867 98 Sep11 ?        1-06:03:38 ora_psp0_misc   ------------ a spawned child of the one above

The PSP process was in an abnormal state. The PSP0 process is described as follows:

PSP stands for Process SPawner and this process has the job of creating and managing other Oracle processes.
The initial set of background processes are spawned by the startup process. After the instance is started, PSP0 is responsible for spawning any
required background processes.

At that point the system could neither checkpoint nor sync the log, and even commands like ALTER DATABASE SET ... hung.

[oracle@db18 ~]$ ora active
15:57:03 up 703 days, 21:35, 12 users,  load average: 24.11, 26.99, 50.27

   SID USERNAME   MACHINE          EVENT                      PARAM                W   WT SQL                        ST     LT LOGON_TIME
------ ---------- ---------------------- ------------------------------ -------------------- ---- ---- ---------------------------- -- ------ ----------
  3601 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006158/0       0    0 /frw83qw2fvsa1             K        0    1088732
  3607 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006175/0       0    0 /frw83qw2fvsa1             K        0        3857
  3611 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006137/0       0    0 /frw83qw2fvsa1             K        0    1088741
  3616 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006169/0       0    0 /frw83qw2fvsa1             K        0        3895
  3634 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006140/0       0    0 /frw83qw2fvsa1             K        0    1088735
  3635 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006173/0       0    0 /frw83qw2fvsa1             K        0        3889
  3646 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006209/0       0    0 /frw83qw2fvsa1             A        0        1023
  3650 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006174/0       0    0 /frw83qw2fvsa1             K        0        3882
  3661 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006152/0       0    0 /frw83qw2fvsa1             K        0        5346
  3664 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006138/0       0    0 /frw83qw2fvsa1             K        0      877657
  3668 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006200/0       0    0 /frw83qw2fvsa1             A        0        1027
  3671 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006198/0       0    0 /frw83qw2fvsa1             A        0        1027
  3676 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006184/0       0    0 /6p07vqnwqrsyw            A        0        1032
  3691 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006210/0       0    0 /frw83qw2fvsa1             A        0        1011
  3696 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006141/0       0    0 /frw83qw2fvsa1             K        0        5346
  3698 SYS     sqlplus@db18         reliable message            70088494424/70158516   0    0 /                         A           0         390

In effect ckpt, lgwr, and the dbwn processes were all dead. oradebug dump systemstate ran extremely slowly, oradebug dump errorstack hung, and sys was consuming most of the machine.

awk -f ass109.awk systemdump.txt > systemdump.log   (no abnormal waits and no blocker information found)
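
For the record, when ordinary connections hang, a systemstate dump can usually still be taken through a preliminary connection:

$ sqlplus -prelim "/ as sysdba"
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug dump systemstate 266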

[oracle@db18 ~]$ tail -f -n 50 systemdump.log 
     Cmd: Select
296:waiting for 'SQL*Net message from client' 
     Cmd: Select
297:waiting for 'SQL*Net message from client' 
     Cmd: Select
298:waiting for 'SQL*Net message from client' 
     Cmd: PL/SQL Execute
299:waiting for 'SQL*Net message from client' 
     Cmd: PL/SQL Execute
300:waiting for 'SQL*Net message from client' 
     Cmd: PL/SQL Execute
302:waiting for 'SQL*Net message from client' 
     Cmd: Insert
444:waiting for 'SQL*Net message from client' 
     Cmd: Select
532:waiting for 'SQL*Net message from client' 
     Cmd: Select
536:waiting for 'SQL*Net message from client' 
     Cmd: Select
538:waiting for 'SQL*Net message from client' 
     Cmd: Select

NO BLOCKING PROCESSES FOUND

System State 2
~~~~~~~~~~~~~~~~
1:                                      
2:  waiting for 'pmon timer'            
3:  waiting for 'rdbms ipc message'     
4:  last wait for 'rdbms ipc message'   
5:  waiting for 'rdbms ipc message'     
6:  waiting for 'rdbms ipc message'     
7:  waiting for 'rdbms ipc message'     
8:  waiting for 'rdbms ipc message'     
9:  waiting for 'rdbms ipc message'     
10: waiting for 'rdbms ipc message'     
11: waiting for 'rdbms ipc message'     
12: waiting for 'rdbms ipc message'     
13: waiting for 'rdbms ipc message'     
14: waiting for 'smon timer'            
15: last wait for 'os thread startup'   
16: last wait for 'rdbms ipc message'   
17: waiting for 'SQL*Net message from client' 
     Cmd: Select
18:                                     

NO BLOCKING PROCESSES FOUND


171760 Lines Processed.

What we see is a system dying "slowly" (SAR pinpoints an inflection point in time):

sar -f /var/log/sa/sar11   ---- sep 11

09:00:01 AM       all      0.01      0.00      0.01      0.00      0.00     99.97
09:10:01 AM       all      0.03      0.00      0.02      0.00      0.00     99.95   ----->   (09:10:46 db18 kernel: tnslsnr[19787]: segfault at 0000000000000000 rip 00002b92c7d5bebe rsp 00007fff0a1fea90 error 4)
09:20:01 AM       all      2.18      0.00      2.16      0.00      0.00     95.66
09:30:01 AM       all      3.15      0.00      3.13      0.00      0.00     93.71
09:40:01 AM       all      3.15      0.00      3.13      0.00      0.00     93.72
09:50:01 AM       all      3.15      0.00      3.13      0.00      0.00     93.72

03:00:01 PM       all      3.99      0.00      4.39      0.00      0.00     91.62
03:10:01 PM       all      4.04      0.00      4.32      0.00      0.00     91.64
03:20:01 PM       all      4.47      0.00      4.87      0.00      0.00     90.66
03:30:01 PM       all      4.98      0.00      5.47      0.00      0.00     89.55
03:40:01 PM       all      4.99      0.00      5.45      0.00      0.00     89.56
03:50:01 PM       all      4.99      0.00      5.45      0.00      0.00     89.56
04:00:01 PM       all      4.99      0.00      5.45      0.00      0.00     89.56
04:10:01 PM       all      4.94      0.00      5.50      0.00      0.00     89.56
04:20:01 PM       all      4.82      0.00      5.62      0.00      0.00     89.56


10:20:01 PM       all     10.72      0.00     12.22      0.00      0.00     77.06
10:30:01 PM       all     10.73      0.00     12.21      0.00      0.00     77.06
10:40:01 PM       all     10.69      0.00     12.25      0.00      0.00     77.06
10:50:01 PM       all     10.67      0.00     12.27      0.00      0.00     77.06
11:00:01 PM       all     10.52      0.00     12.41      0.00      0.00     77.06
11:10:01 PM       all     10.42      0.00     12.52      0.00      0.00     77.06
11:20:01 PM       all     10.38      0.00     12.56      0.00      0.00     77.06


sar -f /var/log/sa/sar12   ---- sep 12

12:00:01 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
12:10:01 AM       all     11.23      0.00     11.72      0.10      0.00     76.95
12:20:02 AM       all     11.29      0.00     11.65      0.10      0.00     76.96
12:30:01 AM       all     11.30      0.00     11.63      0.09      0.00     76.96
12:40:01 AM       all     11.71      0.00     12.36      0.10      0.00     75.83
12:50:01 AM       all     12.14      0.00     12.88      0.10      0.00     74.87
01:00:01 AM       all     12.08      0.00     12.94      0.10      0.00     74.88
01:10:01 AM       all     12.24      0.00     12.79      0.10      0.00     74.88
01:20:01 AM       all     12.25      0.00     12.77      0.10      0.00     74.88

10:00:01 AM       all     12.08      0.00     37.94      0.10      0.00     49.88
10:10:01 AM       all     12.07      0.00     37.95      0.09      0.00     49.88
10:20:01 AM       all     12.09      0.00     37.94      0.11      0.00     49.87
10:30:01 AM       all     12.10      0.00     37.93      0.10      0.00     49.88
10:40:01 AM       all     12.02      0.00     38.00      0.11      0.00     49.87
10:50:01 AM       all     12.06      0.00     37.96      0.10      0.00     49.88
11:00:01 AM       all     12.37      0.00     37.65      0.10      0.00     49.88
11:10:01 AM       all     12.32      0.00     37.70      0.10      0.00     49.88
11:20:01 AM       all     12.50      0.00     37.52      0.10      0.00     49.88
11:30:01 AM       all     12.59      0.00     37.43      0.10      0.00     49.88
11:40:01 AM       all     12.77      0.00     37.25      0.11      0.00     49.87

01:50:01 PM       all     13.41      0.00     40.77      0.10      0.00     45.71
02:00:01 PM       all     13.41      0.00     40.78      0.11      0.00     45.71
02:10:01 PM       all     13.41      0.00     40.78      0.10      0.00     45.71
02:20:01 PM       all     13.17      0.00     63.22      0.05      0.00     23.55
02:30:02 PM       all     13.21      0.00     86.78      0.00      0.00      0.00
02:40:01 PM       all     13.29      0.00     86.71      0.00      0.00      0.00
02:50:01 PM       all     13.42      0.00     86.58      0.00      0.00      0.00
03:00:01 PM       all     13.58      0.00     86.42      0.00      0.00      0.00
03:10:01 PM       all     13.38      0.00     86.62      0.00      0.00      0.00
03:20:01 PM       all     13.25      0.00     86.75      0.00      0.00      0.00
03:30:01 PM       all     13.19      0.00     86.80      0.00      0.00      0.00
03:40:01 PM       all     12.95      0.00     86.25      0.00      0.00      0.79

Working theory: the clash between sqluldr2 and SELinux broke something around Oracle's .so files, first visible as the segfault and the listener's abnormal exit, after which the system slowly "died". (Recommendation: do not run sqluldr2 directly on a production system; connect remotely over TNS instead.)
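
That is, run the unloader from another machine and let it reach the database through a TNS descriptor. Assuming sqluldr2's usual key=value options, the invocation would look roughly like this (user, TNS alias, query and file are all examples):

$ sqluldr2 user=scott/tiger@proddb query="select * from emp" file=emp.csv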

DGHA: an automatic Data Guard failover solution

August 15, 2013 Architect, oracle, replication, software

Publishing test results for DGHA, an automatic switchover script for Oracle Data Guard. The script keeps redo and the controlfile on shared storage, so a switch loses zero data.
After a switch, the original primary database can rejoin the DG cluster seamlessly, giving a highly available Data Guard setup.
It is custom-developed in Perl and can be deployed on any machine that can reach the cluster (preferably not a machine inside the cluster), with multiple rounds of checks to make the switch reliable and accurate. Thanks to the author, noodba. As our department's data architect, noodba also built myawr (a MySQL performance analysis tool).

测试环境:

10.0.0.63 (primary database)
10.0.0.24 (standby database)
10.0.0.13 (standby database)
Failover is configured between 63 <-> 24; 13 stays a permanent standby.

Step 1. Use swingbench as the load generator (loading 10G of data)


[oracle@userdb-pre admin]$ ora active
 17:23:15 up 1 day,  1:15,  5 users,  load average: 2.99, 0.97, 0.34

   SID USERNAME   MACHINE		 EVENT				PARAM			W   WT SQL			    ST	   LT LOGON_TIME
------ ---------- ---------------------- ------------------------------ -------------------- ---- ---- ---------------------------- -- ------ ----------
  1089 TEST	  JDBC Thin Client	 SQL*Net more data from client	1413697536/4/0	       -1    0 /5gkc10wnagprk		    A	    0	     105
   961 TEST	  JDBC Thin Client	 SQL*Net more data from client	1413697536/1/0	       -1    0 5gkc10wnagprk/5gkc10wnagprk  A	    0	     105
  1249 TEST	  JDBC Thin Client	 SQL*Net more data from client	1413697536/5/0	       -1    0 dy4rgmf46x1q4/dy4rgmf46x1q4  A	    0	     105

[oracle@userdb-pre admin]$ ora active
 17:23:15 up 1 day,  1:15,  5 users,  load average: 3.07, 1.02, 0.36

   SID USERNAME   MACHINE		 EVENT				PARAM			W   WT SQL			    ST	   LT LOGON_TIME
------ ---------- ---------------------- ------------------------------ -------------------- ---- ---- ---------------------------- -- ------ ----------
   961 TEST	  JDBC Thin Client	 SQL*Net message from client	1413697536/1/0		3    0 5gkc10wnagprk/5gkc10wnagprk  A	    0	     105
     1 TEST	  JDBC Thin Client	 SQL*Net message from client	1413697536/1/0		2    0 dy4rgmf46x1q4/dy4rgmf46x1q4  A	    0	     104

[oracle@userdb-pre admin]$ ora active
 17:23:49 up 1 day,  1:15,  4 users,  load average: 3.68, 1.35, 0.49

   SID USERNAME   MACHINE		 EVENT				PARAM			W   WT SQL			    ST	   LT LOGON_TIME
------ ---------- ---------------------- ------------------------------ -------------------- ---- ---- ---------------------------- -- ------ ----------
  1025 TEST	  JDBC Thin Client	 SQL*Net message from client	1413697536/1/0		1    0 d1yrz7run4p1s/d1yrz7run4p1s  A	    0	     139

Step 2. Halt the database machine (halt 63 outright to simulate a machine crash)


Watch 10.0.0.24 (DGHA will automatically fail this standby over to primary; redo and controlfile are copied to the designated directories):

alert_log from 10.0.0.24

Thu Aug 15 17:37:50 2013
ALTER DATABASE   MOUNT
Changing di2dbun from dgha to dgha1
Successful mount of redo thread 1, with mount id 3944140846
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT
Thu Aug 15 17:37:55 2013
ALTER DATABASE RECOVER  database
Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 48 slaves
Thu Aug 15 17:37:56 2013
Recovery of Online Redo Log: Thread 1 Group 3 Seq 60 Reading mem 0
  Mem# 0: /data1/dgha/redo03.log
Media Recovery Complete (dgha)
Completed: ALTER DATABASE RECOVER  database
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 32 processes
Started redo scan
Completed redo scan
 read 779943 KB redo, 0 data blocks need recovery
Started redo application at
 Thread 1: logseq 58, block 35644
Recovery of Online Redo Log: Thread 1 Group 1 Seq 58 Reading mem 0
  Mem# 0: /data1/dgha/redo01.log
Recovery of Online Redo Log: Thread 1 Group 2 Seq 59 Reading mem 0
  Mem# 0: /data1/dgha/redo02.log
Recovery of Online Redo Log: Thread 1 Group 3 Seq 60 Reading mem 0
  Mem# 0: /data1/dgha/redo03.log
Completed redo application of 0.00MB
Completed crash recovery at
 Thread 1: logseq 60, block 31, scn 2500652
 0 data blocks read, 0 data blocks written, 779943 redo k-bytes read
LGWR: STARTING ARCH PROCESSES
...

Archived Log entry 191 added for thread 1 sequence 61 ID 0xeb14a3a4 dest 1:
ARC3: Standby redo logfile selected for thread 1 sequence 60 for destination LOG_ARCHIVE_DEST_3
******************************************************************
LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2
******************************************************************
******************************************************************
LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_3
******************************************************************
LNS: Standby redo logfile selected for thread 1 sequence 62 for destination LOG_ARCHIVE_DEST_3
..


Completed: alter database open

The database was recovered right up to the latest point in time (its datafile SCNs are now ahead of the 63 database; 63 can be viewed as 24's state at some earlier point).


Step 3. third standby database (10.0.0.13)

alert_log from 10.0.0.13

Error 12543 received logging on to the standby
FAL[client, USER]: Error 12543 connecting to dgha for fetching gap sequence
Thu Aug 15 17:38:14 2013
Media Recovery Log /u01/oracle/11.2.0/oracle/product/11203/db1/dbs/arch1_61_823427815.dbf
Media Recovery Waiting for thread 1 sequence 62 (in transit)
Recovery of Online Redo Log: Thread 1 Group 4 Seq 62 Reading mem 0
  Mem# 0: /u01/oracle/oradata/dgha/standby04.log
Thu Aug 15 17:39:56 2013
Archived Log entry 62 added for thread 1 sequence 62 ID 0xeb14a3a4 dest 1:
Thu Aug 15 17:39:56 2013
RFS[14]: Selected log 4 for thread 1 sequence 63 dbid -350937948 branch 823427815
Thu Aug 15 17:39:56 2013
Media Recovery Waiting for thread 1 sequence 63 (in transit)
Recovery of Online Redo Log: Thread 1 Group 4 Seq 63 Reading mem 0
  Mem# 0: /u01/oracle/oradata/dgha/standby04.log
Thu Aug 15 17:40:08 2013
Archived Log entry 63 added for thread 1 sequence 63 ID 0xeb14a3a4 dest 1:
Thu Aug 15 17:40:08 2013
RFS[14]: Selected log 4 for thread 1 sequence 64 dbid -350937948 branch 823427815

As you can see, the primary switched automatically to 10.0.0.24, while 13 carries on recovering.

Step 4. restart 63 database and join the dg cluster (24 is new primary database and 63 is standby database)

(1).copy standby controlfile from 24 database
(2).replace 63’s controlfile using new standby controlfile
(3).recreate standby logfiles
(4).alter database recover managed standby database;
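
A sketch of those four steps as commands (paths and sizes are examples):

-- (1) on the new primary, 24:
SQL> ALTER DATABASE CREATE STANDBY CONTROLFILE AS '/tmp/standby63.ctl';
-- (2) copy it to 63 in place of 63's controlfiles, then on 63:
SQL> STARTUP MOUNT;
-- (3) recreate the standby logfiles:
SQL> ALTER DATABASE ADD STANDBY LOGFILE '/data1/dgha/standby04.log' SIZE 512M;
-- (4) resume managed recovery:
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;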


alert_log:

..

Media Recovery Log /u01/oracle/arch/1_66_823427815.dbf
Media Recovery Waiting for thread 1 sequence 67 (in transit)

..
Physical standby database opened for read only access.
Completed: alter database open read only

Step 5. check data between 24 and 63 databases

from 24 database

SQL> select count(*) from test.CUSTOMERS              ;
select count(*) from test.WAREHOUSES             ;
select count(*) from test.ORDER_ITEMS            ;
select count(*) from test.ORDERS                 ;
select count(*) from test.INVENTORIES            ;
select count(*) from test.PRODUCT_INFORMATION    ;
select count(*) from test.LOGON                  ;
select count(*) from test.PRODUCT_DESCRIPTIONS   ;
select count(*) from test.ORDERENTRY_METADATA    ;

  COUNT(*)
----------
  39999996

SQL> 
  COUNT(*)
----------
      1000

SQL> 
  COUNT(*)
----------
 134986840

SQL> 
  COUNT(*)
----------
  45000000

SQL> 
  COUNT(*)
----------
    901565

SQL> 
  COUNT(*)
----------
      1000

SQL> 
  COUNT(*)
----------
   9999996

SQL> 
  COUNT(*)
----------
      1000

SQL> 
  COUNT(*)
----------
	 0

from 63 database

SQL> select count(*) from test.CUSTOMERS              ;
select count(*) from test.WAREHOUSES             ;
select count(*) from test.ORDER_ITEMS            ;
select count(*) from test.ORDERS                 ;
select count(*) from test.INVENTORIES            ;
select count(*) from test.PRODUCT_INFORMATION    ;
select count(*) from test.LOGON                  ;
select count(*) from test.PRODUCT_DESCRIPTIONS   ;
select count(*) from test.ORDERENTRY_METADATA    ;

  COUNT(*)
----------
  39999996

SQL> 
  COUNT(*)
----------
      1000

SQL> 
  COUNT(*)
----------
 134986840

SQL> 
  COUNT(*)
----------
  45000000

SQL> 
  COUNT(*)
----------
    901565

SQL> 
  COUNT(*)
----------
      1000

SQL> 
  COUNT(*)
----------
   9999996

SQL> 
  COUNT(*)
----------
      1000

SQL> 
  COUNT(*)
----------
	 0

Perfect! No rows lost.

—————————————————————————

Now let's dissect DGHA's behavior:

Thu Aug 15 17:25:57 2013 - [info] oraha start,welcome to use this tool...
Thu Aug 15 17:25:57 2013 - [info] try to check (10.0.0.63:1521:dgha) and (10.0.0.24:1521:dgha) are really cluster or not.
Thu Aug 15 17:25:57 2013 - [info] try ssh to standby(10.0.0.24,1521,dgha)...
Thu Aug 15 17:25:57 2013 - [info] ssh to 10.0.0.24 success.
Thu Aug 15 17:25:58 2013 - [debug] standby query result: (DGHA,PHYSICAL STANDBY,dgha1)
Thu Aug 15 17:25:58 2013 - [info] try connect to primary(10.0.0.63,1521,dgha)...
Thu Aug 15 17:25:58 2013 - [debug] primary check query sql: (select count(*) cnt from V$ARCHIVE_DEST A,v$database B where A.status='VALID' AND A.DB_UNIQUE_NAME='dgha1' AND A.TARGET='STANDBY' AND B.NAME='DGHA' AND B.DATABASE_ROLE='PRIMARY')
Thu Aug 15 17:25:58 2013 - [info] Attemp to connect to (10.0.0.63:1521:dgha)...
Thu Aug 15 17:25:59 2013 - [info] connect to (10.0.0.63:1521:dgha) success.
Thu Aug 15 17:25:59 2013 - [debug] primary query result rn_cnt: 1
Thu Aug 15 17:25:59 2013 - [info] (10.0.0.63:1521:dgha) and (10.0.0.24:1521:dgha) are really cluster.
Thu Aug 15 17:25:59 2013 - [info] the primary database is (10.0.0.63:1521:dgha)
Thu Aug 15 17:25:59 2013 - [info] the standby database is (10.0.0.24:1521:dgha)
Thu Aug 15 17:25:59 2013 - [info] ssh to 10.0.0.63 success.
Thu Aug 15 17:25:59 2013 - [info] ssh to 10.0.0.24 success.
Thu Aug 15 17:26:03 2013 - [info] ssh to 192.168.254.63 success.
Thu Aug 15 17:26:03 2013 - [info] ssh check passed.
Thu Aug 15 17:26:03 2013 - [info] (10.0.0.63:1521:dgha) /home/oracle/oraha/db_stop.sh check passed.
Thu Aug 15 17:26:03 2013 - [info] (10.0.0.24:1521:dgha) /home/oracle/oraha/db_start_step01.sh check passed.
Thu Aug 15 17:26:03 2013 - [info] (10.0.0.24:1521:dgha) /home/oracle/oraha/db_start_step02.pl check passed.
Thu Aug 15 17:26:03 2013 - [info] (10.0.0.24:1521:dgha) /home/oracle/oraha/db_start_step03.sh check passed.
Thu Aug 15 17:26:04 2013 - [info] (10.0.0.24:1521:dgha) /home/oracle/oraha/db_check.sh check passed.
Thu Aug 15 17:26:04 2013 - [info] scripts under /home/oracle/oraha check passed.
Thu Aug 15 17:26:04 2013 - [info] I will wait until (10.0.0.63:1521:dgha) unreachable.health check interval:8
Thu Aug 15 17:26:04 2013 - [info] ======================================================
Thu Aug 15 17:26:04 2013 - [info] ----------             welcome              ----------
Thu Aug 15 17:26:04 2013 - [info] ======================================================
Thu Aug 15 17:26:04 2013 - [debug] connect to (10.0.0.63:1521:dgha) error_count:0
Thu Aug 15 17:31:50 2013 - [debug] UPDATE HACHECK success.
Thu Aug 15 17:31:58 2013 - [debug] connect to (10.0.0.63:1521:dgha) error_count:0

DGHA periodically updates a check table to judge whether the instance is alive.
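
The liveness probe is nothing more exotic than a timestamp update against a small table. A hypothetical equivalent of what the log above calls UPDATE HACHECK (the real table name lives inside oraha.pl):

SQL> CREATE TABLE hacheck (id NUMBER PRIMARY KEY, ts DATE);   -- hypothetical check table
SQL> INSERT INTO hacheck VALUES (1, SYSDATE);
-- what the monitor runs each cycle; an error or timeout counts as one failure
SQL> UPDATE hacheck SET ts = SYSDATE WHERE id = 1;
SQL> COMMIT;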

Thu Aug 15 17:32:03 2013 - [warning] connect to (10.0.0.63:1521:dgha) was abort! connect again.
Thu Aug 15 17:32:03 2013 - [debug] update check == 1.undef_count is 0
Thu Aug 15 17:32:03 2013 - [info] I will destory dbh.
Issuing rollback() due to DESTROY without explicit disconnect() of DBD::Oracle::db handle (DESCRIPTION=(ADDRESS=(HOST=10.0.0.63)(PROTOCOL=tcp)(PORT=1521))(CONNECT_DATA=(SID=dgha))) at oraha.pl line 509.
DBI db handle 0x1efbf710 cleared whilst still active at oraha.pl line 509.
DBI db handle 0x1efbf710 has uncleared implementors data at oraha.pl line 509.
    dbih_clearcom (dbh 0x1efbf710, com 0x1ee4d4f0, imp DBD::Oracle::db):
       FLAGS 0x182097: COMSET IMPSET Active Warn RaiseError PrintWarn ShowErrorStatement 
       PARENT DBI::dr=HASH(0x1ef1df90)
       KIDS 0 (0 Active)
Thu Aug 15 17:32:08 2013 - [warning] destory dbh  was abort!
Thu Aug 15 17:32:08 2013 - [info] Attemp to connect to (10.0.0.63:1521:dgha)...
Thu Aug 15 17:32:13 2013 - [error][oraha.pl, ln426] connect to  (10.0.0.63:1521:dgha) timeout
Thu Aug 15 17:32:13 2013 - [warning] connect to (10.0.0.63:1521:dgha) error_count:1
Thu Aug 15 17:32:21 2013 - [debug] connect to (10.0.0.63:1521:dgha) error_count:1
Thu Aug 15 17:32:21 2013 - [debug] the dbh is null.
Thu Aug 15 17:32:21 2013 - [info] Attemp to connect to (10.0.0.63:1521:dgha)...
Thu Aug 15 17:32:26 2013 - [error][oraha.pl, ln426] connect to  (10.0.0.63:1521:dgha) timeout
Thu Aug 15 17:32:26 2013 - [warning] connect to (10.0.0.63:1521:dgha) was abort! connect again.
Thu Aug 15 17:32:26 2013 - [debug] update check == 1.undef_count is 1
Thu Aug 15 17:32:26 2013 - [info] I will destory dbh.
Thu Aug 15 17:32:26 2013 - [warning] connect to (10.0.0.63:1521:dgha) error_count:2
Thu Aug 15 17:32:34 2013 - [debug] connect to (10.0.0.63:1521:dgha) error_count:2
Thu Aug 15 17:32:34 2013 - [debug] the dbh is null.
Thu Aug 15 17:32:34 2013 - [info] Attemp to connect to (10.0.0.63:1521:dgha)...
Thu Aug 15 17:32:39 2013 - [error][oraha.pl, ln426] connect to  (10.0.0.63:1521:dgha) timeout
Thu Aug 15 17:32:39 2013 - [warning] connect to (10.0.0.63:1521:dgha) was abort! connect again.
Thu Aug 15 17:32:39 2013 - [debug] update check == 1.undef_count is 1
Thu Aug 15 17:32:39 2013 - [info] I will destory dbh.
Thu Aug 15 17:32:39 2013 - [warning] connect to (10.0.0.63:1521:dgha) error_count:3
Thu Aug 15 17:32:39 2013 - [info] try connect to the edm...
Thu Aug 15 17:32:39 2013 - [info] send message:insert into edm_user.tb_queue(ID,PHONE,MSG,STATUS,SENDLEVEL,svrtype,INSERTTIME) 
                                values(edm_user.SEQ_QUE.NEXTVAL,'13817963180','the primary(10.0.0.63:1521:dgha) is unreachable.','',1,'11',sysdate)
Thu Aug 15 17:32:39 2013 - [info] send message:insert into edm_user.tb_queue(ID,PHONE,MSG,STATUS,SENDLEVEL,svrtype,INSERTTIME) 
                                values(edm_user.SEQ_QUE.NEXTVAL,'18666668061','the primary(10.0.0.63:1521:dgha) is unreachable.','',1,'11',sysdate)
Thu Aug 15 17:32:47 2013 - [warning] ssh to 10.0.0.63 err. error_count:1
ssh: connect to host 10.0.0.63 port 22: No route to host
Thu Aug 15 17:32:50 2013 - [warning] ssh to 10.0.0.63 err. error_count:2
ssh: connect to host 10.0.0.63 port 22: No route to host
Thu Aug 15 17:32:53 2013 - [warning] ssh to 10.0.0.63 err. error_count:3
Thu Aug 15 17:32:54 2013 - [info] ssh to 10.0.0.24 success.
Thu Aug 15 17:32:57 2013 - [info] ssh to 192.168.254.63 success.

After the update fails repeatedly, DGHA concludes the Oracle instance is dead and tries ssh to 63, which also fails (it will then connect to 63's remote management card and power the machine off). It checks the candidate standby the same way; if a switch is feasible it starts the failover procedure, and sends an SMS reporting that 63 is dead.


Thu Aug 15 17:32:57 2013 - [info] try to check (10.0.0.24:1521:dgha) is ready to failover to primary or not.
Thu Aug 15 17:32:57 2013 - [info] the standby open mode is:MOUNTED. database role is:PHYSICAL STANDBY.
Thu Aug 15 17:32:57 2013 - [info] standby lag minutes is: 0
Thu Aug 15 17:32:57 2013 - [info] standby (10.0.0.24:1521:dgha) healthcheck is passed. Go on.
Thu Aug 15 17:32:57 2013 - [info] (10.0.0.24:1521:dgha) /home/oracle/oraha/db_start_step01.sh check passed.
Thu Aug 15 17:32:58 2013 - [info] (10.0.0.24:1521:dgha) /home/oracle/oraha/db_start_step02.pl check passed.
Thu Aug 15 17:32:58 2013 - [info] (10.0.0.24:1521:dgha) /home/oracle/oraha/db_start_step03.sh check passed.
Thu Aug 15 17:32:58 2013 - [info] (10.0.0.24:1521:dgha) /home/oracle/oraha/db_check.sh check passed.
Thu Aug 15 17:32:58 2013 - [info] standby (10.0.0.24:1521:dgha) scripts under /home/oracle/oraha check passed.
Thu Aug 15 17:32:58 2013 - [info] Power of 10.0.0.63  was successfully turned off.
Thu Aug 15 17:32:58 2013 - [info] I will execute /home/oracle/oraha/db_start_step01.sh on 10.0.0.24
Thu Aug 15 17:35:58 2013 - [debug] db_start output: 
SQL*Plus: Release 11.2.0.3.0 Production on Thu Aug 15 17:32:58 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> 
Thu Aug 15 17:35:58 2013 - [error][oraha.pl, ln180] Oracle on 10.0.0.24 shutdown failed.Kill the pmon next.
Thu Aug 15 17:35:58 2013 - [debug] Oracle pmon pid is: 10179
Thu Aug 15 17:35:58 2013 - [debug] kill -9  output: 
Thu Aug 15 17:36:08 2013 - [info] I will execute /home/oracle/oraha/db_start_step02.pl on 10.0.0.24
Thu Aug 15 17:37:47 2013 - [debug] db_start output: 
Thu Aug 15 17:37:47 2013 - [info] I will execute /home/oracle/oraha/db_start_step03.sh on 10.0.0.24
Thu Aug 15 17:38:06 2013 - [debug] db_start output: 
SQL*Plus: Release 11.2.0.3.0 Production on Thu Aug 15 17:37:47 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> ORACLE instance started.

Total System Global Area 4.0486E+10 bytes
Fixed Size                  2237088 bytes
Variable Size            4026535264 bytes
Database Buffers         3.6373E+10 bytes
Redo Buffers               84631552 bytes
Database mounted.
SQL> Media recovery complete.
SQL> 
Database altered.

SQL> Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Thu Aug 15 17:38:08 2013 - [info] try to check the new primary(10.0.0.24,1521,dgha) after failover .
Thu Aug 15 17:38:08 2013 - [info] try connect to the new primary(10.0.0.24,1521,dgha)...
Thu Aug 15 17:38:08 2013 - [info] Attemp to connect to (10.0.0.24:1521:dgha)...
Thu Aug 15 17:38:08 2013 - [info] connect to (10.0.0.24:1521:dgha) success.
Thu Aug 15 17:38:08 2013 - [info] the new primary open mode is:READ WRITE. database role is:PRIMARY.
Thu Aug 15 17:38:08 2013 - [info] (10.0.0.63:1521:dgha) failover to (10.0.0.24:1521:dgha) successful.
Thu Aug 15 17:38:08 2013 - [info] try connect to the edm...
Thu Aug 15 17:38:08 2013 - [info] send message:insert into edm_user.tb_queue(ID,PHONE,MSG,STATUS,SENDLEVEL,svrtype,INSERTTIME) 
                                values(edm_user.SEQ_QUE.NEXTVAL,'13817963180','(10.0.0.63:1521:dgha) failover to (10.0.0.24:1521:dgha) successful.','',1,'11',sysdate)
Thu Aug 15 17:38:08 2013 - [info] send message:insert into edm_user.tb_queue(ID,PHONE,MSG,STATUS,SENDLEVEL,svrtype,INSERTTIME) 
                                values(edm_user.SEQ_QUE.NEXTVAL,'18666668061','(10.0.0.63:1521:dgha) failover to (10.0.0.24:1521:dgha) successful.','',1,'11',sysdate)
Killed by signal 1.
Killed by signal 1.

Once recovery using the shared-storage redo finishes, it likewise sends an SMS notification and exits.


Appended below are failover tests for various scenarios:

1. Simulated instance hang (oradebug frozen instance) -> all existing connections hang. A fresh process tries to update the check table, with two outcomes: if the update succeeds, no failover; if the hang makes the update fail, failover. Either way the earlier processes are left as "zombies".

2. Simulated listener hang (kill -STOP the listener pid and kill the active sessions): the fresh process's update times out -> ssh to the host to run an abort -> connect to the remote management card and power off the host -> fail over to the candidate standby.

3. Simulated machine down (halt the machine outright; both the update and ssh fail, it is judged dead, failover proceeds immediately).

4. NIC down (handled like scenario 1).
In scenarios 1 and 4 the connections DGHA opened earlier cannot be reclaimed; a locally deployed script cleans them up periodically by DGHA's host name and idle time.

5. After the new primary completes the switch, the old primary can rejoin the cluster if needed (copy the standby controlfile from the new primary; the standby logfiles must be recreated).

On listener registration for 12c pluggable databases

August 9, 2013 12c, network, oracle

A quick demonstration of PDB service registration and unregistration with the listener in 12c.

[oracle@db-42 ~]$ lsnrctl status listener1

LSNRCTL for Linux: Version 12.1.0.1.0 - Production on 09-AUG-2013 14:53:56

Copyright (c) 1991, 2013, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=db-42)(PORT=1523)))
STATUS of the LISTENER
------------------------
Alias                     listener1
Version                   TNSLSNR for Linux: Version 12.1.0.1.0 - Production
Start Date                08-AUG-2013 21:04:57
Uptime                    0 days 17 hr. 48 min. 58 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /data/app2/oracle/product/11.2.0/db_1/network/admin/listener.ora
Listener Log File         /data/app2/oracle/diag/tnslsnr/db-42/listener1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=db-42)(PORT=1523)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1523)))
Services Summary...
Service "testcdb" has 1 instance(s).
  Instance "testcdb", status READY, has 1 handler(s) for this service...
Service "testcdbXDB" has 1 instance(s).
  Instance "testcdb", status READY, has 1 handler(s) for this service...
The command completed successfully

—— register the PDB with listener1

[oracle@db-42 ~]$ ora si 

SQL*Plus: Release 12.1.0.1.0 Production on Fri Aug 9 14:54:20 2013

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> desc v$pdbs;
 Name					   Null?    Type
 ----------------------------------------- -------- ----------------------------
 CON_ID 					    NUMBER
 DBID						    NUMBER
 CON_UID					    NUMBER
 GUID						    RAW(16)
 NAME						    VARCHAR2(30)
 OPEN_MODE					    VARCHAR2(10)
 RESTRICTED					    VARCHAR2(3)
 OPEN_TIME					    TIMESTAMP(3)
 CREATE_SCN					    NUMBER
 TOTAL_SIZE					    NUMBER

SQL> select con_id,name from v$pdbs;

    CON_ID NAME
---------- ------------------------------
	 2 PDB$SEED
	 3 CON_CDB_NEW

SQL> alter session set container=CON_CDB_NEW;

Session altered.

SQL> exec dbms_service.start_service('CON_CDB_NEW');

PL/SQL procedure successfully completed.

SQL> !lsnrctl status listener1

LSNRCTL for Linux: Version 12.1.0.1.0 - Production on 09-AUG-2013 14:55:35

Copyright (c) 1991, 2013, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=db-42)(PORT=1523)))
STATUS of the LISTENER
------------------------
Alias                     listener1
Version                   TNSLSNR for Linux: Version 12.1.0.1.0 - Production
Start Date                08-AUG-2013 21:04:57
Uptime                    0 days 17 hr. 50 min. 37 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /data/app2/oracle/product/11.2.0/db_1/network/admin/listener.ora
Listener Log File         /data/app2/oracle/diag/tnslsnr/db-42/listener1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=db-42)(PORT=1523)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1523)))
Services Summary...
Service "con_cdb_new" has 1 instance(s). ------------------> new pdb service register in listener
  Instance "testcdb", status READY, has 1 handler(s) for this service...
Service "testcdb" has 1 instance(s).
  Instance "testcdb", status READY, has 1 handler(s) for this service...
Service "testcdbXDB" has 1 instance(s).
  Instance "testcdb", status READY, has 1 handler(s) for this service...
The command completed successfully


——- unregister pdb service

SQL> exec dbms_service.stop_service('CON_CDB_NEW');

PL/SQL procedure successfully completed.

SQL> !lsnrctl status listener1;

LSNRCTL for Linux: Version 12.1.0.1.0 - Production on 09-AUG-2013 14:59:05

Copyright (c) 1991, 2013, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=db-42)(PORT=1523)))
STATUS of the LISTENER
------------------------
Alias                     listener1
Version                   TNSLSNR for Linux: Version 12.1.0.1.0 - Production
Start Date                08-AUG-2013 21:04:57
Uptime                    0 days 17 hr. 54 min. 8 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /data/app2/oracle/product/11.2.0/db_1/network/admin/listener.ora
Listener Log File         /data/app2/oracle/diag/tnslsnr/db-42/listener1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=db-42)(PORT=1523)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1523)))
Services Summary...
Service "testcdb" has 1 instance(s).
  Instance "testcdb", status READY, has 1 handler(s) for this service...
Service "testcdbXDB" has 1 instance(s).
  Instance "testcdb", status READY, has 1 handler(s) for this service...
The command completed successfully


SQL> ! sqlplus 'sys/oracle@TESTCDB as sysdba'

SQL*Plus: Release 12.1.0.1.0 Production on Fri Aug 9 15:00:04 2013

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

ERROR:
ORA-12514: TNS:listener does not currently know of service requested in connect
descriptor


Likewise, both before 12c and in 12c an instance can manage multiple service_names:

[oracle@db-42 ~]$ source ~/.bash_profile_11 
[oracle@db-42 ~]$ ora si 

SQL*Plus: Release 11.2.0.3.0 Production on Fri Aug 9 15:01:13 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> !lsnrctl status  

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 09-AUG-2013 15:01:19

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                09-AUG-2013 14:29:46
Uptime                    0 days 0 hr. 31 min. 33 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Log File         /data/app1/oracle/diag/tnslsnr/db-42/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=db-42)(PORT=1521)))
Services Summary...
Service "liuyang" has 1 instance(s).
  Instance "testdb", status READY, has 1 handler(s) for this service...
Service "testdb" has 1 instance(s).
  Instance "testdb", status READY, has 1 handler(s) for this service...
Service "testdbXDB" has 1 instance(s).
  Instance "testdb", status READY, has 1 handler(s) for this service...
The command completed successfully

SQL> show parameter service_names;

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
service_names			     string	 testdb, liuyang
SQL> exec dbms_service.stop_service('liuyang');

PL/SQL procedure successfully completed.

SQL> !lsnrctl status

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 09-AUG-2013 15:01:47

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                09-AUG-2013 14:29:46
Uptime                    0 days 0 hr. 32 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Log File         /data/app1/oracle/diag/tnslsnr/db-42/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=db-42)(PORT=1521)))
Services Summary...
Service "testdb" has 1 instance(s).
  Instance "testdb", status READY, has 1 handler(s) for this service...
Service "testdbXDB" has 1 instance(s).
  Instance "testdb", status READY, has 1 handler(s) for this service...
The command completed successfully

SQL> exec dbms_service.start_service('liuyang');

PL/SQL procedure successfully completed.

SQL> !lsnrctl status

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 09-AUG-2013 15:02:12

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                09-AUG-2013 14:29:46
Uptime                    0 days 0 hr. 32 min. 26 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Log File         /data/app1/oracle/diag/tnslsnr/db-42/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=db-42)(PORT=1521)))
Services Summary...
Service "liuyang" has 1 instance(s).
  Instance "testdb", status READY, has 1 handler(s) for this service...
Service "testdb" has 1 instance(s).
  Instance "testdb", status READY, has 1 handler(s) for this service...
Service "testdbXDB" has 1 instance(s).
  Instance "testdb", status READY, has 1 handler(s) for this service...
The command completed successfully

SQL> exec dbms_service.stop_service('testdb');

PL/SQL procedure successfully completed.

SQL> !lsnrctl status 

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 09-AUG-2013 15:02:41

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                09-AUG-2013 14:29:46
Uptime                    0 days 0 hr. 32 min. 54 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Log File         /data/app1/oracle/diag/tnslsnr/db-42/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=db-42)(PORT=1521)))
Services Summary...
Service "liuyang" has 1 instance(s).
  Instance "testdb", status READY, has 1 handler(s) for this service...
Service "testdb" has 1 instance(s).
  Instance "testdb", status READY, has 1 handler(s) for this service...
Service "testdbXDB" has 1 instance(s).
  Instance "testdb", status READY, has 1 handler(s) for this service...
The command completed successfully

Note that the default service, testdb, cannot really be stopped this way (the call "succeeds" but the service stays registered with the listener); only the added service 'liuyang' can be controlled.
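
The default service (named after db_name) belongs to the instance itself; the services DBMS_SERVICE fully controls are the ones you add yourself. Creating one looks like this (service and network names are examples):

SQL> exec dbms_service.create_service('liuyang','liuyang');
SQL> exec dbms_service.start_service('liuyang');
-- clients then put SERVICE_NAME=liuyang in their connect descriptor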

Summary of DBMS_SERVICE Subprograms (Table 140-5)

CREATE_SERVICE Procedure       Creates a service
DELETE_SERVICE Procedure       Deletes a service
DISCONNECT_SESSION Procedure   Disconnects sessions running under this service
MODIFY_SERVICE Procedure       Modifies a service
START_SERVICE Procedure        Activates a service
STOP_SERVICE Procedure         Stops a service