trouble shooting

一次诡异的缓慢”死亡” CASE

September 13, 2013 Internals, oracle No comments

今天下午遇到的一个诡异CASE,现象为系统诡异的死亡,目前判定为sqluldr2工具导致 (同事使用sqluldr2 用具 执行了sqluldr2_linux64_10204.bin -help 命令)
抛出的错误如下 (/var/log/message)

Sep 11 09:00:54 db18 setroubleshoot: SELinux is preventing sqluldr2_linux6 from loading /home/oracle/product/10205/db1/lib/libnnz10.so which requires text relocation. For complete SELinux messages. run sealert -l 012919a7-837b-42af-9509-367d9d743808
Sep 11 09:07:08 db18 last message repeated 2 times
Sep 11 09:08:47 db18 setroubleshoot: SELinux is preventing sqluldr2_linux6 from loading /home/oracle/product/10205/db1/lib/libnnz10.so which requires text relocation. For complete SELinux messages. run sealert -l 012919a7-837b-42af-9509-367d9d743808
Sep 11 09:09:54 db18 setroubleshoot: SELinux is preventing sqluldr2_linux6 from loading /home/oracle/product/10205/db1/lib/libnnz10.so which requires text relocation. For complete SELinux messages. run sealert -l 012919a7-837b-42af-9509-367d9d743808
Sep 11 09:10:07 db18 setroubleshoot: SELinux is preventing sqluldr2_linux6 from loading /home/oracle/product/10205/db1/lib/libnnz10.so which requires text relocation. For complete SELinux messages. run sealert -l 012919a7-837b-42af-9509-367d9d743808
Sep 11 09:10:46 db18 kernel: tnslsnr[19787]: segfault at 0000000000000000 rip 00002b92c7d5bebe rsp 00007fff0a1fea90 error 4
Sep 11 10:49:59 db18 auditd[8397]: Audit daemon rotating log files
Sep 11 10:53:41 db18 setroubleshoot: SELinux is preventing sqluldr2_linux6 from loading /home/oracle/product/10205/db1/lib/libnnz10.so which requires text relocation. For complete SELinux messages. run sealert -l 012919a7-837b-42af-9509-367d9d743808
Sep 11 10:54:23 db18 setroubleshoot: SELinux is preventing sqluldr2_linux6 from loading /home/oracle/product/10205/db1/lib/libnnz10.so which requires text relocation. For complete SELinux messages. run sealert -l 012919a7-837b-42af-9509-367d9d743808
Sep 11 12:50:44 db18 setroubleshoot: SELinux is preventing sqluldr2_linux6 from loading /home/oracle/product/10205/db1/lib/libnnz10.so which requires text relocation. For complete SELinux messages. run sealert -l 012919a7-837b-42af-9509-367d9d743808
Sep 11 12:54:37 db18 setroubleshoot: SELinux is preventing sqluldr2_linux6 from loading /home/oracle/product/10205/db1/lib/libnnz10.so which requires text relocation. For complete SELinux messages. run sealert -l 012919a7-837b-42af-9509-367d9d743808
Sep 11 13:56:56 db18 nrpe[15359]: Host 10.4.0.238 is not allowed to talk to us!
Sep 11 13:57:08 db18 nrpe[15425]: Host 10.4.0.238 is not allowed to talk to us!
Sep 11 14:17:49 db18 setroubleshoot: SELinux is preventing sqluldr2_linux6 from loading /home/oracle/product/10205/db1/lib/libnnz10.so which requires text relocation. For complete SELinux messages. run sealert -l 012919a7-837b-42af-9509-367d9d743808

QQ图片20130913002200

当时系统的SELinux为enable,阻止了sqluldr2对于oracle库文件的操作,多次尝试之后在09:10:46系统出现了异常,当时仅仅表现为listener的异常关闭,自此系统开始了”慢性”死忙

Last login: Tue Sep 10 14:59:41 2013 from 10.4.0.232
[oracle@db18 ~]$ lsnrctl status

LSNRCTL for Linux: Version 10.2.0.5.0 - Production on 11-SEP-2013 09:18:47

Copyright (c) 1991, 2010, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=0.0.0.0)(PORT=1522)))
TNS-12541: TNS:no listener
TNS-12560: TNS:protocol adapter error
  TNS-00511: No listener
   Linux Error: 111: Connection refused
[oracle@db18 ~]$ 
[oracle@db18 ~]$ 
[oracle@db18 ~]$ ora active
09:18:54 up 702 days, 14:57,  2 users,  load average: 3.06, 2.08, 0.93

no rows selected

[oracle@db18 ~]$ lsnrctl start

LSNRCTL for Linux: Version 10.2.0.5.0 - Production on 11-SEP-2013 09:19:16

Copyright (c) 1991, 2010, Oracle.  All rights reserved.

Starting /home/oracle/product/10205/db1/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 10.2.0.5.0 - Production
System parameter file is /home/oracle/product/10205/db1/network/admin/listener.ora
Log messages written to /home/oracle/product/10205/db1/network/log/listener.log
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=0.0.0.0)(PORT=1522)))

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=0.0.0.0)(PORT=1522)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 10.2.0.5.0 - Production
Start Date                11-SEP-2013 09:19:16
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /home/oracle/product/10205/db1/network/admin/listener.ora
Listener Log File         /home/oracle/product/10205/db1/network/log/listener.log
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=0.0.0.0)(PORT=1522)))
Services Summary...
Service "misc" has 1 instance(s).
  Instance "misc", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
[oracle@db18 ~]$ 
[oracle@db18 ~]$ ora active   
 09:19:22 up 702 days, 14:57,  2 users,  load average: 3.09, 2.18, 1.00

12号下午出现的问题,很多process占用大量的CPU资源,包括background process,多个进程处于running假死阶段,sys占用大量系统资源。

QQ图片20130913000507

TBQTS6(34${(BX$GLS~@PVP

database alert log如下:

Errors in file /data/oracle/admin/misc/udump/misc_ora_28401.trc:
ORA-27300: OS system dependent operation:invalid_process_id failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgpalive1

分析问题,查看ora background process进程状态 发现如下异常

[oracle@db18 arch]$ ps -ef | grep -i ora_
oracle   24019     1  0 May18 ?        01:49:30 ora_pmon_misc
oracle   24023     1  0 May18 ?        00:33:02 ora_psp0_misc
oracle   24027     1  0 May18 ?        00:34:32 ora_mman_misc
oracle   24031     1  0 May18 ?        00:59:23 ora_dbw0_misc
oracle   24033     1  0 May18 ?        01:02:13 ora_dbw1_misc
oracle   24035     1  0 May18 ?        00:58:57 ora_dbw2_misc
oracle   24037     1  0 May18 ?        00:59:24 ora_dbw3_misc
oracle   24039     1  0 May18 ?        00:59:06 ora_dbw4_misc
oracle   24043     1  0 May18 ?        00:59:35 ora_dbw5_misc
oracle   24047     1  0 May18 ?        04:28:36 ora_lgwr_misc
oracle   24049     1  0 May18 ?        09:41:15 ora_ckpt_misc
oracle   24053     1  0 May18 ?        00:36:44 ora_smon_misc
oracle   24057     1  0 May18 ?        00:28:39 ora_reco_misc
oracle   24061     1  1 May18 ?        1-06:42:09 ora_mmon_misc
oracle   24065     1  0 May18 ?        22:12:24 ora_mmnl_misc
oracle   24347     1  0 May18 ?        00:37:10 ora_arc0_misc
oracle   24351     1  0 May18 ?        00:43:10 ora_arc1_misc
oracle   24355     1  0 May18 ?        01:42:59 ora_lns1_misc
oracle   24543     1  0 May18 ?        00:28:31 ora_qmnc_misc
oracle   24780     1  0 May18 ?        00:10:05 ora_q000_misc
oracle   24969     1  0 May18 ?        00:09:25 ora_q001_misc
oracle   26077 25306  0 15:31 pts/8    00:00:00 tail -f misc_ora_21763.trc
oracle   27634 22109  0 15:44 pts/6    00:00:00 grep -i ora_
oracle   27867 24023 98 Sep11 ?        1-06:03:47 ora_psp0_misc   ------------刷新时间为Sep11
oracle   27868 27867 98 Sep11 ?        1-06:03:38 ora_psp0_misc   ------------为衍生进程

PSP进程出现异常状态,对于PSP0进程的解释如下:

PSP stands for Process SPawner and this process has the job of creating and managing other Oracle processes.
The initial set of background processes are spawned by the startup process. After the instance is started, PSP0 is responsible for spawning any
required background processes.

当时系统的状态为无法checkpoint 无法 syn log ,包括alter database set 等命令也出现hang死状态

[oracle@db18 ~]$ ora active
15:57:03 up 703 days, 21:35, 12 users,  load average: 24.11, 26.99, 50.27

   SID USERNAME   MACHINE          EVENT                      PARAM                W   WT SQL                        ST     LT LOGON_TIME
------ ---------- ---------------------- ------------------------------ -------------------- ---- ---- ---------------------------- -- ------ ----------
  3601 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006158/0       0    0 /frw83qw2fvsa1             K        0    1088732
  3607 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006175/0       0    0 /frw83qw2fvsa1             K        0        3857
  3611 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006137/0       0    0 /frw83qw2fvsa1             K        0    1088741
  3616 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006169/0       0    0 /frw83qw2fvsa1             K        0        3895
  3634 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006140/0       0    0 /frw83qw2fvsa1             K        0    1088735
  3635 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006173/0       0    0 /frw83qw2fvsa1             K        0        3889
  3646 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006209/0       0    0 /frw83qw2fvsa1             A        0        1023
  3650 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006174/0       0    0 /frw83qw2fvsa1             K        0        3882
  3661 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006152/0       0    0 /frw83qw2fvsa1             K        0        5346
  3664 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006138/0       0    0 /frw83qw2fvsa1             K        0      877657
  3668 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006200/0       0    0 /frw83qw2fvsa1             A        0        1027
  3671 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006198/0       0    0 /frw83qw2fvsa1             A        0        1027
  3676 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006184/0       0    0 /6p07vqnwqrsyw            A        0        1032
  3691 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006210/0       0    0 /frw83qw2fvsa1             A        0        1011
  3696 LOG_DATA2  JDBC Thin Client       log file sync                     10795/2810006141/0       0    0 /frw83qw2fvsa1             K        0        5346
  3698 SYS     sqlplus@db18         reliable message            70088494424/70158516   0    0 /                         A           0         390

表现为ckpt lgwr dbwn进程全部死亡 .. oradebug dump systemstat 速度极其缓慢 oradebug dump errorstack hang —- sys 占用大量资源

awk -f ass109.awk systemdump.txt > systemdump.log —并没有发现任何异常等待以及blocker信息

[oracle@db18 ~]$ tail -f -n 50 systemdump.log 
     Cmd: Select
296:waiting for 'SQL*Net message from client' 
     Cmd: Select
297:waiting for 'SQL*Net message from client' 
     Cmd: Select
298:waiting for 'SQL*Net message from client' 
     Cmd: PL/SQL Execute
299:waiting for 'SQL*Net message from client' 
     Cmd: PL/SQL Execute
300:waiting for 'SQL*Net message from client' 
     Cmd: PL/SQL Execute
302:waiting for 'SQL*Net message from client' 
     Cmd: Insert
444:waiting for 'SQL*Net message from client' 
     Cmd: Select
532:waiting for 'SQL*Net message from client' 
     Cmd: Select
536:waiting for 'SQL*Net message from client' 
     Cmd: Select
538:waiting for 'SQL*Net message from client' 
     Cmd: Select

NO BLOCKING PROCESSES FOUND

System State 2
~~~~~~~~~~~~~~~~
1:                                      
2:  waiting for 'pmon timer'            
3:  waiting for 'rdbms ipc message'     
4:  last wait for 'rdbms ipc message'   
5:  waiting for 'rdbms ipc message'     
6:  waiting for 'rdbms ipc message'     
7:  waiting for 'rdbms ipc message'     
8:  waiting for 'rdbms ipc message'     
9:  waiting for 'rdbms ipc message'     
10: waiting for 'rdbms ipc message'     
11: waiting for 'rdbms ipc message'     
12: waiting for 'rdbms ipc message'     
13: waiting for 'rdbms ipc message'     
14: waiting for 'smon timer'            
15: last wait for 'os thread startup'   
16: last wait for 'rdbms ipc message'   
17: waiting for 'SQL*Net message from client' 
     Cmd: Select
18:                                     

NO BLOCKING PROCESSES FOUND


171760 Lines Processed.

其现象为系统的一个”缓慢”的死亡行为 (通过SAR 定位到一个折点时间)

sar -f /var/log/sa/sar11   ---- sep 11

09:00:01 AM       all      0.01      0.00      0.01      0.00      0.00     99.97
09:10:01 AM       all      0.03      0.00      0.02      0.00      0.00     99.95   ----->   (09:10:46 db18 kernel: tnslsnr[19787]: segfault at 0000000000000000 rip 00002b92c7d5bebe rsp 00007fff0a1fea90 error 4)
09:20:01 AM       all      2.18      0.00      2.16      0.00      0.00     95.66
09:30:01 AM       all      3.15      0.00      3.13      0.00      0.00     93.71
09:40:01 AM       all      3.15      0.00      3.13      0.00      0.00     93.72
09:50:01 AM       all      3.15      0.00      3.13      0.00      0.00     93.72

03:00:01 PM       all      3.99      0.00      4.39      0.00      0.00     91.62
03:10:01 PM       all      4.04      0.00      4.32      0.00      0.00     91.64
03:20:01 PM       all      4.47      0.00      4.87      0.00      0.00     90.66
03:30:01 PM       all      4.98      0.00      5.47      0.00      0.00     89.55
03:40:01 PM       all      4.99      0.00      5.45      0.00      0.00     89.56
03:50:01 PM       all      4.99      0.00      5.45      0.00      0.00     89.56
04:00:01 PM       all      4.99      0.00      5.45      0.00      0.00     89.56
04:10:01 PM       all      4.94      0.00      5.50      0.00      0.00     89.56
04:20:01 PM       all      4.82      0.00      5.62      0.00      0.00     89.56


10:20:01 PM       all     10.72      0.00     12.22      0.00      0.00     77.06
10:30:01 PM       all     10.73      0.00     12.21      0.00      0.00     77.06
10:40:01 PM       all     10.69      0.00     12.25      0.00      0.00     77.06
10:50:01 PM       all     10.67      0.00     12.27      0.00      0.00     77.06
11:00:01 PM       all     10.52      0.00     12.41      0.00      0.00     77.06
11:10:01 PM       all     10.42      0.00     12.52      0.00      0.00     77.06
11:20:01 PM       all     10.38      0.00     12.56      0.00      0.00     77.06


sar -f /var/log/sa/sar12   ---- sep 12

12:00:01 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
12:10:01 AM       all     11.23      0.00     11.72      0.10      0.00     76.95
12:20:02 AM       all     11.29      0.00     11.65      0.10      0.00     76.96
12:30:01 AM       all     11.30      0.00     11.63      0.09      0.00     76.96
12:40:01 AM       all     11.71      0.00     12.36      0.10      0.00     75.83
12:50:01 AM       all     12.14      0.00     12.88      0.10      0.00     74.87
01:00:01 AM       all     12.08      0.00     12.94      0.10      0.00     74.88
01:10:01 AM       all     12.24      0.00     12.79      0.10      0.00     74.88
01:20:01 AM       all     12.25      0.00     12.77      0.10      0.00     74.88

10:00:01 AM       all     12.08      0.00     37.94      0.10      0.00     49.88
10:10:01 AM       all     12.07      0.00     37.95      0.09      0.00     49.88
10:20:01 AM       all     12.09      0.00     37.94      0.11      0.00     49.87
10:30:01 AM       all     12.10      0.00     37.93      0.10      0.00     49.88
10:40:01 AM       all     12.02      0.00     38.00      0.11      0.00     49.87
10:50:01 AM       all     12.06      0.00     37.96      0.10      0.00     49.88
11:00:01 AM       all     12.37      0.00     37.65      0.10      0.00     49.88
11:10:01 AM       all     12.32      0.00     37.70      0.10      0.00     49.88
11:20:01 AM       all     12.50      0.00     37.52      0.10      0.00     49.88
11:30:01 AM       all     12.59      0.00     37.43      0.10      0.00     49.88
11:40:01 AM       all     12.77      0.00     37.25      0.11      0.00     49.87

01:50:01 PM       all     13.41      0.00     40.77      0.10      0.00     45.71
02:00:01 PM       all     13.41      0.00     40.78      0.11      0.00     45.71
02:10:01 PM       all     13.41      0.00     40.78      0.10      0.00     45.71
02:20:01 PM       all     13.17      0.00     63.22      0.05      0.00     23.55
02:30:02 PM       all     13.21      0.00     86.78      0.00      0.00      0.00
02:40:01 PM       all     13.29      0.00     86.71      0.00      0.00      0.00
02:50:01 PM       all     13.42      0.00     86.58      0.00      0.00      0.00
03:00:01 PM       all     13.58      0.00     86.42      0.00      0.00      0.00
03:10:01 PM       all     13.38      0.00     86.62      0.00      0.00      0.00
03:20:01 PM       all     13.25      0.00     86.75      0.00      0.00      0.00
03:30:01 PM       all     13.19      0.00     86.80      0.00      0.00      0.00
03:40:01 PM       all     12.95      0.00     86.25      0.00      0.00      0.79

假想:sqluldr2 与 Selinux的冲突导致了oracle so文件的异常,表现为segfault–listener的异常关闭,进而系统缓慢的”死亡” (建议不要在生产系统直接使用sqluldr2而使用TNS远程连接)

gdb 调试TPCC 一例

March 19, 2013 linux, system No comments

TPCC对mysql 5.5.27做一次性能测试时遭遇 Segmentation fault,gdb调试记录之:

[root@db-81 ~]# uname -a 
Linux db-81 2.6.18-194.el5 #1 SMP Mon Mar 29 22:10:29 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@db-81 ~]# more /proc/meminfo |grep Total
MemTotal:     49455412 kB
HighTotal:           0 kB
LowTotal:     49455412 kB
SwapTotal:    65537156 kB
VmallocTotal: 34359738367 kB
HugePages_Total:     0
[root@db-81 ~]# more /proc/cpuinfo |grep "processor" |wc -l
16
[root@db-81 ~]# more /proc/cpuinfo |grep  "model name"
model name	: AMD Opteron(tm) Processor 6128
[root@db-13 src]# export C_INCLUDE_PATH=/data/mysql/include:/lib/:lib64:$LD_LIBRARY_PATH:$PATH
[root@db-13 src]# export PATH=/data/mysql/bin:/lib:/lib64:$LD_LIBRARY_PATH:$PATH:$PATH
[root@db-13 src]# export LD_LIBRARY_PATH=/data/mysql/lib:$PATH
[root@db-13 src]# make all
cc -w -O2 -g -I. `mysql_config --include`  -c load.c
cc -w -O2 -g -I. `mysql_config --include`  -c support.c
cc load.o support.o `mysql_config --libs_r` -lrt -o ../tpcc_load
cc -w -O2 -g -I. `mysql_config --include`  -c main.c
cc -w -O2 -g -I. `mysql_config --include`  -c spt_proc.c
cc -w -O2 -g -I. `mysql_config --include`  -c driver.c
cc -w -O2 -g -I. `mysql_config --include`  -c sequence.c
cc -w -O2 -g -I. `mysql_config --include`  -c rthist.c
cc -w -O2 -g -I. `mysql_config --include`  -c neword.c
cc -w -O2 -g -I. `mysql_config --include`  -c payment.c
cc -w -O2 -g -I. `mysql_config --include`  -c ordstat.c
cc -w -O2 -g -I. `mysql_config --include`  -c delivery.c
cc -w -O2 -g -I. `mysql_config --include`  -c slev.c
cc main.o spt_proc.o driver.o support.o sequence.o rthist.o neword.o payment.o ordstat.o delivery.o slev.o `mysql_config --libs_r` -lrt -o ../tpcc_start

[root@db-13 tpcc-mysql]# ./tpcc_start -h 127.0.0.1 -d test -u root -p ‘yihaodian’ -w 30 -c 60 -r 60 -l 1000 -f mysql_tpcc_20130319
***************************************
*** ###easy### TPC-C Load Generator ***
***************************************
option h with value ‘127.0.0.1’
option d with value ‘test’
option u with value ‘root’
option p with value ‘yihaodian’
option w with value ’30’
option c with value ’60’
option r with value ’60’
option l with value ‘1000’
option f with value ‘mysql_tpcc_20130319’

[server]: 127.0.0.1
[port]: 3306
[DBname]: test
[user]: root
[pass]: yihaodian
[warehouse]: 30
[connection]: 60
[rampup]: 60 (sec.)
[measure]: 1000 (sec.)
Segmentation fault
[root@db-13 tpcc-mysql]# ./tpcc_start
***************************************
*** ###easy### TPC-C Load Generator ***
***************************************

[server]:
[port]: 3306
[DBname]: tpcc
[user]:
[pass]:
[warehouse]: 1
[connection]: 10
[rampup]: 10 (sec.)
[measure]: 20 (sec.)

RAMP-UP TIME.(10 sec.)
Segmentation fault

[root@db-13 tpcc-mysql]# strace ./tpcc_start
execve("./tpcc_start", ["./tpcc_start"], [/* 29 vars */]) = 0
brk(0)                                  = 0xddce000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ac29d63a000
uname({sys="Linux", node="db-13", ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ac29d63b000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/data/mysql/lib/tls/x86_64/libmysqlclient.so.18", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/data/mysql/lib/tls/x86_64", 0x7fff4f83cf60) = -1 ENOENT (No such file or directory)
open("/data/mysql/lib/tls/libmysqlclient.so.18", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/data/mysql/lib/tls", 0x7fff4f83cf60) = -1 ENOENT (No such file or directory)
open("/data/mysql/lib/x86_64/libmysqlclient.so.18", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/data/mysql/lib/x86_64", 0x7fff4f83cf60) = -1 ENOENT (No such file or directory)
open("/data/mysql/lib/libmysqlclient.so.18", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\200\345\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=6870358, ...}) = 0
mmap(NULL, 4577240, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x2ac29d63c000
mprotect(0x2ac29d911000, 1048576, PROT_NONE) = 0
mmap(0x2ac29da11000, 540672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2d5000) = 0x2ac29da11000
mmap(0x2ac29da95000, 18392, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x2ac29da95000
close(3)                                = 0
open("/data/mysql/lib/libpthread.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/data/mysql/bin/tls/x86_64/libpthread.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/data/mysql/bin/tls/x86_64", 0x7fff4f83cf30) = -1 ENOENT (No such file or directory)
open("/data/mysql/bin/tls/libpthread.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/data/mysql/bin/tls", 0x7fff4f83cf30) = -1 ENOENT (No such file or directory)
open("/data/mysql/bin/x86_64/libpthread.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/data/mysql/bin/x86_64", 0x7fff4f83cf30) = -1 ENOENT (No such file or directory)
open("/data/mysql/bin/libpthread.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/data/mysql/bin", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
open("/lib/tls/x86_64/libpthread.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/lib/tls/x86_64", 0x7fff4f83cf30) = -1 ENOENT (No such file or directory)
open("/lib/tls/libpthread.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/lib/tls", 0x7fff4f83cf30)        = -1 ENOENT (No such file or directory)
open("/lib/x86_64/libpthread.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/lib/x86_64", 0x7fff4f83cf30)     = -1 ENOENT (No such file or directory)
open("/lib/libpthread.so.0", O_RDONLY)  = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0P\230X\0004\0\0\0"..., 832) = 832
close(3)                                = 0
stat("/lib", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
open("/lib64/tls/x86_64/libpthread.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/lib64/tls/x86_64", 0x7fff4f83cf30) = -1 ENOENT (No such file or directory)
open("/lib64/tls/libpthread.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/lib64/tls", 0x7fff4f83cf30)      = -1 ENOENT (No such file or directory)
open("/lib64/x86_64/libpthread.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/lib64/x86_64", 0x7fff4f83cf30)   = -1 ENOENT (No such file or directory)
open("/lib64/libpthread.so.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240W\340\253?\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=145824, ...}) = 0
mmap(0x3fabe00000, 2204528, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3fabe00000
mprotect(0x3fabe16000, 2093056, PROT_NONE) = 0
mmap(0x3fac015000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15000) = 0x3fac015000
mmap(0x3fac017000, 13168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3fac017000
close(3)                                = 0
open("/data/mysql/lib/libm.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/data/mysql/bin/libm.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib/libm.so.6", O_RDONLY)        = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\20$Z\0004\0\0\0"..., 832) = 832
close(3)                                = 0
open("/lib64/libm.so.6", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`>`\253?\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=615136, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ac29da9a000
mmap(0x3fab600000, 2629848, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3fab600000
mprotect(0x3fab682000, 2093056, PROT_NONE) = 0
mmap(0x3fab881000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x81000) = 0x3fab881000
close(3)                                = 0
open("/data/mysql/lib/librt.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/data/mysql/bin/librt.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib/librt.so.1", O_RDONLY)       = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\200H\230\0004\0\0\0"..., 832) = 832
close(3)                                = 0
open("/lib64/librt.so.1", O_RDONLY)     = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \"\340\257?\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=53448, ...}) = 0
mmap(0x3fafe00000, 2132936, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3fafe00000
mprotect(0x3fafe07000, 2097152, PROT_NONE) = 0
mmap(0x3fb0007000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x7000) = 0x3fb0007000
close(3)                                = 0
open("/data/mysql/lib/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/data/mysql/bin/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib/libdl.so.2", O_RDONLY)       = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0P\372W\0004\0\0\0"..., 832) = 832
close(3)                                = 0
open("/lib64/libdl.so.2", O_RDONLY)     = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\16\240\253?\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=23360, ...}) = 0
mmap(0x3faba00000, 2109696, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3faba00000
mprotect(0x3faba02000, 2097152, PROT_NONE) = 0
mmap(0x3fabc02000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x3fabc02000
close(3)                                = 0
open("/data/mysql/lib/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/data/mysql/bin/libc.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib/libc.so.6", O_RDONLY)        = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\317D\0004\0\0\0"..., 832) = 832
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\332!\253?\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1717800, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ac29da9b000
mmap(0x3fab200000, 3498328, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3fab200000
mprotect(0x3fab34e000, 2093056, PROT_NONE) = 0
mmap(0x3fab54d000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14d000) = 0x3fab54d000
mmap(0x3fab552000, 16728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3fab552000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ac29da9c000
arch_prctl(ARCH_SET_FS, 0x2ac29da9c6d0) = 0
mprotect(0x3fab54d000, 16384, PROT_READ) = 0
mprotect(0x3fabc02000, 4096, PROT_READ) = 0
mprotect(0x3fb0007000, 4096, PROT_READ) = 0
mprotect(0x3fab881000, 4096, PROT_READ) = 0
mprotect(0x3fac015000, 4096, PROT_READ) = 0
mprotect(0x3fab01b000, 4096, PROT_READ) = 0
set_tid_address(0x2ac29da9c760)         = 25524
set_robust_list(0x2ac29da9c770, 0x18)   = 0
futex(0x7fff4f83da8c, FUTEX_WAKE_PRIVATE, 1) = 0
rt_sigaction(SIGRTMIN, {0x3fabe05380, [], SA_RESTORER|SA_SIGINFO, 0x3fabe0eb10}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x3fabe052b0, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x3fabe0eb10}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 6), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ac29da9d000
write(1, "********************************"..., 40***************************************
) = 40
write(1, "*** ###easy### TPC-C Load Genera"..., 40*** ###easy### TPC-C Load Generator ***
) = 40
write(1, "********************************"..., 40***************************************
) = 40
write(1, "<Parameters>\n", 13<Parameters>
)          = 13
write(1, "     [server]: \n", 16     [server]: 
)       = 16
write(1, "     [port]: 3306\n", 18     [port]: 3306
)     = 18
write(1, "     [DBname]: tpcc\n", 20     [DBname]: tpcc
)   = 20
write(1, "       [user]: \n", 16       [user]: 
)       = 16
write(1, "       [pass]: \n", 16       [pass]: 
)       = 16
write(1, "  [warehouse]: 1\n", 17  [warehouse]: 1
)      = 17
write(1, " [connection]: 10\n", 18 [connection]: 10
)     = 18
write(1, "     [rampup]: 10 (sec.)\n", 25     [rampup]: 10 (sec.)
) = 25
write(1, "    [measure]: 20 (sec.)\n", 25    [measure]: 20 (sec.)
) = 25
rt_sigaction(SIGALRM, {0x4021c0, [], SA_RESTORER, 0x3fabe0eb10}, NULL, 8) = 0
open("/dev/urandom", O_RDONLY)          = 3
read(3, "\253\224\300!", 4)             = 4
close(3)                                = 0
brk(0)                                  = 0xddce000
brk(0xddef000)                          = 0xddef000
mmap(NULL, 10489856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x40dfb000
mprotect(0x40dfb000, 4096, PROT_NONE)   = 0
clone(child_stack=0x417fb250, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x417fb9d0, tls=0x417fb940, child_tidptr=0x417fb9d0) = 25525
futex(0x417fb9d0, FUTEX_WAIT, 25525, NULL) = 0
open("/proc/stat", O_RDONLY)            = 3
read(3, "cpu  4327497980 1619853 67702808"..., 8192) = 3722
close(3)                                = 0
open("/etc/nsswitch.conf", O_RDONLY)    = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=1696, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaaaaab000
read(3, "#\n# /etc/nsswitch.conf\n#\n# An ex"..., 4096) = 1696
read(3, "", 4096)                       = 0
close(3)                                = 0
munmap(0x2aaaaaaab000, 4096)            = 0
open("/data/mysql/lib/libnss_files.so.2", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\37\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=53880, ...}) = 0
mmap(NULL, 2139432, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x2aaaaaaac000
mprotect(0x2aaaaaab6000, 2093056, PROT_NONE) = 0
mmap(0x2aaaaacb5000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x9000) = 0x2aaaaacb5000
close(3)                                = 0
mprotect(0x2aaaaacb5000, 4096, PROT_READ) = 0
open("/etc/services", O_RDONLY)         = 3
fcntl(3, F_GETFD)                       = 0
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=362031, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaaacb7000
read(3, "# /etc/services:\n# $Id: services"..., 4096) = 4096
read(3, "rotocol\nntp\t\t123/tcp\nntp\t\t123/ud"..., 4096) = 4096
read(3, "767/udp\nrsync\t\t873/tcp\t\t\t\t# rsyn"..., 4096) = 4096
read(3, "stmanager\ncvspserver\t2401/tcp\t\t\t"..., 4096) = 4096
close(3)                                = 0
munmap(0x2aaaaacb7000, 4096)            = 0
rt_sigaction(SIGPIPE, {0x1, [PIPE], SA_RESTORER|SA_RESTART, 0x3fab2302d0}, {SIG_DFL, [], 0}, 8) = 0
clone(child_stack=0x417fb250, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x417fb9d0, tls=0x417fb940, child_tidptr=0x417fb9d0) = 25526
mmap(NULL, 10489856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x41ef1000
mprotect(0x41ef1000, 4096, PROT_NONE)   = 0
clone(child_stack=0x428f1250, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x428f19d0, tls=0x428f1940, child_tidptr=0x428f19d0) = 25527
mmap(NULL, 10489856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x428f2000
mprotect(0x428f2000, 4096, PROT_NONE)   = 0
clone(child_stack=0x432f2250, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x432f29d0, tls=0x432f2940, child_tidptr=0x432f29d0) = 25528
+++ killed by SIGSEGV +++


GDB 调用

[root@db-13 tpcc-mysql]# gdb
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
(gdb) file tpcc_start
Reading symbols from /tmp/tpcc-mysql/tpcc_start...done.
(gdb) run
Starting program: /tmp/tpcc-mysql/tpcc_start 
[Thread debugging using libthread_db enabled]
***************************************
*** ###easy### TPC-C Load Generator ***
***************************************
<Parameters>
     [server]: 
     [port]: 3306
     [DBname]: tpcc
       [user]: 
       [pass]: 
  [warehouse]: 1
 [connection]: 10
     [rampup]: 10 (sec.)
    [measure]: 20 (sec.)
[New Thread 0x40a00940 (LWP 27592)]
[Thread 0x40a00940 (LWP 27592) exited]
[New Thread 0x40a00940 (LWP 27593)]
[New Thread 0x41401940 (LWP 27594)]
[New Thread 0x41e02940 (LWP 27595)]
[New Thread 0x42803940 (LWP 27596)]
[New Thread 0x43204940 (LWP 27597)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x40a00940 (LWP 27593)]
my_stat (path=0x409ff750 "/usr/local/mysql/share/charsets/Index.xml", stat_area=0x409ff620, my_flags=0) at /pb2/build/sb_0-6421281-1342807431.83/mysql-5.5.27/mysys/my_lib.c:413
413	/pb2/build/sb_0-6421281-1342807431.83/mysql-5.5.27/mysys/my_lib.c: No such file or directory.
	in /pb2/build/sb_0-6421281-1342807431.83/mysql-5.5.27/mysys/my_lib.c
(gdb) 
(gdb) q
A debugging session is active.

	Inferior 1 [process 27589] will be killed.

Quit anyway? (y or n) y

删除 mysql.sock

[root@db-13 tmp]# rm mysql.sock 
rm: remove symbolic link `mysql.sock'? y

再次调用

[root@db-13 tpcc-mysql]# gdb
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
(gdb) file tpcc_start
Reading symbols from /tmp/tpcc-mysql/tpcc_start...done.
(gdb) run
Starting program: /tmp/tpcc-mysql/tpcc_start 
[Thread debugging using libthread_db enabled]
***************************************
*** ###easy### TPC-C Load Generator ***
***************************************
<Parameters>
     [server]: 
     [port]: 3306
     [DBname]: tpcc
       [user]: 
       [pass]: 
  [warehouse]: 1
 [connection]: 10
     [rampup]: 10 (sec.)
    [measure]: 20 (sec.)
[New Thread 0x40a00940 (LWP 28199)]
[Thread 0x40a00940 (LWP 28199) exited]
[New Thread 0x40a00940 (LWP 28200)]
[New Thread 0x41401940 (LWP 28201)]
error at thread_main
0, , 
error at thread_main
2002, HY000, Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
[Thread 0x41401940 (LWP 28201) exited]
[New Thread 0x41e02940 (LWP 28202)]
error at thread_main
[Thread 0x40a00940 (LWP 28200) exited]
2002, HY000, Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
[New Thread 0x42803940 (LWP 28203)]
error at thread_main
2002, HY000, Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
[Thread 0x41e02940 (LWP 28202) exited]
[Thread 0x42803940 (LWP 28203) exited]
[New Thread 0x43204940 (LWP 28204)]
error at thread_main
2002, HY000, Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
[New Thread 0x43c05940 (LWP 28205)]
error at thread_main
2002, HY000, Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
[Thread 0x43204940 (LWP 28204) exited]
[Thread 0x43c05940 (LWP 28205) exited]
[New Thread 0x44606940 (LWP 28207)]
error at thread_main
2002, HY000, Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
[New Thread 0x45007940 (LWP 28208)]
error at thread_main
2002, HY000, Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
[Thread 0x44606940 (LWP 28207) exited]
[Thread 0x45007940 (LWP 28208) exited]
[New Thread 0x45a08940 (LWP 28210)]
error at thread_main
2002, HY000, Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
[New Thread 0x46409940 (LWP 28211)]
[Thread 0x45a08940 (LWP 28210) exited]

RAMP-UP TIME.(10 sec.)
error at thread_main
2002, HY000, Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
[Thread 0x46409940 (LWP 28211) exited]

MEASURING START.

  10, 0(0):0.000|0.000, 0(0):0.000|0.000, 0(0):0.000|0.000, 0(0):0.000|0.000, 0(0):0.000|0.000
  20, 0(0):0.000|0.000, 0(0):0.000|0.000, 0(0):0.000|0.000, 0(0):0.000|0.000, 0(0):0.000|0.000

STOPPING THREADS

<Raw Results>
  [0] sc:0  lt:0  rt:0  fl:0 
  [1] sc:0  lt:0  rt:0  fl:0 
  [2] sc:0  lt:0  rt:0  fl:0 
  [3] sc:0  lt:0  rt:0  fl:0 
  [4] sc:0  lt:0  rt:0  fl:0 
 in 20 sec.

<Raw Results2(sum ver.)>
  [0] sc:0  lt:0  rt:0  fl:0 
  [1] sc:0  lt:0  rt:0  fl:0 
  [2] sc:0  lt:0  rt:0  fl:0 
  [3] sc:0  lt:0  rt:0  fl:0 
  [4] sc:0  lt:0  rt:0  fl:0 

<Constraint Check> (all must be [OK])
 [transaction percentage]
        Payment: nan% (>=43.0%) [NG] *
   Order-Status: nan% (>= 4.0%) [NG] *
       Delivery: nan% (>= 4.0%) [NG] *
    Stock-Level: nan% (>= 4.0%) [NG] *
 [response time (at least 90% passed)]
      New-Order: nan%  [NG] *
        Payment: nan%  [NG] *
   Order-Status: nan%  [NG] *
       Delivery: nan%  [NG] *
    Stock-Level: nan%  [NG] *

<TpmC>
                 0.000 TpmC

Program exited normally.
(gdb) q

重新link mysql.sock

[root@db-13 data]# ln -s  /data/mysql/data/mysql.sock  /tmp/mysql.sock
[root@db-13 data]# ls -l /tmp/mysql.sock 
lrwxrwxrwx 1 root root 27 Mar 19 15:59 /tmp/mysql.sock -> /data/mysql/data/mysql.sock

再次调用

[root@db-13 tpcc-mysql]# gdb
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
(gdb) file tpcc_start
Reading symbols from /tmp/tpcc-mysql/tpcc_start...done.
(gdb) run
Starting program: /tmp/tpcc-mysql/tpcc_start 
[Thread debugging using libthread_db enabled]
***************************************
*** ###easy### TPC-C Load Generator ***
***************************************
<Parameters>
     [server]: 
     [port]: 3306
     [DBname]: tpcc
       [user]: 
       [pass]: 
  [warehouse]: 1
 [connection]: 10
     [rampup]: 10 (sec.)
    [measure]: 20 (sec.)
[New Thread 0x40a00940 (LWP 28955)]
[Thread 0x40a00940 (LWP 28955) exited]
[New Thread 0x40a00940 (LWP 28956)]
[New Thread 0x41401940 (LWP 28957)]
[New Thread 0x41e02940 (LWP 28959)]
[New Thread 0x42803940 (LWP 28961)]
[New Thread 0x43204940 (LWP 28963)]
[New Thread 0x43c05940 (LWP 28965)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x40a00940 (LWP 28956)]
my_stat (path=0x409ff750 "/usr/local/mysql/share/charsets/Index.xml", stat_area=0x409ff620, my_flags=0) at /pb2/build/sb_0-6421281-1342807431.83/mysql-5.5.27/mysys/my_lib.c:413
413	/pb2/build/sb_0-6421281-1342807431.83/mysql-5.5.27/mysys/my_lib.c: No such file or directory.
	in /pb2/build/sb_0-6421281-1342807431.83/mysql-5.5.27/mysys/my_lib.c
(gdb) q
A debugging session is active.

	Inferior 1 [process 28952] will be killed.

Quit anyway? (y or n) y

[root@db-13 tpcc-mysql]# find / -name mysql-5.5.27
[root@db-13 tpcc-mysql]# df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda2 148786916 4382960 136724084 4% /
/dev/sda7 258699556 191712 245154620 1% /backup
/dev/sdb1 1134323348 178684616 897089040 17% /data
/dev/sda6 10153988 8957012 672860 94% /var
/dev/sda5 20315812 203008 19064164 2% /home
/dev/sda1 248895 12332 223713 6% /boot
tmpfs 66047196 0 66047196 0% /dev/shm

创建path file
[root@db-13 data]# mkdir -p /usr/local/mysql/share/charsets/
[root@db-13 data]# touch /usr/local/mysql/share/charsets/Index.xml

再次调用

[root@db-13 tpcc-mysql]# gdb
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
(gdb) file ptcc_start
ptcc_start: No such file or directory.
(gdb) file tpcc_start
Reading symbols from /tmp/tpcc-mysql/tpcc_start...done.
(gdb) run
Starting program: /tmp/tpcc-mysql/tpcc_start 
[Thread debugging using libthread_db enabled]
***************************************
*** ###easy### TPC-C Load Generator ***
***************************************
<Parameters>
     [server]: 
     [port]: 3306
     [DBname]: tpcc
       [user]: 
       [pass]: 
  [warehouse]: 1
 [connection]: 10
     [rampup]: 10 (sec.)
    [measure]: 20 (sec.)
[New Thread 0x40a00940 (LWP 29179)]
[Thread 0x40a00940 (LWP 29179) exited]
[New Thread 0x40a00940 (LWP 29180)]
[New Thread 0x41401940 (LWP 29181)]
[New Thread 0x41e02940 (LWP 29182)]
[New Thread 0x42803940 (LWP 29183)]
[New Thread 0x43204940 (LWP 29184)]
[New Thread 0x43c05940 (LWP 29185)]
[New Thread 0x44606940 (LWP 29186)]
error at thread_main
error at thread_main
error at thread_main
1045, 28000, 
error at thread_main
1045, 28000, Access denied for user 'root'@'localhost' (using password: NO)
error at thread_main
[New Thread 0x45007940 (LWP 29188)]
1045, 28000, Access denied for user 'root'@'localhost' (using password: NO)
1045, 28000, Access denied for user 'root'@'localhost' (using password: NO)
error at thread_main
1045, 28000, Access denied for user 'root'@'localhost' (using password: NO)
[Thread 0x43204940 (LWP 29184) exited]
error at thread_main
[Thread 0x41e02940 (LWP 29182) exited]
[New Thread 0x45a08940 (LWP 29189)]
1045, 28000, Access denied for user 'root'@'localhost' (using password: NO)
error at thread_main
1045, 28000, Access denied for user 'root'@'localhost' (using password: NO)
[Thread 0x42803940 (LWP 29183) exited]
1045, 28000, Access denied for user 'root'@'localhost' (using password: NO)
[Thread 0x44606940 (LWP 29186) exited]
[Thread 0x45007940 (LWP 29188) exited]
[Thread 0x43c05940 (LWP 29185) exited]
[Thread 0x41401940 (LWP 29181) exited]
[New Thread 0x46409940 (LWP 29190)]
error at thread_main

RAMP-UP TIME.(10 sec.)
1045, 28000, Access denied for user 'root'@'localhost' (using password: NO)
[Thread 0x40a00940 (LWP 29180) exited]
[Thread 0x45a08940 (LWP 29189) exited]
error at thread_main
1045, 28000, Access denied for user 'root'@'localhost' (using password: NO)
[Thread 0x46409940 (LWP 29190) exited]

MEASURING START.

  10, 0(0):0.000|0.000, 0(0):0.000|0.000, 0(0):0.000|0.000, 0(0):0.000|0.000, 0(0):0.000|0.000

Program received signal SIGINT, Interrupt.
0x0000003fabe0e39d in pause () from /lib64/libpthread.so.0
(gdb) 
(gdb) q
A debugging session is active.

	Inferior 1 [process 29176] will be killed.

Quit anyway? (y or n) 
Please answer y or n.
A debugging session is active.

	Inferior 1 [process 29176] will be killed.

Quit anyway? (y or n) y
[root@db-13 tpcc-mysql]# ./tpcc_start  -h 127.0.0.1 -d test  -u root  -p 'yihaodian'  -w 30  -c 60  -r 60   -l 1000 -f mysql_tpcc_20130319
***************************************
*** ###easy### TPC-C Load Generator ***
***************************************
option h with value '127.0.0.1'
option d with value 'test'
option u with value 'root'
option p with value 'yihaodian'
option w with value '30'
option c with value '60'
option r with value '60'
option l with value '1000'
option f with value 'mysql_tpcc_20130319'
<Parameters>
     [server]: 127.0.0.1
     [port]: 3306
     [DBname]: test
       [user]: root
       [pass]: yihaodian
  [warehouse]: 30
 [connection]: 60
     [rampup]: 60 (sec.)
    [measure]: 1000 (sec.)
error at thread_main
1146, 42S02, Table 'test.customer' doesn't exist
error at thread_main
1146, 42S02, Table 'test.customer' doesn't exist
error at thread_main
1146, 42S02, Table 'test.customer' doesn't exist
error at thread_main
1146, 42S02, Table 'test.customer' doesn't exist
error at thread_main
1146, 42S02, Table 'test.customer' doesn't exist
error at thread_main
1146, 42S02, Table 'test.customer' doesn't exist


OK!

[root@db-13 tpcc-mysql]# ./tpcc_start -h 127.0.0.1 -d tpcc1000 -u root -p ‘yihaodian’ -w 30 -c 60 -r 60 -l 1000 -f mysql_tpcc_20130319

***************************************
*** ###easy### TPC-C Load Generator ***
***************************************
option h with value '127.0.0.1'
option d with value 'tpcc1000'
option u with value 'root'
option p with value 'yihaodian'
option w with value '30'
option c with value '60'
option r with value '60'
option l with value '1000'
option f with value 'mysql_tpcc_20130319'
<Parameters>
     [server]: 127.0.0.1
     [port]: 3306
     [DBname]: tpcc1000
       [user]: root
       [pass]: yihaodian
  [warehouse]: 30
 [connection]: 60
     [rampup]: 60 (sec.)
    [measure]: 1000 (sec.)

RAMP-UP TIME.(60 sec.)

MEASURING START.

  10, 2639(0):2.135|3.487, 2635(0):0.443|1.441, 263(0):0.258|0.458, 264(0):2.723|3.562, 263(0):6.258|9.052
  20, 3144(0):2.073|2.507, 3151(0):0.431|0.539, 316(0):0.267|0.368, 318(0):2.680|2.907, 314(0):6.354|8.062
  30, 2609(0):2.024|2.413, 2602(0):0.417|0.505, 259(0):0.245|0.288, 257(0):2.774|2.940, 261(0):6.030|6.613
  40, 3311(0):2.079|2.454, 3309(0):0.426|0.522, 332(0):0.238|0.270, 328(0):2.699|3.071, 331(0):5.638|7.017
  50, 2464(0):2.105|2.475, 2469(0):0.425|0.540, 246(0):0.229|0.289, 247(0):2.651|2.820, 246(0):6.281|6.725
  60, 2884(0):2.085|2.773, 2883(0):0.427|0.501, 289(0):0.253|0.281, 295(0):2.712|2.879, 289(0):6.327|6.646
  70, 2641(0):2.043|2.436, 2640(0):0.437|0.701, 263(0):0.248|0.292, 263(0):2.833|2.868, 263(0):5.315|6.035
  80, 3045(0):2.038|2.497, 3049(0):0.429|0.558, 306(0):0.252|0.452, 310(0):2.692|2.841, 305(0):5.783|7.210
  90, 2551(0):2.043|2.564, 2551(0):0.433|0.552, 255(0):0.258|0.325, 249(0):2.653|2.823, 256(0):5.321|6.107
 100, 2553(0):2.078|2.721, 2551(0):0.439|0.551, 254(0):0.231|0.247, 256(0):2.688|3.792, 255(0):6.583|7.014
 110, 2923(0):2.060|2.483, 2924(0):0.426|0.763, 293(0):0.251|0.295, 295(0):2.645|2.878, 293(0):5.843|6.378
 120, 2639(0):2.007|2.252, 2638(0):0.434|0.539, 264(0):0.234|0.301, 261(0):2.726|2.823, 262(0):6.181|6.839
 130, 2884(0):2.062|2.515, 2885(0):0.425|0.554, 288(0):0.248|0.285, 283(0):2.597|2.892, 289(0):6.086|6.667
 140, 2758(0):2.095|2.542, 2750(0):0.434|0.962, 276(0):0.252|0.267, 279(0):2.662|3.059, 276(0):5.527|7.356
 150, 2406(0):2.011|2.492, 2416(0):0.426|0.492, 241(0):0.259|0.285, 248(0):2.650|2.691, 240(0):5.834|6.283
 160, 2903(0):2.066|2.361, 2904(0):0.431|0.522, 289(0):0.255|0.328, 282(0):2.828|2.927, 291(0):5.757|6.791
 170, 2677(0):2.023|2.730, 2672(0):0.426|0.535, 269(0):0.249|0.348, 267(0):2.674|2.810, 268(0):5.374|7.441
 180, 2395(0):2.061|2.474, 2385(0):0.440|0.555, 239(0):0.249|0.259, 254(0):2.643|2.724, 239(0):5.896|7.731

一次cursor: pin S wait on X的解决

November 2, 2011 oracle, RAC 3 comments

环境描述:双节点rac 数据库版本10.2.0.5 通过awr发现down机时间段产生大量的cursor: pin S wait on X等待,同时伴随大量的library latch等待,通过与相关人员沟通得知,此库在每月的月头都会有大量的报表业务,从而负载将会增长几十倍之多。通过下面
的awr可以看出负载确实很高:

DB Name DB Id Instance Inst num Release RAC Host
PORTALDB 597889515 portaldb1 1 10.2.0.5.0 YES ptdb01

Snap Id Snap Time Sessions Cursors/Session
Begin Snap: 4115 02-Nov-11 11:00:13 209 131.6
End Snap: 4116 02-Nov-11 12:03:09 289 16.2
Elapsed:   62.93 (mins)    
DB Time:   15,687.33 (mins)    

通过对比awr与其他时间段的awr发现DBTIME 增加了几十倍 说明确实存在大量的业务

通过TOP5时间 发现最多的等待为cursor: pin S wait on X 同时伴随library cache等待:

Top 5 Timed Events

Event Waits Time(s) Avg Wait(ms) % Total Call Time Wait Class
cursor: pin S wait on X 77,646,738 759,238 10 80.7 Concurrency
library cache lock 200,160 95,987 480 10.2 Concurrency
enq: TX – row lock contention 70,851 34,471 487 3.7 Application
CPU time   18,501   2.0  
library cache pin 5,680 14,825 2,610 1.6 Concurrency

SELECT COMPONENT ,OPER_TYPE,FINAL_SIZE Final,to_char(start_time,’dd-mon hh24:mi:ss’) Started FROM V$SGA_RESIZE_OPS查询
短时间内发现大量的shrink,grow,由此怀疑业务高峰期间大量的shrink,grow导致shard_pool的部分cursor被刷出,从而导致hard parse的增加进而导致library cache 等待的发生,频繁的shrink,grow也会导致cursor: pin S wait on X的发生。

通过沟通,夜里将关闭ASMM,并且通过当时awr报告计算出shared_pool,buffer_cache峰值,明天将继续观察负载情况。
另外在峰值时段发现大量append操作引发enq-TX等待也将与应用人员沟通

跟进今天的情况:

Snap Id Snap Time Sessions Cursors/Session
Begin Snap: 4139 03-Nov-11 11:00:18 118 81.5
End Snap: 4140 03-Nov-11 12:00:36 159 108.3
Elapsed:   60.31 (mins)    
DB Time:   428.83 (mins)    

从上述情况看到今天的负载小于昨天,下面是今天的等待事件:

Top 5 Timed Events

Event Waits Time(s) Avg Wait(ms) % Total Call Time Wait Class
CPU time   20,034   77.9  
db file sequential read 333,128 1,400 4 5.4 User I/O
direct path read temp 175,492 434 2 1.7 User I/O
db file scattered read 239,062 261 1 1.0 User I/O
gc current block 2-way 331,161 231 1 .9 Cluster

可以看到昨天的cursor等待消失

Wait Events

  • s – second
  • cs – centisecond – 100th of a second
  • ms – millisecond – 1000th of a second
  • us – microsecond – 1000000th of a second
  • ordered by wait time desc, waits desc (idle events last)
  • %Timeouts: value of 0 indicates value was < .5%. Value of null is truly 0

Event Waits %Time -outs Total Wait Time (s) Avg wait (ms) Waits /txn
db file sequential read 333,128   1,400 4 14.37
direct path read temp 175,492   434 2 7.57
db file scattered read 239,062   261 1 10.31
gc current block 2-way 331,161   231 1 14.28
gc cr multi block request 302,696   129 0 13.06
log file sequential read 6,144   106 17 0.27
log file parallel write 148,777   101 1 6.42
direct path write temp 186,873   88 0 8.06
gcs log flush sync 161,961 0 77 0 6.99
gc cr block 2-way 105,826   71 1 4.56
gc cr grant 2-way 127,157   63 0 5.48
log file sync 18,846 1 49 3 0.81
db file parallel read 13,520   29 2 0.58
CSS operation: query 1,230   22 18 0.05
control file sequential read 36,492   20 1 1.57
CSS initialization 410   18 44 0.02
gc current grant busy 26,892 0 17 1 1.16
gc current grant 2-way 42,689   13 0 1.84
gc buffer busy 6,259 0 9 1 0.27
control file parallel write 1,869   5 3 0.08
read by other session 846   5 6 0.04
cursor: pin S wait on X 434 100 4 10 0.02
latch: cache buffers chains 6,638   3 1 0.29
enq: TX – row lock contention 12 33 3 252 0.00
gc cr block busy 995   3 3 0.04
SQL*Net message to client 3,549,215   3 0 153.09
enq: WF – contention 14 29 2 145 0.00
DFS lock handle 2,376   1 1 0.10
direct path read 3,331   1 0 0.14
gc current block busy 199   1 6 0.01
row cache lock 4,347   1 0 0.19
cr request retry 1 100 1 977 0.00
enq: KO – fast object checkpoint 38 3 1 25 0.00
gc current multi block request 7,843   1 0 0.34
log buffer space 4   1 221 0.00
latch: KCL gc element parent latch 74   1 10 0.00
SQL*Net more data from client 7,756   1 0 0.33
gc current block congested 37   1 15 0.00
latch free 22   1 23 0.00
Log archive I/O 6,119   0 0 0.26
CGS wait for IPC msg 47,974 99 0 0 2.07
SQL*Net break/reset to client 2,210   0 0 0.10
enq: FB – contention 1,560   0 0 0.07
log file switch completion 20   0 18 0.00
CSS operation: action 807   0 0 0.03
gc cr block congested 11   0 25 0.00
gc cr grant congested 16   0 17 0.00
os thread startup 10   0 23 0.00
SQL*Net more data to client 15,643   0 0 0.67
kksfbc child completion 3 100 0 49 0.00
reliable message 306   0 0 0.01
direct path write 618   0 0 0.03
buffer busy waits 3,383   0 0 0.15
enq: PS – contention 203   0 0 0.01
LGWR wait for redo copy 8,131   0 0 0.35
ksxr poll remote instances 10,924 56 0 0 0.47
enq: CF – contention 157   0 1 0.01
library cache pin 301   0 0 0.01
PX Deq: reap credit 6,959 98 0 0 0.30
enq: TX – index contention 48   0 2 0.00
rdbms ipc reply 351   0 0 0.02
SQL*Net message from dblink 66   0 1 0.00
library cache lock 144   0 0 0.01
PX Deq Credit: send blkd 196 13 0 0 0.01
enq: HW – contention 104   0 0 0.00
enq: TM – contention 123   0 0 0.01
latch: cache buffers lru chain 112   0 0 0.00
gc current block unknown 49   0 0 0.00
IPC send completion sync 63 100 0 0 0.00
PX Deq: Signal ACK 203 39 0 0 0.01
latch: object queue header operation 29   0 1 0.00
latch: shared pool 42   0 0 0.00
gc current split 2   0 4 0.00
gc current grant congested 1   0 8 0.00
gc current retry 3   0 2 0.00
KJC: Wait for msg sends to complete 7   0 0 0.00
enq: TA – contention 11   0 0 0.00
latch: row cache objects 43   0 0 0.00
log file single write 12   0 0 0.00
ges inquiry response 6   0 0 0.00
latch: session allocation 3   0 1 0.00
enq: TD – KTF dump entries 6   0 0 0.00
latch: checkpoint queue latch 8   0 0 0.00
latch: library cache lock 7   0 0 0.00
enq: JS – job run lock – synchronize 2   0 1 0.00
enq: WL – contention 3   0 0 0.00
enq: TX – allocate ITL entry 3   0 0 0.00
enq: TT – contention 2   0 0 0.00
enq: MW – contention 2   0 0 0.00
enq: DR – contention 2   0 0 0.00
cursor: pin S 309   0 0 0.01
latch: library cache 3   0 0 0.00
SQL*Net more data to dblink 6   0 0 0.00
latch: gcs resource hash 3   0 0 0.00
latch: redo allocation 3   0 0 0.00
SQL*Net message to dblink 63   0 0 0.00
latch: ges resource hash list 1   0 0 0.00
lock escalate retry 7 86 0 0 0.00
SQL*Net more data from dblink 3   0 0 0.00
latch: messages 1   0 0 0.00
lock deadlock retry 1 100 0 0 0.00
cursor: mutex S 1   0 0 0.00
SQL*Net message from client 3,550,480   340,614 96 153.14
gcs remote message 2,022,558 19 17,483 9 87.24
virtual circuit status 121 100 3,540 29258 0.01
ges remote message 64,294 67 3,531 55 2.77
Streams AQ: qmn slave idle wait 128   3,531 27587 0.01
Streams AQ: qmn coordinator idle wait 265 48 3,531 13325 0.01
PX Idle Wait 1,476 93 3,530 2391 0.06
ASM background timer 806   3,528 4377 0.03
class slave wait 633 94 3,527 5573 0.03
Streams AQ: waiting for time management or cleanup tasks 12 100 3,464 288641 0.00
DIAG idle wait 15,062 4 2,928 194 0.65
i/o slave wait 580,651   1,423 2 25.05
PX Deq: Execute Reply 2,072 48 218 105 0.09
jobq slave wait 40 100 117 2930 0.00
PX Deq: Parse Reply 229 34 1 4 0.01
PX Deq: Execution Msg 431 8 1 2 0.02
PX Deq: Join ACK 242 38 0 1 0.01
KSV master wait 32   0 7 0.00
single-task message 3   0 32 0.00
PX Deq: Msg Fragment 95 9 0 0 0.00
PX Deq Credit: need buffer 44 11 0 0 0.00
Streams AQ: RAC qmn coordinator idle wait 265 100 0 0 0.01
PX Deq: Table Q Normal 8   0 0 0.00