system

How to configure Tcpdump system

July 24, 2015 Architect, Architecture, Internals, MYSQL, network, system No comments

MySQL Tcpdump system : use percona-toolkit to analyze network packages

We can identify problem SQLs with high execution frequency.

With DBMON system and AWR system we can find problem SQLs in a special time (high frequency, occurs over a period of time)

View this PDF:

http://www.vmcd.org/docs/MySQL_TCPDUMP.pdf

linux tools part 10– Monitoring process iostat

April 24, 2015 system No comments

We can use iopp tool to monitor process io performance

download from github : https://github.com/markwkm/iopp

pid: The process id.
rchar: The number of bytes which this task has caused to be read from storage.
wchar: The number of bytes which this task has caused, or shall cause to be written to disk.
syscr: Count of the number of read I/O operations.
syscw: Count of the number of write I/O operations.
rbytes rkb rmb reads: Count of the number of bytes which this process really did cause to be fetched from the storage layer.
wbytes wkb wmb writes: Count of the number of bytes which this process really did cause to be sent to the storage layer.
cwbytes cwkb cwmb cwrites: The number of bytes which this process caused to not happen, by truncating pagecache.
command: Filename of the executable.

There is one small bug , you’ll get different output from -i -k than you will from -i or from -i -m.

#iopp -i -k 2
QQ图片20150424145546
#grep ^ /sys/class/block/*/dev |grep 224

/sys/class/block/sdo/dev:8:224

#lsof |grep “8,224” |more

bash 5265 root cwd DIR 8,224 57151488 427819009 /data/yp900
 redis-ser 19843 root cwd DIR 8,224 4096 469368833 /data/redis6379
 sh 27579 root cwd DIR 8,224 4096 579731457 /data/mysql3306
 mysqld 28324 mysql cwd DIR 8,224 4096 579731459 /data/mysql3306/data
 mysqld 28324 mysql 1w REG 8,224 1327901 579743077 /data/mysql3306/mysql-error.log
 mysqld 28324 mysql 2w REG 8,224 1327901 579743077 /data/mysql3306/mysql-error.log
 mysqld 28324 mysql 3u REG 8,224 23520 579743078 /data/mysql3306/binlog/mysql-bin.index
 mysqld 28324 mysql 4uW REG 8,224 1073741824 579743083 /data/mysql3306/data/ibdata0
 mysqld 28324 mysql 5u REG 8,224 0 579731461 /data/mysql3306/mysql-tmpdir/ib1io6G9 (deleted)
 mysqld 28324 mysql 6u REG 8,224 0 579731462 /data/mysql3306/mysql-tmpdir/ibnkifvg (deleted)
 mysqld 28324 mysql 7u REG 8,224 0 579731463 /data/mysql3306/mysql-tmpdir/ibHWjojn (deleted)
 mysqld 28324 mysql 8u REG 8,224 0 579731464 /data/mysql3306/mysql-tmpdir/ib1o8yHC (deleted)
 mysqld 28324 mysql 9uW REG 8,224 1073741824 579743084 /data/mysql3306/data/ibdata1
 mysqld 28324 mysql 10uW REG 8,224 1073741824 579743085 /data/mysql3306/data/ibdata2
 mysqld 28324 mysql 11uW REG 8,224 27850178560 579743086 /data/mysql3306/data/ibdata3
 mysqld 28324 mysql 12uW REG 8,224 536870912 579743087 /data/mysql3306/data/ib_logfile0
 mysqld 28324 mysql 13uW REG 8,224 536870912 579743088 /data/mysql3306/data/ib_logfile1
 mysqld 28324 mysql 14uW REG 8,224 2281701376 579733990 /data/mysql3306/data/mit/agent_info#P#p20150404.ibd
 mysqld 28324 mysql 15u REG 8,224 0 579731465 /data/mysql3306/mysql-tmpdir/ibBZSSPL (deleted)

A nice smart tools with a little bugs 🙂

Design for “seckilling ” on MySQL

March 13, 2015 system No comments

Today I read a nice article , which give a new design for mysql database . In this article , Author (Mr.lou) uses OneSQL (a branch of MySQL) for the whole test.

you can read from : http://www.onexsoft.com/?page_id=378

This article explain how to use mysql in seckilling scenario (especially for E-Commerce commany )

If you want to use transaction in RDBMS database, you must manual start transaction and commit in the end of your operation.
So,commit will be cost of the total process, round-trip cost will be the most important cost as author said

(For simple update operations, without regard to conflict of IO and locks, a statement execution time is about 0.1ms, network delay under normal conditions is 0.4-0.8ms, that the waiting time is longer than the transaction commit time to really notice the SQL execution several times, showing that the network is a single-line transaction update performance killer)

How to avoid this ? We can start a transaction to manually control SQL commit. When autocommit=1 was set on MySQL server implicitly commit will happen after every SQLs.

An explicit transaction is when statement(s) are wrapped within an explicitly defined transaction code block – for MySQL, that’s START TRANSACTION. It also requires an explicitly made COMMIT or ROLLBACK statement at the end of the transaction. Nested transactions is beyond the scope of this topic.

Implicit transactions are slightly different from explicit ones. Implicit transactions do not require explicity defining a transaction. However, like explicit transactions they require a COMMIT or ROLLBACK statement to be supplied.

Using OneSQL ,Commit operation can be done on MySQL server side to avoid network round-trip. In total logical process, everything will be controlled by yourself .

In seckilling scenario, this optimization will give a tremendous performance promotion.

Actually ,we can also use Redis or memcached for this situation, Redis CAS feature witch MULTI in serial process will be a good choice.

Best practice – try to offload request before database. Order creation should be asynchronous (put operations in a serial async queue is the best choice)

小型机迁移x86平台之老生常谈

July 17, 2014 migration, oracle, system, unix No comments

You can view this presentation click here: x86_architect
注意使用chrome打开文件 不然可能乱码

linux tools part 9– Monitoring system performance

April 25, 2014 linux, system No comments

淘宝的开源监控工具 –同事推荐安装测下来非常不错 可以配合zabbix进行数据的采集

link:http://tsar.taobao.org/

[mysql@mysqldb tmp]$ tsar 
 
Time           ---cpu-- ---mem-- ---tcp-- -----traffic---- --sda---  ---load- 
Time             util     util   retran    bytin  bytout     util     load1   
24/04/14-22:30   1.19    64.77     0.00    13.00    5.00     0.13     0.00   
24/04/14-22:35   1.29    64.70     0.00    47.00    5.00     0.08     0.00   
24/04/14-22:40   1.35    64.73     0.00    17.00    6.00     0.09     0.00   
24/04/14-22:45   1.32    64.75     0.00    17.00    5.00     0.17     0.00   
24/04/14-22:50   1.29    64.76     0.00    22.00    5.00     0.05     0.00   
24/04/14-22:55   1.33    64.69     0.00    17.00    6.00     0.10     0.00   
24/04/14-23:00   1.30    64.77     0.00    13.00    5.00     0.12     0.00   
24/04/14-23:05   1.99    62.80     0.00    98.00   27.00     4.44     0.30   
24/04/14-23:10   1.32    62.84     0.00    17.00    5.00     0.07     0.00   
24/04/14-23:15   1.28    62.79     0.00    22.00    5.00     0.09     0.00   
24/04/14-23:20   1.64    62.83     0.00    16.00    6.00     0.16     0.00   
24/04/14-23:25   1.44    62.75     0.00    14.00    5.00     0.03     0.00   
24/04/14-23:30   1.40    62.82     0.00    16.00    6.00     0.10     0.00   
24/04/14-23:35   1.95    62.76     0.00    68.00   60.00     1.52     0.08   
24/04/14-23:40 ------    16.26     0.00   ------  ------     0.00     0.56   
24/04/14-23:45   1.50    15.96     0.00    83.00  120.00     0.18     0.00   
24/04/14-23:50   1.37    16.17     0.00   146.00  428.00     0.36     0.00   

MAX              2.87    87.37    35.71    14.2K    1.3K    12.42     0.56   
MEAN             1.57    40.76     0.37   137.02   27.99     0.35     0.02   
MIN              1.14    15.96     0.00     1.00    0.00     0.03     0.00   
[mysql@mysqldb tmp]$ tsar  --help
Usage: tsar [options]
Options:
    -check         display last record for alert
    --check/-C     display last record for alert.example:tsar --check / tsar --check --cpu --io
    --cron/-c      run in cron mode, output data to file
    --interval/-i  specify intervals numbers, in minutes if with --live, it is in seconds
    --list/-L      list enabled modules
    --live/-l      running print live mode, which module will print
    --file/-f      specify a filepath as input
    --ndays/-n     show the value for the past days (default: 1)
    --date/-d      show the value for the specify day(n or YYYYMMDD)
    --merge/-m     merge multiply item to one
    --detail/-D    do not conver data to K/M/G
    --spec/-s      show spec field data, tsar --cpu -s sys,util
    --item/-I      show spec item data, tsar --io -I sda
    --help/-h      help
Modules Enabled:
    --cpu               CPU share (user, system, interrupt, nice, & idle)
    --mem               Physical memory share (active, inactive, cached, free, wired)
    --swap              swap usage
    --tcp               TCP traffic     (v4)
    --udp               UDP traffic     (v4)
    --traffic           Net traffic statistics
    --io                Linux I/O performance
    --pcsw              Process (task) creation and context switch
    --partition         Disk and partition usage
    --tcpx              TCP connection data
    --load              System Run Queue and load average

直接收集汇总信息

[mysql@mysqldb tmp]$ tsar  --check -C
mysqldb	tsar	cpu:util=1.4 mem:util=16.3 tcp:retran=0.0 traffic:bytin=107.0 traffic:bytout=122.0 io:sda:util=0.0 load:load1=0.0 
[mysql@mysqldb tmp]$ tsar  --check 
mysqldb	tsar	cpu:util=1.4 mem:util=16.3 tcp:retran=0.0 traffic:bytin=107.0 traffic:bytout=122.0 io:sda:util=0.0 load:load1=0.0 

IBM will support Flash In DIMM using MCS

January 17, 2014 Architect, hardware, system, unix No comments

The next generation of IBM’s X-series servers will be able to accommodate solid-state Flash drives clipped into their DIMM memory slots, potentially improving the response times of fast-paced enterprise applications.

On Thursday, IBM unveiled the Series 6 generation of its System X x86-based servers. In addition to the novel reuse of DIMM slots, the X6 architecture will also let customers upgrade them to a new generation of processors or memory without swapping in a new motherboard.

DIMM PicULLtraDIMM

Diablo Technologies, a memory technology company, developed Memory Channel Storage (MCS) that 
enables flash on a DIMM module to be accessed by the CPU, instead of using the SATA bus as other 
DIMM form-factor SSD products have. Using a host-level driver and an ASIC on the DIMM, it creates 
a special memory storage layer in flash through which the CPU actually moves data from the RAM 
memory space. It also requires a minor modification in the server BIOS to be supported by the CPU, something 
that three OEMs have currently completed. 

Each DIMM flash module has 16 separate data channels, which are independently addressable. 
This enables parallel data writes by the driver, improving performance over the DMA process 
used by PCIe-based solutions. Specs for ULLtraDIMM are showing 5 microsecond latencies for these 
devices, an order of magnitude better than typical PCIe flash products. This architecture also 
enables up to 63 ULLtraDIMM modules to be aggregated creating 25TB of flash capacity and >9M IOPS in a single server.

MCS

Ref: IBM X series servers now pack Flash into speedy DIMM slots
IBM Beefs Up Enterprise X-Architecture With Flash, Modular Design
Heating Up Storage Performance
How to Make Flash Accessible on the Memory Bus
Memory Channel Storage™
ULLtraDIMM: combining SSD and DRAM for the enterprise

团队合照一张

December 25, 2013 system No comments

团队圣诞合照一张,缺了一个同事,也走了几个同事,往事不堪回首 – -)

IMG_0975

闲聊,闲扯

October 30, 2013 system No comments

letsdiscusspng

一个月没写blog了,不想在10月打破连续每月blog的传统,那就说一说1号店今年的IT发展吧,纯属闲扯。

首先rdbms数据库方面,继续坚持之前制定的方针,大量的oracle转向的mysql(mysql在oracle接手之后,稳定性确实得到了极大的提高,innodb的并发能力提升,以及复制端的一些增强极大的增强了mysql对于在线业务的support能力)。相信这个趋势会继续在所有行业中蔓延,在业务设计合理的情况下,确实可以使用mysql来代替oracle那昂贵的许可费用。目前对于公司的愿景来说,是希望用大量的mysql集群取代目前的核心oracle数据库(包括订单,支付等)虽然在涉及到走钱的系统上大家还是保持一种严谨的态度,但相信这是趋势。针对现在传言已久的mysql收费问题,其实个人觉得这是个没必要担心的问题,大量的开源技术充斥着这个时代,没有任何可以坚守一辈子的技术,只有适合不适合。包括目前oracle12C的推出,包括可插拔,IN-MEMORY等技术的引入,不难看出oracle对于未来RDBMS database走向的定位。不能说oracle不好,只能说oracle太好,目前能把oracle用到极致的公司你能说他们架构很垃圾么? (maybe一些人觉得不用mysql就不是互联网公司,那我只能呵呵 🙂 )

再说说nosql方面,今年公司上了大量的nosql数据库,包括数据部门的大量mongodb以及前台业务部门的大量HBASE(历史订单查询已经归于hbase管理,数据监控部门的levelDB redis等。目前公司的nosql方向主要是基于mongodb的 大量的项目也在考虑迁移mongodb…可是现在竟然没有一位专门管理mongodb的同学= = 。nosql是一个方向,但不是绝对方向.我相信nosql在很大程度上还是无法取代传统rdbms的,至少在1号店目前oracle,mysql依然是绝对主力。

再聊聊架构方面,今年DBA TEAM 推出了myawr与DGHA等自动化工具,配合MHA 形成了目前的主流数据库架构(mysql MHA集群 + oracle HA) 由于在26号公司最后一台RAC数据库下线(exadata除外,另外今年可能还要采购新的X3),目前1号店的数据库架构主要为oracle 传统HA 方案 (HA定制+传统HA软件),mysql mha + transfer + db recover (mysql to hbase) 等。 随着mysql的规模日渐增加,需要更加重视系统级别的HA方案。在IO方面,继续去年的SSD PCIe化,越来越多的database将被迁移到SSD+PCIe的机器上,当IO,MEMORY,CPU等都不再是瓶颈的时候,将会考虑更好的利用机器的性能,更加细化的切分业务,建立更完善的高可用方案以及容灾方案等。明年也许公司的最后一台小型机即将下线,彻底的告别了unix,大踏步的迈向X86架构新时代(say goodbye unix, say goodbye IBM ~)
在基础架构方面,公司也推出了一系列配合后端dababase的平台包括contab大任务平台,detector平台,hedwig SOA平台等。

最后不聊技术,聊一些题外话,今年公司的销售额将破百亿,又是一个里程碑。但是在如此竞争的电商界,百亿的销售额貌似有点拿不出手? 一直觉得最后国内的电商只会剩下不超过5家,何去何从,也许傍着沃尔玛这颗大树真的很好乘凉,但是也确实要好好想想了。(ps 今年比较牛逼的 唯品会,易迅),另外极端需要吐槽下公司的这几次广告….第一次的恶男恶女+第二次的馋嘴宝宝(刘XX? – -)…严重怀疑广告部门的审美能力。

另外继续招mysql,oracle,架构方面的人才,有兴趣的简历ylouis83@gmail.com

How to use Flash Cache on Redhat (not OEL)

August 26, 2013 Architect, hardware, linux, system No comments

QQ图片20130826223025

By ask Surachart for help

Test: Flash Cache on 11gR2 + RHEL

A Flash Cache (11gR2) is supported by OEL or Solaris. If Want To use RHEL(Example: RHEL 5.3)

Patched 8974084 before

SQL> startup
ORA-00439: feature not enabled: Server Flash Cache
ORA-01078: failure in processing system parameters

TEST: ***use "strace" commnad to trace system & signals***
$ strace -o /tmp/file01.txt -f sqlplus '/ as sysdba' <<EOF
startup
EOF

Find 2 points:
1. about /etc/*-release files.

3884  open("/etc/enterprise-release", O_RDONLY) = 8
3884  read(8, "Enterprise Linux Enterprise Linu"..., 255) = 64


2. about "rpm" cammand
32278 execve("/bin/rpm", ["/bin/rpm", "-qi", "--info", "enterprise-release"], [/* 25 vars */] <unfinished ...>
Next, it greps for “66ced3de1e5e0159” from the following output…
 
try to check on Enterprise Linux.

$ rpm -qi --info "enterprise-release"

Name        : enterprise-release           Relocations: (not relocatable)
Version     : 5                                 Vendor: Oracle USA
Release     : 0.0.17                        Build Date: Wed 21 Jan 2009 06:00:33 PM PST
Install Date: Mon 11 May 2009 11:19:45 AM PDT      Build Host: ca-build10.us.oracle.com
Group       : System Environment/Base       Source RPM: enterprise-release-5-0.0.17.src.rpm
Size        : 59030                            License: GPL
Signature   : DSA/SHA1, Wed 21 Jan 2009 06:56:48 PM PST, Key ID 66ced3de1e5e0159
Summary     : Enterprise Linux release file
Description :
System release and information files
Name        : enterprise-release           Relocations: (not relocatable)
Version     : 5                                 Vendor: Oracle USA
Release     : 0.0.17                        Build Date: Wed 21 Jan 2009 06:00:33 PM PST
Install Date: Mon 11 May 2009 11:19:45 AM PDT      Build Host: ca-build10.us.oracle.com
Group       : System Environment/Base       Source RPM: enterprise-release-5-0.0.17.src.rpm
Size        : 59030                            License: GPL
Signature   : DSA/SHA1, Wed 21 Jan 2009 06:56:48 PM PST, Key ID 66ced3de1e5e0159
Summary     : Enterprise Linux release file
Description :
System release and information files


Fixed:
1. FAKE *-release file (don't forgot backup before)
- Modify /etc/redhat-release + /etc/enterprise-release files.
$ cat /etc/redhat-release
Enterprise Linux Enterprise Linux Server release 5.3 (Carthage)

$ cat /etc/enterprise-release
Enterprise Linux Enterprise Linux Server release 5.3 (Carthage)

2. FAKE rpm to check "enterprise-release" package.
- Modify /bin/rpm
#  mv /bin/rpm /bin/rpm.bin

# vi /bin/rpm
#!/bin/sh
if [ "$3" = "enterprise-release" ]
then
     echo 66ced3de1e5e0159
else
      exec /bin/rpm.bin "$*"
fi

# chmod 755 /bin/rpm

Try... Again -> startup database.

SQL> startup

linux tools part 8– Monitoring Reads and Writes to a File and device

May 13, 2013 linux, system No comments

1.配合使用systemtap 监控系统对于文件的读写:

[root@db-42 systemtap]# uname -a
Linux db-42 2.6.18-194.el5 #1 SMP Mon Mar 29 22:10:29 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@db-42 systemtap]# more inodewatch.stp 
#! /usr/bin/env stap

probe vfs.write, vfs.read
{
  # dev and ino are defined by vfs.write and vfs.read
  if (dev == MKDEV($1,$2) # major/minor device
      && ino == $3)
    printf ("%s(%d) %s 0x%x/%u\n",
      execname(), pid(), probefunc(), dev, ino)
}

inodewatch.stp takes the following information about the file as arguments on the command line:
The file's major device number.
The file's minor device number.
The file's inode number.

[root@db-42 ~]# stat -c '%D %i' test.dat 
803 1831440


[root@db-42 systemtap]# stap  inodewatch.stp  0x8 0x3  1831440

dd(27257) vfs_write 0x800003/1831440
dd(27257) vfs_write 0x800003/1831440
dd(27257) vfs_write 0x800003/1831440
dd(27257) vfs_write 0x800003/1831440
dd(27257) vfs_write 0x800003/1831440
dd(27257) vfs_write 0x800003/1831440
dd(27257) vfs_write 0x800003/1831440

[root@db-42 systemtap]# ps -ef |grep 27257
root     27257 27217 74 14:42 pts/6    00:00:14 dd if /dev/zero of /root/test.dat bs 4K count 10000000

——对于 block device的读写 如下:

[root@db-42 ~]# stat -c "0x%D" /dev/sda11
0x11
[root@db-42 ~]# fdisk -l 

Disk /dev/sda: 1798.6 GB, 1798651772928 bytes
47 heads, 36 sectors/track, 2076236 cylinders
Units = cylinders of 1692 * 512 = 866304 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         242      204714   83  Linux
/dev/sda2             243      161225   136191618   82  Linux swap / Solaris
/dev/sda3          161226      223198    52429158   83  Linux
/dev/sda4          223199     2076236  1567670148    5  Extended
/dev/sda5          223199      247987    20971476   83  Linux
/dev/sda6          247988      272776    20971476   83  Linux
/dev/sda7          272777      390980   100000566   83  Linux
/dev/sda8          390981      414068    19532430   83  Linux
/dev/sda9          414069      437156    19532430   83  Linux
/dev/sda10         437157      460244    19532430   83  Linux
/dev/sda11         460245      483332    19532430   83  Linux
[root@db-42 ~]# more traceio2.stp
#! /usr/bin/env stap

global device_of_interest

probe begin {
  /* The following is not the most efficient way to do this.
      One could directly put the result of usrdev2kerndev()
      into device_of_interest.  However, want to test out
      the other device functions */
  dev = usrdev2kerndev($1)
  device_of_interest = MKDEV(MAJOR(dev), MINOR(dev))
}

probe vfs.write, vfs.read
{
  if (dev == device_of_interest)
    printf ("%s(%d) %s 0x%x\n",
            execname(), pid(), probefunc(), dev)
}
[root@db-42 ~]# stap traceio2.stp  0x11


dd(30266) vfs_read 0x11
dd(30266) vfs_write 0x11
dd(30266) vfs_read 0x11
dd(30266) vfs_write 0x11
dd(30266) vfs_read 0x11
dd(30266) vfs_write 0x11
dd(30266) vfs_read 0x11
dd(30266) vfs_write 0x11
dd(30266) vfs_read 0x11
dd(30266) vfs_write 0x11
dd(30266) vfs_read 0x11
dd(30266) vfs_write 0x11

[oracle@db-42 ~]$ ps -ef |grep  30266
root     30266 27217 99 16:40 pts/6    00:00:32 dd if /dev/zero of /dev/sda11 bs 4K count 100000000

2.直接使用lsof

[root@db-42 ~]# lsof  /root/test.dat 
[root@db-42 ~]# lsof  /root/test.dat 
COMMAND   PID USER   FD   TYPE DEVICE      SIZE    NODE NAME
dd      30756 root    1w   REG    8,3 926494720 1831440 /root/test.dat
[root@db-42 ~]# lsof  /dev/sda
sda    sda1   sda10  sda11  sda2   sda3   sda4   sda5   sda6   sda7   sda8   sda9   
[root@db-42 ~]# lsof  /dev/sda11  
COMMAND   PID USER   FD   TYPE DEVICE SIZE    NODE NAME
dd      30761 root    1w   BLK   8,11      4548595 /dev/sda11
[root@db-42 ~]# lsof  version
lsof: status error on version: No such file or directory
lsof 4.78
 latest revision: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/
 latest FAQ: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ
 latest man page: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man
 usage: [-?abhlnNoOPRstUvVX] [+|-c c] [+|-d s] [+D D] [+|-f]
 [-F [f]] [-g [s]] [-i [i]] [+|-L [l]] [+m [m]] [+|-M] [-o [o]]
 [-p s] [+|-r [t]] [-S [t]] [-T [t]] [-u s] [+|-w] [-x [fl]] [-Z [Z]] [--] [names]
Use the ``-h'' option to get more help information.