NoSQL

Logcenter Project architect

August 16, 2017 Architect, Architecture, bigdata, hadoop, hive, network, NoSQL, rdbms No comments

We built a project called LC (Log Center) for the ops department.
All ops team members use this system for lower-layer log analysis.
It collects every type of log: database system logs, crond, security logs, command logs, API logs, and so on.
Logs are pushed through an MQ system driven by a policy center, and we built a new backend system for search and management.

See the design document: LC-system-design
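
As a rough sketch of the push path (the queue, topic, and field names below are hypothetical; the post does not say which MQ LC uses, so Kafka is assumed purely for illustration):

# Hypothetical LC-style log pusher; assumes Kafka as the MQ.
import json
import socket
import time

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="mq.example.com:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def push_log(log_type, line):
    # Wrap one raw log line with metadata and push it to the queue.
    event = {
        "host": socket.gethostname(),
        "type": log_type,   # e.g. "secure", "crond", "cmdlog", "api"
        "ts": int(time.time()),
        "raw": line,
    }
    producer.send("lc-logs", event)

push_log("secure", "Failed password for root from 10.0.0.8")
producer.flush()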

TCP Trace

September 21, 2016 Architecture, MYSQL, redis, system No comments

I found a useful sniffer tool that helps us analyze network packets (it can capture packets on a specific port).

I use it to analyze MySQL and Redis traffic, translating the captured packets back into plain queries.

#./vc-redis-sniffer --help

vc-redis-sniffer is a utility from VividCortex to monitor query activity and write results to a file.
See --license for the terms governing your usage of this program.

  -binding="[::]:6379"         This is a list of comma separated bind strings as seen in /proc/net/tcp
  -help="false"                Show this usage message
  -license="false"             Print the usage terms of this program
  -output=""                   Filepath to output queries to. Defaults to stdout if none specified.
  -show-database="false"       Include a 'USE `database`' for every statement. Supersedes show-database-changes.
  -show-database-changes="false"
                               Include a 'USE `database`' every time the database is changed.
  -verbose="false"             Enable logging on program startup to stderr
  -version="false"             Show version and exit

  Flag                         Current value
--------------------------------------------
  -binding                     "[::]:6379"
  -help                        "true"
  -license                     "false"
  -output                      ""
  -show-database               "false"
  -show-database-changes       "false"
  -verbose                     "false"
  -version                     "false"

Capture packets and gather logs

[root@a1-dba-test-242-13 /tmp/vc-redis-sniffer]
#./vc-redis-sniffer -binding="[::]:6379" -output=/tmp/redis.log

Analyze logs using pt-tools

[root@a1-dba-test-242-13 /tmp/vc-mysql-sniffer]
#pt-query-digest /tmp/redis.log

redis_output_result.txt

We can also analyze live MySQL queries; follow the same steps to get the result:

mysql_output_result.txt

How to backup remote redis instance

January 29, 2016 Architect, NoSQL, redis No comments

We wrote a Python script to orchestrate the Redis backup work.

Three steps do the work:

1. Create a meta database that records which Redis instances need to be backed up.

2. Use the script to connect to each instance and run "--rdb" (the remote backup command).

3. Decide which Redis servers should actually be transferred (only slave-role instances are dumped; wait a few minutes if a bgsave is already running).

Backup Scripts: Redis_remote.py
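
The attached script is not reproduced here; a minimal sketch of the same flow might look like this (the meta table and column names are invented for illustration):

# Sketch of the remote-backup flow; meta table/columns are hypothetical.
import subprocess
import time

import pymysql  # meta database client
import redis    # pip install redis

meta = pymysql.connect(host="meta.example.com", user="backup",
                       password="xxx", db="redis_meta")

with meta.cursor() as cur:
    # Step 1: the meta database says which instances need backing up.
    cur.execute("SELECT host, port FROM redis_backup_list")
    instances = cur.fetchall()

for host, port in instances:
    r = redis.StrictRedis(host=host, port=port)
    # Step 3: only slave-role instances are transferred.
    if r.info().get("role") != "slave":
        continue
    # Wait while a bgsave is already in progress.
    while r.info("persistence").get("rdb_bgsave_in_progress"):
        time.sleep(60)
    # Step 2: remote dump via redis-cli --rdb.
    subprocess.check_call([
        "redis-cli", "-h", host, "-p", str(port),
        "--rdb", "/backup/%s_%s.rdb" % (host, port),
    ])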

How to configure AWR system

August 6, 2015 Architect, Architecture, mongodb, MYSQL, NoSQL, rdbms, software No comments

In this article we introduce the myawr and mongoawr systems.

Read this PDF to learn how to configure them.

How to configure AWR system.

How to configure WEBM

July 15, 2015 Architect, mongodb, MYSQL, software No comments

Architecture of the WEBM system.

Reference:

http://www.vmcd.org/2014/10/webm_v2-has-been-released/
http://www.vmcd.org/2014/09/webm-mysql-database-performance-web-monitor/

View this PDF:

http://www.vmcd.org/docs/How%20to%20configure%20WEBM.pdf

webm_v2 has been released

October 21, 2014 Architect, NoSQL, rdbms, software No comments

webm_v2 adds Oracle and MongoDB monitoring modules.

webm uses an agent-upload model to collect statistics.

We use a MySQL database as the monitor server.

The tables that store these data are already designed (with future analysis in mind).

You can download it from GitHub: https://github.com/ylouis83/webm
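
A minimal sketch of the agent-upload idea (the table and column names below are invented; they are not webm's actual schema):

# Toy agent: sample one MySQL status counter on the monitored host
# and upload it to the monitor server. Schema is hypothetical.
import time

import pymysql

target = pymysql.connect(host="db1.example.com", user="monitor", password="xxx")
monitor = pymysql.connect(host="webm.example.com", user="webm",
                          password="xxx", db="webm")

with target.cursor() as cur:
    cur.execute("SHOW GLOBAL STATUS LIKE 'Questions'")
    _, questions = cur.fetchone()

with monitor.cursor() as cur:
    cur.execute(
        "INSERT INTO host_stats (host, metric, value, ts) "
        "VALUES (%s, %s, %s, %s)",
        ("db1.example.com", "Questions", int(questions), int(time.time())),
    )
monitor.commit()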

YCSB: a NoSQL load-testing tool

July 16, 2014 Architect, NoSQL, software No comments

Download PDF from slideshare

An overview of YCSB, a load-evaluation tool, applied to MongoDB.

YCSB is a NoSQL benchmark tool that offers several ways to simulate NoSQL workloads. Load testing of NoSQL systems still receives little attention, and YCSB's comprehensive tests offer an extra layer of assurance before a NoSQL deployment goes live.

There are many new serving databases available, including:

PNUTS
BigTable
HBase
Hypertable
Azure
Cassandra
CouchDB
Voldemort
MongoDB
OrientDB
Infinispan
Dynomite
Redis
GemFire
GigaSpaces XAP
DynamoDB
Couchbase
Aerospike 

The numbers below are for reference only; the tests ran on a virtual-machine server.

A simple data-load example:

[root@mysqlstd ycsb-0.1.4]# ./bin/ycsb load  mongodb -P workloads/workloada   -p mongodb.url=mongodb://127.0.0.1:27017 -p mongodb.database=newdb -p mongodb.writeConcern=normal  -s  >data
Loading workload...
Starting test.
 0 sec: 0 operations; 
 10 sec: 18448 operations; 1837.08 current ops/sec; [INSERT AverageLatency(us)=423.35] 
 20 sec: 42134 operations; 2366.71 current ops/sec; [INSERT AverageLatency(us)=373.44] 
 30 sec: 61185 operations; 1904.34 current ops/sec; [INSERT AverageLatency(us)=661.58] 
 40 sec: 85308 operations; 2411.09 current ops/sec; [INSERT AverageLatency(us)=324.83] 
 50 sec: 97785 operations; 1247.2 current ops/sec; [INSERT AverageLatency(us)=985.33] 
 50 sec: 100000 operations; 2662.26 current ops/sec; [INSERT AverageLatency(us)=371.24] 

After the data is loaded, we can start the simulated load tests. YCSB ships with the following workload modes:

Workload A: Update heavy workload

This workload has a mix of 50/50 reads and writes. An application example is a session store recording recent actions.

Workload B: Read mostly workload

This workload has a 95/5 read/write mix. Application example: photo tagging; adding a tag is an update, but most operations read tags.

Workload C: Read only

This workload is 100% read. Application example: user profile cache, where profiles are constructed elsewhere (e.g., Hadoop).

Workload D: Read latest workload

In this workload, new records are inserted, and the most recently inserted records are the most popular. Application example: user status updates; people want to read the latest.

Workload E: Short ranges

In this workload, short ranges of records are queried, instead of individual records. Application example: threaded conversations, where each scan is for the posts in a given thread (assumed to be clustered by thread id).

Workload F: Read-modify-write

In this workload, the client reads a record, modifies it, and writes back the changes. Application example: a user database, where user records are read and modified by the user or to record user activity.

Workload E's short-range scans are problematic for some stores; of course, we can also define a custom workload:

# Yahoo! Cloud System Benchmark
# Workload A: Update heavy workload
#   Application example: Session store recording recent actions
#                        
#   Read/update ratio: 50/50
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian

recordcount=100000
operationcount=100000
workload=com.yahoo.ycsb.workloads.CoreWorkload

readallfields=true

readproportion=1
updateproportion=0
scanproportion=0
insertproportion=0

Adjust the read/update/insert proportions to mirror your real environment. Below are some load-test examples.

For example, a pure-read test:

[root@mysqlstd ycsb-0.1.4]# ./bin/ycsb run  mongodb -P workloads/workloadc  -P large.dat -p mongodb.url=mongodb://127.0.0.1:27017 -p mongodb.database=newdb  -p mongodb.writeConcern=normal  -s  >data
Loading workload...
Starting test.
 0 sec: 0 operations; 
 10 sec: 49375 operations; 4922.24 current ops/sec; [READ AverageLatency(us)=192.56] 
 18 sec: 100000 operations; 6141.57 current ops/sec; [READ AverageLatency(us)=159.72] 

95% reads + 5% writes:

[root@mysqlstd ycsb-0.1.4]# ./bin/ycsb run  mongodb -P workloads/workloadd  -P large.dat -p mongodb.url=mongodb://127.0.0.1:27017 -p mongodb.database=newdb  -p mongodb.writeConcern=normal  -s  >data
Loading workload...
Starting test.
 0 sec: 0 operations; 
 10 sec: 43497 operations; 4333.23 current ops/sec; [INSERT AverageLatency(us)=633.66] [READ AverageLatency(us)=196.33] 
 20 sec: 92795 operations; 4925.37 current ops/sec; [INSERT AverageLatency(us)=792.15] [READ AverageLatency(us)=167.74] 
 21 sec: 100000 operations; 5637.72 current ops/sec; [INSERT AverageLatency(us)=379.57] [READ AverageLatency(us)=163.45]

Thumbtack has also modified YCSB and extended it; their fork is open source:

https://github.com/thumbtack-technology/ycsb

It mainly adds support for Aerospike and Couchbase. Aerospike itself is now open source and is specifically optimized for SSDs:

http://www.aerospike.com/blog/entrepreneurs-break-all-the-rules-aerospike-goes-open-source/

The fork also upgrades the MongoDB driver from 2.8.0 (a release predating Mongo 2.2) to 2.10.1 and implements support for the readPreference setting:

mongodb.readPreference = primary|primaryPreferred|secondary|secondaryPreferred

Below is a test with the Thumbtack fork, which yields detailed numbers:

[root@mysqlstd ycsb]# fab ycsb_load:db=mongodb

[10.0.32.38] Executing task 'ycsb_load'
2014-07-15 01:09:00-07:00
[10.0.32.38] run: echo "/root/ycsb/bin/ycsb load mongodb -s -p mongodb.url=mongodb://127.0.0.1:27017 -p workload=com.yahoo.ycsb.workloads.CoreWorkload -p updateretrycount=1000 -p mongodb.writeConcern=normal -p mongodb.database=ycsb -p recordcount=5000000 -p exportmeasurementsinterval=30000 -p fieldcount=10 -p timeseries.granularity=100 -p threadcount=32 -p insertretrycount=10 -p readretrycount=1000 -p ignoreinserterrors=true -p reconnectionthroughput=10 -p operationcount=2400000000 -p fieldnameprefix=f -p maxexecutiontime=2400 -p mongodb.readPreference=primaryPreferred -p measurementtype=timeseries -p reconnectiontime=1000 -p fieldlength=10 -p insertstart=0 -p insertcount=5000000 > /root/ycsb/2014-07-15_01-09_mongodb_load.out 2> /root/ycsb/2014-07-15_01-09_mongodb_load.err" | at 01:09 today
[10.0.32.38] out: job 13 at 2014-07-15 01:09

Done.
Disconnecting from 10.0.32.38... done.

[mongo@mysqlstd ~]$ /data/mongodb/mongodb/bin/mongo

MongoDB shell version: 2.6.1
connecting to: test
> show dbs
admin   0.031GB
local   0.031GB
newdb   0.500GB
newdb1  0.500GB
newdb2  0.500GB
ycsb    1.500GB
> use ycsb
switched to db ycsb
> db.usertable.c
db.usertable.clean(                  db.usertable.convertToCapped(        db.usertable.copyTo(                 db.usertable.createIndex(
db.usertable.constructor             db.usertable.convertToSingleObject(  db.usertable.count(
> db.usertable.count()
2675710

Simulating the load environment: workload mode = A

[root@mysqlstd ycsb]# fab ycsb_run:db=mongodb,workload=A
[10.0.32.38] Executing task 'ycsb_run'
2014-07-15 02:13:00-07:00
[10.0.32.38] run: echo "/root/ycsb/bin/ycsb run mongodb -s -P /root/ycsb/workloads/workloada -p mongodb.url=mongodb://127.0.0.1:27017 -p workload=com.yahoo.ycsb.workloads.CoreWorkload -p updateretrycount=1000 -p mongodb.writeConcern=normal -p mongodb.database=ycsb -p recordcount=5000000 -p exportmeasurementsinterval=30000 -p fieldcount=10 -p timeseries.granularity=100 -p threadcount=32 -p insertretrycount=10 -p readretrycount=1000 -p ignoreinserterrors=true -p reconnectionthroughput=10 -p operationcount=1800000 -p fieldnameprefix=f -p maxexecutiontime=180 -p mongodb.readPreference=primaryPreferred -p measurementtype=timeseries -p reconnectiontime=1000 -p fieldlength=10 > /root/ycsb/2014-07-15_02-13_mongodb_workloada.out 2> /root/ycsb/2014-07-15_02-13_mongodb_workloada.err" | at 02:13 today
[10.0.32.38] out: job 23 at 2014-07-15 02:13
[10.0.32.38] out: 

Done.
Disconnecting from 10.0.32.38... done.

Use merge.py to extract the detailed numbers:

[root@mysqlstd ycsb]#  ./bin/merge.py																													
	OVERALL	OVERALL	READ	READ	READ	READ	READ	READ	READ	READ	READ	UPDATE	UPDATE	UPDATE	UPDATE	UPDATE	UPDATE	UPDATE	UPDATE	UPDATE	CLEANUP	CLEANUP	CLEANUP	CLEANUP	CLEANUP	CLEANUP	CLEANUP	CLEANUP	CLEANUP	
	RunTime	Throughput	Operations	Retries	Return=0	Return=[^0].*	AverageLatency	MinLatency	MaxLatency	95thPercentileLatency	99thPercentileLatency	Operations	Retries	Return=0	Return=[^0].*	AverageLatency	MinLatency	MaxLatency	95thPercentileLatency	99thPercentileLatency	Operations	Retries	Return=0	Return=[^0].*	AverageLatency	MinLatency	MaxLatency	95thPercentileLatency	99thPercentileLatency
1	61156	28.58264111	1665	747000	918	747	927.7820691	0.132	6630.776			83	33000	50	33	2075.808675	0.504	9767.828			32	0			18.08734375	0.465	207.159		
Total	61156	28.58264111	1665	747000	918	747	927.7820691	0.132	6630.776			83	33000	50	33	2075.808675	0.504	9767.828			32	0	0	0	18.08734375	0.465	207.159		

[mongo@mysqlstd ~]$ /data/mongodb/mongodb/bin/mongostat
connected to: 127.0.0.1

insert  query update delete getmore command flushes mapped  vsize    res faults  locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn       time 
    *0   5568   5369     *0       0  5381|0       0  3.06g  6.47g   282m      9 admin:0.9%          0       0|0     0|0     1m   894k    95   02:14:05 
    *0   4298   6267     *0       0  6279|0       0  3.06g  6.47g   282m      6 admin:0.6%          0       0|0     1|0     1m   962k    96   02:14:06 
    *0   4675   6119     *0       0  6066|0       0  3.06g  6.47g   282m      2 admin:0.0%          0      95|0     1|0     1m   948k    92   02:14:07 
    *0   4137   4866     *0       0  4948|0       0  3.06g  6.47g   282m     18 admin:2.1%          0       0|0     0|0     1m   790k    91   02:14:08 
    *0   4568   5904     *0       0  5922|0       0  3.06g  6.47g   282m      4 admin:0.1%          0       0|0     0|0     1m   927k    92   02:14:09 
    *0   4727   6034     *0       0  6046|0       0  3.06g  6.47g   282m      5 admin:0.0%          0       0|0     0|0     1m   949k    90   02:14:10 
    *0   4991   5673     *0       0  5690|0       0  3.06g  6.47g   282m      3 admin:0.9%          0       0|0     0|0     1m   914k    94   02:14:11 
    *0   4740   5173     *0       0  5183|0       1  3.06g  6.47g   282m      7 admin:0.1%          0       0|0     0|0     1m   839k    94   02:14:12 
    *0   4332   5493     *0       0  5510|0       0  3.06g  6.47g   282m      8 admin:0.9%          0       0|0     0|0     1m   866k    94   02:14:13 
    *0   4980   5583     *0       0  5592|0       0  3.06g  6.47g   282m      8 admin:0.0%          0       0|0     0|0     1m   901k    97   02:14:14 
insert  query update delete getmore command flushes mapped  vsize    res faults  locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn       time 
    *0   5750   5030     *0       0  4997|0       0  3.06g  6.47g   282m     20 admin:1.8%          0      94|0     1|1     1m   853k    97   02:14:15 
    *0   4884   5509     *0       0  5578|0       0  3.06g  6.47g   282m     10 admin:0.1%          0       0|0     0|0     1m   894k    97   02:14:16 
    *0   5733   5773     *0       0  5784|0       0  3.06g  6.47g   282m      5 admin:0.0%          0       0|0     0|0     1m   952k    92   02:14:17 
    *0   5178   5202     *0       0  5219|0       0  3.06g  6.47g   282m     14 admin:0.0%          0       0|0     0|0     1m   861k    95   02:14:18 
    *0   4179   5680     *0       0  5688|0       0  3.06g  6.47g   282m      8 admin:0.0%          0       0|0     0|1     1m   884k    93   02:14:19 
    *0   4879   5695     *0       0  5707|0       0  3.06g  6.47g   282m     11 admin:0.1%          0       0|0     0|0     1m   911k    93   02:14:20 
    *0   5271   5402     *0       0  5413|0       0  3.06g  6.47g   282m     12 admin:0.0%          0       0|0     0|0     1m   887k    95   02:14:21 
    *0   4583   4852     *0       0  4867|0       1  3.06g  6.47g   282m     11 admin:0.0%          0       0|0     0|0     1m   795k    93   02:14:22 
    *0   6654   4956     *0       0  4967|0       0  3.06g  6.47g   282m     10 admin:1.5%          0       0|0     0|0     1m   881k    95   02:14:23 
    

REF:
http://www.aerospike.com/blog/entrepreneurs-break-all-the-rules-aerospike-goes-open-source/

https://github.com/thumbtack-technology/ycsb

http://www.aerospike.com/wp-content/uploads/2013/02/Ultra-High-Performance-NoSQL-Benchmarking_zh-CN.pdf

https://github.com/brianfrankcooper/YCSB/wiki

http://labs.yahoo.com/news/yahoo-cloud-serving-benchmark/

MongoDB does not preallocate the journal log

June 19, 2014 mongodb, NoSQL No comments

While installing MongoDB 2.6.2 we hit an odd problem: the replica-set (RS) data node did not preallocate its journal log. Normally, MongoDB preallocates the journal at install time: 128 MB files by default when smallfiles is enabled, otherwise 1 GB journal files.

Here is the arbiter node's log:

2014-06-17T11:50:09.842+0800 [initandlisten] MongoDB starting : pid=4749 port=27017 dbpath=/data/mongodb/data 64-bit host=vm-3-57
2014-06-17T11:50:09.844+0800 [initandlisten] db version v2.6.2
2014-06-17T11:50:09.844+0800 [initandlisten] git version: 4d06e27876697d67348a397955b46dabb8443827
2014-06-17T11:50:09.844+0800 [initandlisten] build info: Linux build10.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2014-06-17T11:50:09.844+0800 [initandlisten] allocator: tcmalloc
2014-06-17T11:50:09.844+0800 [initandlisten] options: { config: "/data/mongodb/mongod.cnf", net: { http: { enabled: false }, maxIncomingConnections: 5000, port: 27017, unixDomainSocket: { pathPrefix: "/data/mongodb/data" } }, operationProfiling: { mode: "slowOp", slowOpThresholdMs: 500 }, processManagement: { fork: true, pidFilePath: "/data/mongodb/data/mongod.pid" }, replication: { replSet: "rs1" }, security: { authorization: "enabled", keyFile: "/data/mongodb/data/rs1.keyfile" }, storage: { dbPath: "/data/mongodb/data", directoryPerDB: true, journal: { enabled: true }, repairPath: "/data/mongodb/data", syncPeriodSecs: 10.0 }, systemLog: { destination: "file", path: "/data/mongodb/log/mongod_data.log", quiet: true } }
2014-06-17T11:50:09.863+0800 [initandlisten] journal dir=/data/mongodb/data/journal
2014-06-17T11:50:09.864+0800 [initandlisten] recover : no journal files present, no recovery needed
2014-06-17T11:50:10.147+0800 [initandlisten] preallocateIsFaster=true 3.52
2014-06-17T11:50:10.378+0800 [initandlisten] preallocateIsFaster=true 3.4
2014-06-17T11:50:11.662+0800 [initandlisten] preallocateIsFaster=true 2.9
2014-06-17T11:50:11.662+0800 [initandlisten] preallocating a journal file /data/mongodb/data/journal/prealloc.0
2014-06-17T11:50:14.009+0800 [initandlisten]        File Preallocator Progress: 629145600/1073741824    58%
2014-06-17T11:50:26.266+0800 [initandlisten] preallocating a journal file /data/mongodb/data/journal/prealloc.1
2014-06-17T11:50:29.009+0800 [initandlisten]        File Preallocator Progress: 723517440/1073741824    67%
2014-06-17T11:50:40.751+0800 [initandlisten] preallocating a journal file /data/mongodb/data/journal/prealloc.2
2014-06-17T11:50:43.020+0800 [initandlisten]        File Preallocator Progress: 597688320/1073741824    55%
2014-06-17T11:50:55.830+0800 [FileAllocator] allocating new datafile /data/mongodb/data/local/local.ns, filling with zeroes...

By default, mongo preallocated three 1 GB journal files.

Now look at the RS data node's log:

2014-06-17T14:31:31.095+0800 [initandlisten] MongoDB starting : pid=8630 port=27017 dbpath=/storage/sas/mongodb/data 64-bit host=db-mysql-common01a
2014-06-17T14:31:31.096+0800 [initandlisten] db version v2.6.2
2014-06-17T14:31:31.096+0800 [initandlisten] git version: 4d06e27876697d67348a397955b46dabb8443827
2014-06-17T14:31:31.096+0800 [initandlisten] build info: Linux build10.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2014-06-17T14:31:31.096+0800 [initandlisten] allocator: tcmalloc
2014-06-17T14:31:31.096+0800 [initandlisten] options: { config: "/storage/sas/mongodb/mongod.cnf", net: { http: { enabled: false }, maxIncomingConnections: 5000, port: 27017, unixDomainSocket: { pathPrefix: "/storage/sas/mongodb/data" } }, operationProfiling: { mode: "slowOp", slowOpThresholdMs: 500 }, processManagement: { fork: true, pidFilePath: "/storage/sas/mongodb/data/mongod.pid" }, replication: { replSet: "rs1" }, security: { authorization: "enabled", keyFile: "/storage/sas/mongodb/data/rs1.keyfile" }, storage: { dbPath: "/storage/sas/mongodb/data", directoryPerDB: true, journal: { enabled: true }, repairPath: "/storage/sas/mongodb/data", syncPeriodSecs: 10.0 }, systemLog: { destination: "file", path: "/storage/sas/mongodb/log/mongod_data.log", quiet: true } }
2014-06-17T14:31:31.101+0800 [initandlisten] journal dir=/storage/sas/mongodb/data/journal
2014-06-17T14:31:31.102+0800 [initandlisten] recover : no journal files present, no recovery needed
2014-06-17T14:31:31.130+0800 [FileAllocator] allocating new datafile /storage/sas/mongodb/data/local/local.ns, filling with zeroes...
2014-06-17T14:31:31.130+0800 [FileAllocator] creating directory /storage/sas/mongodb/data/local/_tmp
2014-06-17T14:31:31.132+0800 [FileAllocator] done allocating datafile /storage/sas/mongodb/data/local/local.ns, size: 16MB,  took 0 secs
2014-06-17T14:31:31.137+0800 [FileAllocator] allocating new datafile /storage/sas/mongodb/data/local/local.0, filling with zeroes...
2014-06-17T14:31:31.138+0800 [FileAllocator] done allocating datafile /storage/sas/mongodb/data/local/local.0, size: 64MB,  took 0 secs
2014-06-17T14:31:31.141+0800 [initandlisten] build index on: local.startup_log properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "local.startup_log" }

No journal log was created; the data files were allocated directly. This was very strange. At first we suspected ext4, but after consulting a friend we found that MongoDB runs a check before creating the journal log. Part of the source:

// @file dur_journal.cpp writing to the writeahead logging journal

  bool _preallocateIsFaster() {
            bool faster = false;
            boost::filesystem::path p = getJournalDir() / "tempLatencyTest";
            if (boost::filesystem::exists(p)) {
                try {
                    remove(p);
                }
                catch(const std::exception& e) {
                    log() << "Unable to remove temporary file due to: " << e.what() << endl;
                }
            }
            try {
                AlignedBuilder b(8192);
                int millis[2];
                const int N = 50;
                for( int pass = 0; pass < 2; pass++ ) {
                    LogFile f(p.string());
                    Timer t;
                    for( int i = 0 ; i < N; i++ ) { 
                        f.synchronousAppend(b.buf(), 8192);
                    }
                    millis[pass] = t.millis();
                    // second time through, file exists and is prealloc case
                }
                int diff = millis[0] - millis[1];
                if( diff > 2 * N ) {
                    // at least 2ms faster for prealloc case?
                    faster = true;
                    log() << "preallocateIsFaster=true " << diff / (1.0*N) << endl;
                }
            }
            catch (const std::exception& e) {
                log() << "info preallocateIsFaster couldn't run due to: " << e.what()
                      << "; returning false" << endl;
            }
            if (boost::filesystem::exists(p)) {
                try {
                    remove(p);
                }
                catch(const std::exception& e) {
                    log() << "Unable to remove temporary file due to: " << e.what() << endl;
                }
            }
            return faster;
        }
        bool preallocateIsFaster() {
            Timer t;
            bool res = false;
            if( _preallocateIsFaster() && _preallocateIsFaster() ) { 
                // maybe system is just super busy at the moment? sleep a second to let it calm down.  
                // deciding to to prealloc is a medium big decision:
                sleepsecs(1);
                res = _preallocateIsFaster();
            }
            if( t.millis() > 3000 ) 
                log() << "preallocateIsFaster check took " << t.millis()/1000.0 << " secs" << endl;
            return res;
        }
        
 int diff = millis[0] - millis[1];
                if( diff > 2 * N ) {
                    // at least 2ms faster for prealloc case?
                    faster = true;
                    log() << "preallocateIsFaster=true " << diff / (1.0*N) << endl;
                }   

If diff > 2*N, mongo concludes that preallocation is the better choice and preallocates the journal files. The arbiter node's log reflects this as well; every measurement is above 2 ms:

2014-06-17T11:50:10.147+0800 [initandlisten] preallocateIsFaster=true 3.52
2014-06-17T11:50:10.378+0800 [initandlisten] preallocateIsFaster=true 3.4
2014-06-17T11:50:11.662+0800 [initandlisten] preallocateIsFaster=true 2.9
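
A rough Python analogue of that check, purely as a sketch of the idea (not MongoDB's code): time N synchronous 8 KB appends on a fresh file, then again on the now-allocated file, and compare.

# Rough analogue of _preallocateIsFaster(): synchronous appends to a
# fresh file vs. rewriting an already-allocated one.
import os
import time

N, BUF = 50, b"\0" * 8192
path = "/tmp/tempLatencyTest"

millis = []
for _ in range(2):  # pass 0 extends a new file; pass 1 rewrites allocated blocks
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC)
    t0 = time.time()
    for _ in range(N):
        os.write(fd, BUF)  # like LogFile::synchronousAppend()
    os.close(fd)
    millis.append((time.time() - t0) * 1000)
os.unlink(path)

diff = millis[0] - millis[1]
if diff > 2 * N:  # at least 2 ms per write faster in the prealloc case?
    print("preallocateIsFaster=true %.2f" % (diff / N))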

Personally, I find this design pointless: I doubt anyone minds the small amount of time spent initializing journal files at startup, whereas allocating them later, during a load spike, is yet another hit to the I/O.

How to change oplog size — mongo

September 10, 2013 mongodb, NoSQL No comments

There are two main approaches:

1. Change the oplog size on each member in turn (from primary through the secondaries).

2. Re-initialize a secondary with a custom oplog size, then fail over the original primary.

The steps for approach 1 are shown below; for details see the MongoDB oplog documentation.

1). Step the current primary down to secondary

rs1:PRIMARY> rs.stepDown();

2). Shut down MongoDB

rs1:SECONDARY> db.shutdownServer();

3). Comment out the replSet option and restart in standalone mode, on a different port
4). Find the last sync point

> use local
> db.oplog.rs.find( { }, { ts: 1, h: 1 } ).sort( {$natural : -1} ).limit(1).next();
{ "ts" : Timestamp(1378716098, 2), "h" : NumberLong("-654971153597320397") }

5). Drop the old oplog

> db.oplog.rs.drop();

6). Create the new oplog, here 30 GB

> db.runCommand({create:"oplog.rs", capped:true, size:(30*1024*1024*1024)});

7). Write the last sync point back

> db.oplog.rs.save({ "ts" : Timestamp(1378716098, 2), "h" : NumberLong("-654971153597320397") });

8). Shut down MongoDB

> db.shutdownServer();

9). Re-enable the replSet option and restart in replica-set mode
10). Check that replication catches up
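
For step 10), a quick check of replication lag might look like this pymongo sketch (rs.printSlaveReplicationInfo() in the mongo shell gives the same information):

# Sketch: compare each member's optime against the primary's.
from pymongo import MongoClient

status = MongoClient("mongodb://127.0.0.1:27017").admin.command("replSetGetStatus")
primary = next(m for m in status["members"] if m["stateStr"] == "PRIMARY")
for m in status["members"]:
    lag = (primary["optimeDate"] - m["optimeDate"]).total_seconds()
    print("%-30s %-10s lag=%ss" % (m["name"], m["stateStr"], lag))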

MySQL key partition and MongoDB TEST

March 8, 2013 mongodb, MYSQL, NoSQL, performance No comments

For an activation-code business requirement we compared MySQL and MongoDB. MySQL was tested both as a normal table and as a key-partitioned table, at 100 million and 1 billion rows respectively; MySQL access went directly through the PK, which was also the partition key. The MySQL table size was 90 GB, the MongoDB table size 157 GB.
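
The post does not show the schema; purely for illustration, a key-partitioned table of this shape might be created as in the sketch below (column names and partition count are invented):

# Hypothetical shape of the activation-code table and the benchmark's
# access pattern (point select by PK). Names are invented.
import pymysql

conn = pymysql.connect(host="127.0.0.1", user="root", password="xxx", db="test")
with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS act_code (
            id   BIGINT   NOT NULL,
            code CHAR(32) NOT NULL,
            used TINYINT  NOT NULL DEFAULT 0,
            PRIMARY KEY (id)
        ) ENGINE=InnoDB
        PARTITION BY KEY (id) PARTITIONS 64
    """)
    cur.execute("SELECT code FROM act_code WHERE id = %s", (123456789,))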

[liuyang@yhdem ~]$ cat /proc/cpuinfo  |grep processor |wc -l
24

[liuyang@yhdem ~]$ cat /etc/issue
Oracle Linux Server release 5.8
Kernel \r on an \m

MySQL env:

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.5.25a   | 
+-----------+
1 row in set (0.00 sec)
      
      log_bin[OFF] innodb_flush_log_at_trx_commit [2]  query_cache_type[OFF]
      max_connect_errors[10] max_connections[214] max_user_connections[0] 
      sync_binlog[0] table_definition_cache[400] 
      table_open_cache[400] thread_cache_size[8]  open_files_limit[30000]
      innodb_adaptive_flushing[ON] innodb_adaptive_hash_index[ON] innodb_buffer_pool_size[30.234375G] 
      innodb_file_per_table[ON] innodb_flush_log_at_trx_commit[2] innodb_flush_method[] 
      innodb_io_capacity[200] innodb_lock_wait_timeout[100] innodb_log_buffer_size[128M] 
      innodb_log_file_size[200M] innodb_log_files_in_group[2] innodb_max_dirty_pages_pct[75] 
      innodb_open_files[1600] innodb_read_io_threads[4] innodb_thread_concurrency[0] 
      innodb_write_io_threads[4]

The charts below all show QPS; TPS tests were not run this time.

no partition table with one billion rows -> small random select by pk

mysql_test_1

xDiskName Busy  Read WriteKB|0          |25         |50          |75	   100|                                                                        
xsda        1%    2.0   35.9|>                                                |                                                                      
xsda1       0%    0.0    0.0|>                                                |                                                                      
xsda2       0%    0.0    0.0|>                                                |                                                                      
xsda3       0%    0.0    0.0|>                                                |                                                                      
xsda4       0%    0.0    0.0|>disk busy not available                         |                                                                      
xsda5       0%    0.0    0.0|>                                                |                                                                      
xsda6       1%    2.0   35.9|>                                                |                                                                      
xsdb        0%    0.0   55.9|>                                                |                                                                      
xsdb1       0%    0.0   55.9|>                                                |                                                                      
xTotals Read-MB/s=0.0      Writes-MB/s=0.2      Transfers/sec=18.0 

partition table with one billion rows -> small random select by pk

mysql_test_2

xDiskName Busy  Read WriteKB|0          |25         |50          |75	   100|                                                                       
xsda        0%    0.0    8.0|>                                                |                                                                     
xsda1       0%    0.0    0.0|>                                                |                                                                     
xsda2       0%    0.0    8.0|>                                                |                                                                     
xsda3       0%    0.0    0.0|>                                                |                                                                     
xsda4       0%    0.0    0.0|>disk busy not available                         |                                                                     
xsda5       0%    0.0    0.0|>                                                |                                                                     
xsda6       0%    0.0    0.0|>                                                |                                                                     
xsdb        0%    0.0  201.5|                         >                       |                                                                     
xsdb1       0%    0.0  201.5|W                        >                       |                                                                     
xTotals Read-MB/s=0.0      Writes-MB/s=0.4      Transfers/sec=46.9             

no partition table with one billion rows -> full random select by pk

mysql_test_3

xDiskName Busy  Read WriteMB|0          |25         |50          |75	   100|                                                                        
xsda        0%    0.0    0.0| >                                               |                                                                      
xsda1       0%    0.0    0.0|>                                                |                                                                      
xsda2       0%    0.0    0.0|>                                                |                                                                      
xsda3       0%    0.0    0.0|>                                                |                                                                      
xsda4       0%    0.0    0.0|>disk busy not available                         |                                                                      
xsda5       0%    0.0    0.0|>                                                |                                                                      
xsda6       0%    0.0    0.0| >                                               |                                                                      
xsdb      100%   86.8    0.2|RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR>                                                                      
xsdb1     100%   86.8    0.2|RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR>                                                                      
xTotals Read-MB/s=173.6    Writes-MB/s=0.4      Transfers/sec=6448.1    

partition table with one billion rows -> full random select by pk

mysql_test_4

xDiskName Busy  Read WriteMB|0          |25         |50          |75	   100|                                                                        
xsda        0%    0.0    0.0| >                                               |                                                                      
xsda1       0%    0.0    0.0|>                                                |                                                                      
xsda2       0%    0.0    0.0| >                                               |                                                                      
xsda3       0%    0.0    0.0|>                                                |                                                                      
xsda4       0%    0.0    0.0|>disk busy not available                         |                                                                      
xsda5       0%    0.0    0.0|>                                                |                                                                      
xsda6       0%    0.0    0.0| >                                               |                                                                      
xsdb      100%   89.6    0.2|RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR>                                                                      
xsdb1     100%   89.6    0.2|                                                 >                                                                      
xTotals Read-MB/s=179.2    Writes-MB/s=0.3      Transfers/sec=6539.3        

no partition table with 100 million rows -> full random select by pk

mysql_test_5

Next, the MongoDB test, on the same 1-billion-row table, 157 GB:

[root@db-13 tmp]# mongo
MongoDB shell version: 2.0.8
connecting to: test
> db.foo.totalSize();
157875838416
> db.foo.find().count();
1000000000

——

First run: a full 128 GB of memory, 16 threads, random queries across all 1 billion rows:

[root@db-13 tmp]# mongo test ./mongodb_benchmark_query.js 
MongoDB shell version: 2.0.8
connecting to: test
threads: 16      queries/sec: 126151.69666666667

Second run: 128 GB of memory, 24 threads, random queries over the first 100 million of the 1 billion rows:

[root@db-13 tmp]# mongo test ./mongodb_benchmark_query.js 
MongoDB shell version: 2.0.8
connecting to: test
threads: 24      queries/sec: 166527.42333333334

Third run: mongo started as the mysql user, with that user's memory capped at 24 GB, 24 threads, random queries over the first 100 million rows:

[mysql@db-13 ~]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1052672
max locked memory       (kbytes, -l) 26055452
max memory size         (kbytes, -m) 26055452
open files                      (-n) 131072
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

[mysql@db-13 tmp]$ mongo test ./mongodb_benchmark_query.js 
MongoDB shell version: 2.0.8
connecting to: test
threads: 24	 queries/sec: 161358.03333333333

Fourth run: mongo as the mysql user, memory capped at 24 GB, 24 threads, random queries across all 1 billion rows:

[mysql@db-13 tmp]$ mongo test ./mongodb_benchmark_query.js 
MongoDB shell version: 2.0.8
connecting to: test
threads: 24	 queries/sec: 2549.2 ----------------------> physical disk I/O shows up here

— The query script:

ops = [ { op: "findOne", ns: "test.foo", query: { _id: { "#RAND_INT": [ 1, 100000000 ] } } } ];

x = 24;
res = benchRun( {
    parallel: x,
    seconds: 60,
    ops: ops
} );
print( "threads: " + x + "\t queries/sec: " + res.query );

In summary: for in-memory access by PK, the 1-billion-row normal table shows no degradation versus the 100-million-row normal table. The 1-billion-row partitioned table loses about two thirds of its in-memory throughput compared with the 1-billion-row normal table, but once the fully random PK reads fall out of memory, the partitioned table performs somewhat better. (Note also that an activation code is typically accessed only once.)

For MongoDB, this requirement is easily handled: with sufficient memory, QPS reached 160K+/s, but when memory ran short it plummeted to 2,549.
对于mongodb来说,这种业务需求完全可以搞定,在内存充足的情况下QPS达到了16W+/s,但是在内存不足的情况下,暴跌至2549.