mongodb

How to configure AWR system

August 6, 2015 Architect, Architecture, mongodb, MYSQL, NoSQL, rdbms, software No comments

In this article, we introduce myawr and mongoawr system .

Read this PDF, you will learn how to configure them.

How to configure AWR system.

How to configure WEBM

July 15, 2015 Architect, mongodb, MYSQL, software No comments

Architecture of WEBM system.

Reference:

http://www.vmcd.org/2014/10/webm_v2-has-been-released/
http://www.vmcd.org/2014/09/webm-mysql-database-performance-web-monitor/

View this PDF:

http://www.vmcd.org/docs/How%20to%20configure%20WEBM.pdf

MongoDB not preallocate journal log

June 19, 2014 mongodb, NoSQL No comments

在安装mongodb 2.6.2的时候发现一个奇怪的问题 RS 节点的journal log并没有提前分配.我们知道在安装mongodb的时候
mongo总是会预先分配journal log 启用smallfile的时候默认为128MB 否则会分配1GB的journal log

下面是仲裁节点的日志:

2014-06-17T11:50:09.842+0800 [initandlisten] MongoDB starting : pid=4749 port=27017 dbpath=/data/mongodb/data 64-bit host=vm-3-57
2014-06-17T11:50:09.844+0800 [initandlisten] db version v2.6.2
2014-06-17T11:50:09.844+0800 [initandlisten] git version: 4d06e27876697d67348a397955b46dabb8443827
2014-06-17T11:50:09.844+0800 [initandlisten] build info: Linux build10.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2014-06-17T11:50:09.844+0800 [initandlisten] allocator: tcmalloc
2014-06-17T11:50:09.844+0800 [initandlisten] options: { config: "/data/mongodb/mongod.cnf", net: { http: { enabled: false }, maxIncomingConnections: 5000, port: 27017, unixDomainSocket: { pathPrefix: "/data/mongodb/data" } }, operationProfiling: { mode: "slowOp", slowOpThresholdMs: 500 }, processManagement: { fork: true, pidFilePath: "/data/mongodb/data/mongod.pid" }, replication: { replSet: "rs1" }, security: { authorization: "enabled", keyFile: "/data/mongodb/data/rs1.keyfile" }, storage: { dbPath: "/data/mongodb/data", directoryPerDB: true, journal: { enabled: true }, repairPath: "/data/mongodb/data", syncPeriodSecs: 10.0 }, systemLog: { destination: "file", path: "/data/mongodb/log/mongod_data.log", quiet: true } }
2014-06-17T11:50:09.863+0800 [initandlisten] journal dir=/data/mongodb/data/journal
2014-06-17T11:50:09.864+0800 [initandlisten] recover : no journal files present, no recovery needed
2014-06-17T11:50:10.147+0800 [initandlisten] preallocateIsFaster=true 3.52
2014-06-17T11:50:10.378+0800 [initandlisten] preallocateIsFaster=true 3.4
2014-06-17T11:50:11.662+0800 [initandlisten] preallocateIsFaster=true 2.9
2014-06-17T11:50:11.662+0800 [initandlisten] preallocating a journal file /data/mongodb/data/journal/prealloc.0
2014-06-17T11:50:14.009+0800 [initandlisten]        File Preallocator Progress: 629145600/1073741824    58%
2014-06-17T11:50:26.266+0800 [initandlisten] preallocating a journal file /data/mongodb/data/journal/prealloc.1
2014-06-17T11:50:29.009+0800 [initandlisten]        File Preallocator Progress: 723517440/1073741824    67%
2014-06-17T11:50:40.751+0800 [initandlisten] preallocating a journal file /data/mongodb/data/journal/prealloc.2
2014-06-17T11:50:43.020+0800 [initandlisten]        File Preallocator Progress: 597688320/1073741824    55%
2014-06-17T11:50:55.830+0800 [FileAllocator] allocating new datafile /data/mongodb/data/local/local.ns, filling with zeroes...

mongo默认创建了3个1GB 的journal log

再来看下RS 节点的日志:

2014-06-17T14:31:31.095+0800 [initandlisten] MongoDB starting : pid=8630 port=27017 dbpath=/storage/sas/mongodb/data 64-bit host=db-mysql-common01a
2014-06-17T14:31:31.096+0800 [initandlisten] db version v2.6.2
2014-06-17T14:31:31.096+0800 [initandlisten] git version: 4d06e27876697d67348a397955b46dabb8443827
2014-06-17T14:31:31.096+0800 [initandlisten] build info: Linux build10.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2014-06-17T14:31:31.096+0800 [initandlisten] allocator: tcmalloc
2014-06-17T14:31:31.096+0800 [initandlisten] options: { config: "/storage/sas/mongodb/mongod.cnf", net: { http: { enabled: false }, maxIncomingConnections: 5000, port: 27017, unixDomainSocket: { pathPrefix: "/storage/sas/mongodb/data" } }, operationProfiling: { mode: "slowOp", slowOpThresholdMs: 500 }, processManagement: { fork: true, pidFilePath: "/storage/sas/mongodb/data/mongod.pid" }, replication: { replSet: "rs1" }, security: { authorization: "enabled", keyFile: "/storage/sas/mongodb/data/rs1.keyfile" }, storage: { dbPath: "/storage/sas/mongodb/data", directoryPerDB: true, journal: { enabled: true }, repairPath: "/storage/sas/mongodb/data", syncPeriodSecs: 10.0 }, systemLog: { destination: "file", path: "/storage/sas/mongodb/log/mongod_data.log", quiet: true } }
2014-06-17T14:31:31.101+0800 [initandlisten] journal dir=/storage/sas/mongodb/data/journal
2014-06-17T14:31:31.102+0800 [initandlisten] recover : no journal files present, no recovery needed
2014-06-17T14:31:31.130+0800 [FileAllocator] allocating new datafile /storage/sas/mongodb/data/local/local.ns, filling with zeroes...
2014-06-17T14:31:31.130+0800 [FileAllocator] creating directory /storage/sas/mongodb/data/local/_tmp
2014-06-17T14:31:31.132+0800 [FileAllocator] done allocating datafile /storage/sas/mongodb/data/local/local.ns, size: 16MB,  took 0 secs
2014-06-17T14:31:31.137+0800 [FileAllocator] allocating new datafile /storage/sas/mongodb/data/local/local.0, filling with zeroes...
2014-06-17T14:31:31.138+0800 [FileAllocator] done allocating datafile /storage/sas/mongodb/data/local/local.0, size: 64MB,  took 0 secs
2014-06-17T14:31:31.141+0800 [initandlisten] build index on: local.startup_log properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "local.startup_log" }

没有创建journal log 直接创建了datafile 这个问题很奇怪 之前认为是ext4的问题,咨询朋友之后发现mongodb在创建journal log 之前会有一个判断,部分源码如下:

// @file dur_journal.cpp writing to the writeahead logging journal

  bool _preallocateIsFaster() {
            bool faster = false;
            boost::filesystem::path p = getJournalDir() / "tempLatencyTest";
            if (boost::filesystem::exists(p)) {
                try {
                    remove(p);
                }
                catch(const std::exception& e) {
                    log() << "Unable to remove temporary file due to: " << e.what() << endl;
                }
            }
            try {
                AlignedBuilder b(8192);
                int millis[2];
                const int N = 50;
                for( int pass = 0; pass < 2; pass++ ) {
                    LogFile f(p.string());
                    Timer t;
                    for( int i = 0 ; i < N; i++ ) { 
                        f.synchronousAppend(b.buf(), 8192);
                    }
                    millis[pass] = t.millis();
                    // second time through, file exists and is prealloc case
                }
                int diff = millis[0] - millis[1];
                if( diff > 2 * N ) {
                    // at least 2ms faster for prealloc case?
                    faster = true;
                    log() << "preallocateIsFaster=true " << diff / (1.0*N) << endl;
                }
            }
            catch (const std::exception& e) {
                log() << "info preallocateIsFaster couldn't run due to: " << e.what()
                      << "; returning false" << endl;
            }
            if (boost::filesystem::exists(p)) {
                try {
                    remove(p);
                }
                catch(const std::exception& e) {
                    log() << "Unable to remove temporary file due to: " << e.what() << endl;
                }
            }
            return faster;
        }
        bool preallocateIsFaster() {
            Timer t;
            bool res = false;
            if( _preallocateIsFaster() && _preallocateIsFaster() ) { 
                // maybe system is just super busy at the moment? sleep a second to let it calm down.  
                // deciding to to prealloc is a medium big decision:
                sleepsecs(1);
                res = _preallocateIsFaster();
            }
            if( t.millis() > 3000 ) 
                log() << "preallocateIsFaster check took " << t.millis()/1000.0 << " secs" << endl;
            return res;
        }
        
 int diff = millis[0] - millis[1];
                if( diff > 2 * N ) {
                    // at least 2ms faster for prealloc case?
                    faster = true;
                    log() << "preallocateIsFaster=true " << diff / (1.0*N) << endl;
                }   

如果diff> 2*N 那么mongo 将会认为 preallocate 是更好的选择,将会预先分配log ,这在仲裁节点日志也体现出来了,都大于2ms

2014-06-17T11:50:10.147+0800 [initandlisten] preallocateIsFaster=true 3.52
2014-06-17T11:50:10.378+0800 [initandlisten] preallocateIsFaster=true 3.4
2014-06-17T11:50:11.662+0800 [initandlisten] preallocateIsFaster=true 2.9

不过个人认为这个设计很无聊,我相信没人会介意初始化日志的那么点时间吧,何况如果出现峰值的之后再去分配log,对IO又是一个冲击。

How to change oplog size — mongo

September 10, 2013 mongodb, NoSQL No comments

主要两种方法 :
1.轮流改变oplog size (from primary to secondary)

2.重新初始化 secondary 定制oplog size 切换原来的primary

具体操作一下方法一的步骤,集体参考mongo oplog

1). 切换当前primary ->secondary

rs1:PRIMARY> rs.stepDown();

2). 关闭MongoDB

rs1:SECONDARY> db.shutdownServer();

3). 注释掉replSet选项,以单机模式启动 —切换port
4). 查询出最后的同步点

> use local
> db.oplog.rs.find( { }, { ts: 1, h: 1 } ).sort( {$natural : -1} ).limit(1).next();
{ "ts" : Timestamp(1378716098, 2), "h" : NumberLong("-654971153597320397") }

5). 删除旧的oplog

> db.oplog.rs.drop();

6). 创建新的oplog,这里为30GB

> db.runCommand({create:"oplog.rs", capped:true, size:(30*1024*1024*1024)});

7). 写入最后的同步点

> db.oplog.rs.save({ "ts" : Timestamp(1378716098, 2), "h" : NumberLong("-654971153597320397") });

8). 关闭MongoDB

> db.shutdownServer();

9). 使用replSet选项,以Replset模式启动
10). 检查同步情况

MySQL key partition and MongoDB TEST

March 8, 2013 mongodb, MYSQL, NoSQL, performance No comments

对于业务的激活码需求做了一次关于mysql,mongodb的比对.mysql分为normal,key partition 数量分别是1亿和10亿数据,mysql采用直接访问PK键,partition key为PK,mysql table size 为90G,mongodb table size为157G。

[liuyang@yhdem ~]$ cat /proc/cpuinfo  |grep processor |wc -l
24

[liuyang@yhdem ~]$ cat /etc/issue
Oracle Linux Server release 5.8
Kernel \r on an \m

mysql evn:

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.5.25a   | 
+-----------+
1 row in set (0.00 sec)
      
      log_bin[OFF] innodb_flush_log_at_trx_commit [2]  query_cache_type[OFF]
      max_connect_errors[10] max_connections[214] max_user_connections[0] 
      sync_binlog[0] table_definition_cache[400] 
      table_open_cache[400] thread_cache_size[8]  open_files_limit[30000]
      innodb_adaptive_flushing[ON] innodb_adaptive_hash_index[ON] innodb_buffer_pool_size[30.234375G] 
      innodb_file_per_table[ON] innodb_flush_log_at_trx_commit[2] innodb_flush_method[] 
      innodb_io_capacity[200] innodb_lock_wait_timeout[100] innodb_log_buffer_size[128M] 
      innodb_log_file_size[200M] innodb_log_files_in_group[2] innodb_max_dirty_pages_pct[75] 
      innodb_open_files[1600] innodb_read_io_threads[4] innodb_thread_concurrency[0] 
      innodb_write_io_threads[4]

以下图片均为QPS统计,TPS测试暂时没有做

no partition table with one billion rows –> small random select by pk

mysql_test_1

xDiskName Busy  Read WriteKB|0          |25         |50          |75	   100|                                                                        
xsda        1%    2.0   35.9|>                                                |                                                                      
xsda1       0%    0.0    0.0|>                                                |                                                                      
xsda2       0%    0.0    0.0|>                                                |                                                                      
xsda3       0%    0.0    0.0|>                                                |                                                                      
xsda4       0%    0.0    0.0|>disk busy not available                         |                                                                      
xsda5       0%    0.0    0.0|>                                                |                                                                      
xsda6       1%    2.0   35.9|>                                                |                                                                      
xsdb        0%    0.0   55.9|>                                                |                                                                      
xsdb1       0%    0.0   55.9|>                                                |                                                                      
xTotals Read-MB/s=0.0      Writes-MB/s=0.2      Transfers/sec=18.0 

partition table with one billion rows –> small random select by pk

mysql_test_2

xDiskName Busy  Read WriteKB|0          |25         |50          |75	   100|                                                                       
xsda        0%    0.0    8.0|>                                                |                                                                     
xsda1       0%    0.0    0.0|>                                                |                                                                     
xsda2       0%    0.0    8.0|>                                                |                                                                     
xsda3       0%    0.0    0.0|>                                                |                                                                     
xsda4       0%    0.0    0.0|>disk busy not available                         |                                                                     
xsda5       0%    0.0    0.0|>                                                |                                                                     
xsda6       0%    0.0    0.0|>                                                |                                                                     
xsdb        0%    0.0  201.5|                         >                       |                                                                     
xsdb1       0%    0.0  201.5|W                        >                       |                                                                     
xTotals Read-MB/s=0.0      Writes-MB/s=0.4      Transfers/sec=46.9             

no partition table with one billion rows –> full random select by pk

mysql_test_3

xDiskName Busy  Read WriteMB|0          |25         |50          |75	   100|                                                                        
xsda        0%    0.0    0.0| >                                               |                                                                      
xsda1       0%    0.0    0.0|>                                                |                                                                      
xsda2       0%    0.0    0.0|>                                                |                                                                      
xsda3       0%    0.0    0.0|>                                                |                                                                      
xsda4       0%    0.0    0.0|>disk busy not available                         |                                                                      
xsda5       0%    0.0    0.0|>                                                |                                                                      
xsda6       0%    0.0    0.0| >                                               |                                                                      
xsdb      100%   86.8    0.2|RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR>                                                                      
xsdb1     100%   86.8    0.2|RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR>                                                                      
xTotals Read-MB/s=173.6    Writes-MB/s=0.4      Transfers/sec=6448.1    

partition table with one billion rows –> full random select by pk

mysql_test_4

xDiskName Busy  Read WriteMB|0          |25         |50          |75	   100|                                                                        
xsda        0%    0.0    0.0| >                                               |                                                                      
xsda1       0%    0.0    0.0|>                                                |                                                                      
xsda2       0%    0.0    0.0| >                                               |                                                                      
xsda3       0%    0.0    0.0|>                                                |                                                                      
xsda4       0%    0.0    0.0|>disk busy not available                         |                                                                      
xsda5       0%    0.0    0.0|>                                                |                                                                      
xsda6       0%    0.0    0.0| >                                               |                                                                      
xsdb      100%   89.6    0.2|RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR>                                                                      
xsdb1     100%   89.6    0.2|                                                 >                                                                      
xTotals Read-MB/s=179.2    Writes-MB/s=0.3      Transfers/sec=6539.3        

no partition table with 100 million rows –> full random select by pk

mysql_test_5

下面基于mongodb的TEST.同样为10亿的表,157G.

[root@db-13 tmp]# mongo
MongoDB shell version: 2.0.8
connecting to: test
> db.foo.totalSize();
157875838416
> db.foo.find().count();
1000000000

——

第一次 使用128G 满额内存 16thread,10亿random query:

[root@db-13 tmp]# mongo test ./mongodb_benchmark_query.js 
MongoDB shell version: 2.0.8
connecting to: test
threads: 16      queries/sec: 126151.69666666667

第二次 使用128G 内存 24 thread,10亿中的前1亿数据random query:

[root@db-13 tmp]# mongo test ./mongodb_benchmark_query.js 
MongoDB shell version: 2.0.8
connecting to: test
threads: 24      queries/sec: 166527.42333333334

第三次 使用mysql用户启动mongo 限制mysql用户的mem为24G 24 thread , 10亿中的前1亿数据random query :

[mysql@db-13 ~]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1052672
max locked memory       (kbytes, -l) 26055452
max memory size         (kbytes, -m) 26055452
open files                      (-n) 131072
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

[mysql@db-13 tmp]$ mongo test ./mongodb_benchmark_query.js 
MongoDB shell version: 2.0.8
connecting to: test
threads: 24	 queries/sec: 161358.03333333333

第四次 使用mysql用户启动mongo 限制mysql用户的mem为24G 24 thread , 10亿random query :

[mysql@db-13 tmp]$ mongo test ./mongodb_benchmark_query.js 
MongoDB shell version: 2.0.8
connecting to: test
threads: 24	 queries/sec: 2549.2 ----------------------> 这里出现了物理IO读写

—提供查询脚本

ops = [{op: "findOne", ns: "test.foo", query: {_id : { "#RAND_INT" : [ 1 , 100000000 ] } }}]       
x=24
 {
    res = benchRun( {
       parallel : x ,
        seconds : 60 ,
        ops : ops
    } );
    print( "threads: " + x + "\t queries/sec: " + res.query );
}

10亿 normal table 对于1亿 normal table 在内存基于PK的访问没有衰减,10亿的partition table 对于 10亿的 normal table 在内存中衰减了2/3,10亿的partition table对于10亿的 normal table 在full table out of memory 的情况下 性能有所提升 (另外注意激活码基本只会被访问1次)

对于mongodb来说,这种业务需求完全可以搞定,在内存充足的情况下QPS达到了16W+/s,但是在内存不足的情况下,暴跌至2549.

SQL to mongoDB 图解

April 8, 2012 mongodb, NoSQL No comments

分享一篇 SQL to mongoDB 图解