bigdata

Presto-engine Privileges Control – Architect

January 20, 2022 Architect, bigdata, linux, system

We use Presto as our main query engine to provide data query services to business teams. Because Presto is a core capability in our architecture, it needs query permission control at both the table and column levels. We designed the following scheme to meet this requirement.

(more…)

ClickHouse Materialized View

April 24, 2021 Architect, bigdata

We are now using ClickHouse in our OLAP streaming computing system.

Read this PDF for some tips on ClickHouse materialized views (MV): CK-MV

Logcenter Project Architecture

August 16, 2017 Architect, Architecture, bigdata, hadoop, hive, network, NoSQL, rdbms

We created a project called LC (Log Center) for the ops department. All members of the ops team use this system for low-level log analysis. We collect all types of logs, including db-system, crond, security, cmd, and API logs. Logs are pushed through an MQ system driven by a policy center, and we built a new back-end system for searching and management.

Click here for the project design: LC-system-design

HBase table migration, part 1

December 19, 2014 bigdata, hbase

Copy a table between different HBase clusters (version 0.96.2-hadoop2).

Create a test table and insert some initial records.

$hbase shell
2014-12-19 15:01:36,085 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.96.2-hadoop2, r1581096, Mon Mar 24 16:03:18 PDT 2014

hbase(main):001:0> create 'liuyang:mig_hbase', 'test2','member_id','address','info' 
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/hbase-0.96.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/hadoop-2.3.0-cdh5.0.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2014-12-19 15:03:00,695 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
0 row(s) in 4.1540 seconds

=> Hbase::Table - liuyang:mig_hbase
hbase(main):002:0> put'liuyang:mig_hbase','test3','info:age','24'
0 row(s) in 0.1990 seconds

hbase(main):003:0> 
hbase(main):004:0* put'liuyang:mig_hbase','test3','info:birthday','1987-06-17'
0 row(s) in 0.0090 seconds

hbase(main):005:0> 
hbase(main):006:0* put'liuyang:mig_hbase','test3','info:company','alibaba'
0 row(s) in 0.0080 seconds

hbase(main):007:0> 
hbase(main):008:0* put'liuyang:mig_hbase','test3','address:contry','china'
0 row(s) in 0.0080 seconds

hbase(main):009:0> 
hbase(main):010:0* put'liuyang:mig_hbase','test3','address:province','liuyang'
0 row(s) in 0.0080 seconds

hbase(main):011:0> 
hbase(main):012:0* put'liuyang:mig_hbase','test3','address:city','hangzhou' 
0 row(s) in 0.0190 seconds

hbase(main):013:0> scan 'liuyang:mig_hbase'
ROW                                              COLUMN+CELL                                                                                                                                   
 test3                                           column=address:city, timestamp=1418972636248, value=hangzhou                                                                                  
 test3                                           column=address:contry, timestamp=1418972634935, value=china                                                                                   
 test3                                           column=address:province, timestamp=1418972635046, value=liuyang                                                                               
 test3                                           column=info:age, timestamp=1418972634471, value=24                                                                                            
 test3                                           column=info:birthday, timestamp=1418972634626, value=1987-06-17                                                                               
 test3                                           column=info:company, timestamp=1418972634764, value=alibaba                                                                                   
1 row(s) in 0.1010 seconds
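The interactive session above can also be scripted. The following is a minimal POSIX shell sketch that prints the same create and put commands so they can be piped into hbase shell. The function name gen_seed_commands is hypothetical, and the sketch assumes the liuyang namespace already exists (otherwise run create_namespace 'liuyang' first).

```shell
#!/bin/sh
# Sketch: reproduce the test data from the session above non-interactively.
# ASSUMPTION: the "liuyang" namespace already exists on this cluster.

TABLE='liuyang:mig_hbase'

# Print HBase shell commands on stdout; pipe them into "hbase shell".
gen_seed_commands() {
  echo "create '$TABLE', 'test2', 'member_id', 'address', 'info'"
  # column/value pairs for row 'test3', copied from the session above
  while read -r col val; do
    echo "put '$TABLE', 'test3', '$col', '$val'"
  done <<EOF
info:age 24
info:birthday 1987-06-17
info:company alibaba
address:contry china
address:province liuyang
address:city hangzhou
EOF
  echo "scan '$TABLE'"
}
```

Usage on the source cluster: gen_seed_commands | hbase shell. Generating the commands as plain text keeps them easy to review before anything touches the cluster.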

————————————————————

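Copy the table to another cluster

Use distcp to copy the table's HDFS directory to the destination cluster. If the two clusters cannot communicate with each other, use the hadoop fs -copyToLocal and -copyFromLocal commands instead, moving the files between the clusters manually.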

$hadoop distcp -overwrite   /hbase/data/liuyang/mig_hbase  hdfs://10.0.128.110/hbase/data/liuyang/mig_hbase
14/12/19 15:10:18 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[/hbase/data/liuyang/mig_hbase], targetPath=hdfs://10.0.128.110/hbase/data/liuyang/mig_hbase}
14/12/19 15:10:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/19 15:10:20 INFO client.RMProxy: Connecting to ResourceManager at vm-master1/10.0.128.32:8032
14/12/19 15:10:23 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
14/12/19 15:10:23 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
14/12/19 15:10:23 INFO client.RMProxy: Connecting to ResourceManager at vm-master1/10.0.128.32:8032
14/12/19 15:10:24 INFO mapreduce.JobSubmitter: number of splits:3
14/12/19 15:10:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1418285943315_0027
14/12/19 15:10:25 INFO impl.YarnClientImpl: Submitted application application_1418285943315_0027
14/12/19 15:10:25 INFO mapreduce.Job: The url to track the job: http://vm-master1:8088/proxy/application_1418285943315_0027/
14/12/19 15:10:25 INFO tools.DistCp: DistCp job-id: job_1418285943315_0027
14/12/19 15:10:25 INFO mapreduce.Job: Running job: job_1418285943315_0027
14/12/19 15:10:37 INFO mapreduce.Job: Job job_1418285943315_0027 running in uber mode : false
14/12/19 15:10:37 INFO mapreduce.Job:  map 0% reduce 0%
14/12/19 15:10:49 INFO mapreduce.Job:  map 100% reduce 0%
14/12/19 15:10:50 INFO mapreduce.Job: Job job_1418285943315_0027 completed successfully
14/12/19 15:10:50 INFO mapreduce.Job: Counters: 33
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=281799
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=3828
		HDFS: Number of bytes written=1075
		HDFS: Number of read operations=59
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=17
	Job Counters 
		Launched map tasks=3
		Other local map tasks=3
		Total time spent by all maps in occupied slots (ms)=110112
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=27528
		Total vcore-seconds taken by all map tasks=27528
		Total megabyte-seconds taken by all map tasks=28188672
	Map-Reduce Framework
		Map input records=9
		Map output records=0
		Input split bytes=354
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=201
		CPU time spent (ms)=4870
		Physical memory (bytes) snapshot=496263168
		Virtual memory (bytes) snapshot=2825789440
		Total committed heap usage (bytes)=297795584
	File Input Format Counters 
		Bytes Read=2399
	File Output Format Counters 
		Bytes Written=0
	org.apache.hadoop.tools.mapred.CopyMapper$Counter
		BYTESCOPIED=1075
		BYTESEXPECTED=1075
		COPY=9
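The copy step can be wrapped in a small script. This is a sketch under stated assumptions: the paths come from the distcp command above, while the flush step, the /tmp/mig_hbase staging directory, and the DRY_RUN switch are illustrative additions, not part of the original procedure. Flushing first is a precaution: distcp copies only what is already written to HFiles in the table directory, so edits still held in the memstore would not be copied.

```shell
#!/bin/sh
# Sketch of the table copy. ASSUMPTIONS: flush_table, STAGING and DRY_RUN
# are illustrative additions, not part of the original procedure.

TABLE='liuyang:mig_hbase'
SRC_DIR=/hbase/data/liuyang/mig_hbase
DEST_URI=hdfs://10.0.128.110/hbase/data/liuyang/mig_hbase
STAGING=/tmp/mig_hbase   # hypothetical local staging directory

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Write memstore contents to HFiles so the copy includes recent puts.
flush_table() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "hbase shell: flush '$TABLE'"
  else
    echo "flush '$TABLE'" | hbase shell
  fi
}

# Case 1: the clusters can reach each other (run on the source cluster).
copy_with_distcp() {
  flush_table && run hadoop distcp -overwrite "$SRC_DIR" "$DEST_URI"
}

# Case 2: no connectivity. Run on the source cluster, then move $STAGING
# to a destination host by any available means.
export_to_local() {
  flush_table && run hadoop fs -copyToLocal "$SRC_DIR" "$STAGING"
}

# Run on the destination cluster once $STAGING is present there.
# Assumes $SRC_DIR does not exist yet on the destination; otherwise
# copyFromLocal nests the directory inside it.
import_from_local() {
  run hadoop fs -copyFromLocal "$STAGING" "$SRC_DIR"
}
```

Run copy_with_distcp on the source cluster when the clusters can reach each other. Otherwise run export_to_local on the source, move /tmp/mig_hbase to a destination host, and run import_from_local there. Setting DRY_RUN=1 prints the commands instead of executing them.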
	

——————————————————————————————-

On the destination (old) HBase cluster, confirm that the table directory has arrived:

$hadoop fs -ls hdfs://pajkcluster/hbase/data/liuyang/
14/12/19 15:11:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
drwxr-xr-x   - hadoop supergroup          0 2014-12-19 15:10 hdfs://pajkcluster/hbase/data/liuyang/mig_hbase
drwxr-xr-x   - hadoop supergroup          0 2014-12-19 11:48 hdfs://pajkcluster/hbase/data/liuyang/test1
drwxr-xr-x   - hadoop supergroup          0 2014-12-19 14:30 hdfs://pajkcluster/hbase/data/liuyang/test2		


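Run hbase hbck to check the cluster. The copied table liuyang:mig_hbase shows 0 regions deployed, and hbck reports inconsistencies: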

Summary:
  member is okay.
    Number of regions: 1
    Deployed on:  hadoop-vm-master2,60020,1418971284917
  liuyang:test1 is okay.
    Number of regions: 1
    Deployed on:  hadoop-vm-master2,60020,1418971284917
  liuyang:test2 is okay.
    Number of regions: 1
    Deployed on:  hadoop-vm-master2,60020,1418971284917
  test_replication is okay.
    Number of regions: 1
    Deployed on:  hadoop-vm-master2,60020,1418971284917
  hbase:meta is okay.
    Number of regions: 1
    Deployed on:  hadoop-vm-master2,60020,1418971284917
  hbase:acl is okay.
    Number of regions: 1
    Deployed on:  hadoop-vm-master2,60020,1418971284917
  liuyang:mig_hbase is okay.
    Number of regions: 0
    Deployed on: 
  hbase:namespace is okay.
    Number of regions: 1
    Deployed on:  hadoop-vm-master2,60020,1418971284917
2 inconsistencies detected.



Run hbase hbck -repair so that the copied region is registered in hbase:meta and assigned to a region server:

2014-12-19 15:13:00,192 DEBUG [hbasefsck-pool1-t26] util.HBaseFsck: Loading region info from hdfs:hdfs://pajkcluster/hbase/data/default/member/ff703bdb6418ca85f5056d9948daf9f7
2014-12-19 15:13:00,195 DEBUG [hbasefsck-pool1-t27] util.HBaseFsck: Loading region info from hdfs:hdfs://pajkcluster/hbase/data/hbase/meta/1588230740
2014-12-19 15:13:00,197 DEBUG [hbasefsck-pool1-t17] util.HBaseFsck: Loading region info from hdfs:hdfs://pajkcluster/hbase/data/liuyang/mig_hbase/b9b8c6a494dc67e115937736734c4c50
2014-12-19 15:13:00,198 DEBUG [hbasefsck-pool1-t29] util.HBaseFsck: Loading region info from hdfs:hdfs://pajkcluster/hbase/data/hbase/acl/6b8e6e2eec2b33c9865a9db27ab8abc2
2014-12-19 15:13:00,201 DEBUG [hbasefsck-pool1-t28] util.HBaseFsck: Loading region info from hdfs:hdfs://pajkcluster/hbase/data/default/test_replication/bd129bd09186b37efa83c927b6b8dc84
2014-12-19 15:13:00,205 DEBUG [hbasefsck-pool1-t2] util.HBaseFsck: Loading region info from hdfs:hdfs://pajkcluster/hbase/data/liuyang/test2/10100782140c9c01de3086efbf16e6fd
2014-12-19 15:13:00,206 DEBUG [hbasefsck-pool1-t32] util.HBaseFsck: Loading region info from hdfs:hdfs://pajkcluster/hbase/data/liuyang/test1/9060543031f308ff2f1c362160d74098
2014-12-19 15:13:00,210 DEBUG [hbasefsck-pool1-t30] util.HBaseFsck: Loading region info from hdfs:hdfs://pajkcluster/hbase/data/hbase/namespace/d6708d93fb70b716e5ee13d323f25eaf
2014-12-19 15:13:00,264 DEBUG [hbasefsck-pool1-t10] util.HBaseFsck: HRegionInfo read: {ENCODED => 10100782140c9c01de3086efbf16e6fd, NAME => 'liuyang:test2,,1418961022564.10100782140c9c01de3086efbf16e6fd.', STARTKEY => '', ENDKEY => ''}
2014-12-19 15:13:00,266 DEBUG [hbasefsck-pool1-t7] util.HBaseFsck: HRegionInfo read: {ENCODED => d6708d93fb70b716e5ee13d323f25eaf, NAME => 'hbase:namespace,,1418893161615.d6708d93fb70b716e5ee13d323f25eaf.', STARTKEY => '', ENDKEY => ''}
2014-12-19 15:13:00,267 DEBUG [hbasefsck-pool1-t15] util.HBaseFsck: HRegionInfo read: {ENCODED => 9060543031f308ff2f1c362160d74098, NAME => 'liuyang:test1,,1418960887489.9060543031f308ff2f1c362160d74098.', STARTKEY => '', ENDKEY => ''}
2014-12-19 15:13:00,272 DEBUG [hbasefsck-pool1-t3] util.HBaseFsck: HRegionInfo read: {ENCODED => b9b8c6a494dc67e115937736734c4c50, NAME => 'liuyang:mig_hbase,,1418972581709.b9b8c6a494dc67e115937736734c4c50.', STARTKEY => '', ENDKEY => ''}
2014-12-19 15:13:00,273 DEBUG [hbasefsck-pool1-t5] util.HBaseFsck: HRegionInfo read: {ENCODED => 6b8e6e2eec2b33c9865a9db27ab8abc2, NAME => 'hbase:acl,,1418893164546.6b8e6e2eec2b33c9865a9db27ab8abc2.', STARTKEY => '', ENDKEY => ''}
2014-12-19 15:13:00,277 DEBUG [hbasefsck-pool1-t31] util.HBaseFsck: HRegionInfo read: {ENCODED => ff703bdb6418ca85f5056d9948daf9f7, NAME => 'member,,1418958193008.ff703bdb6418ca85f5056d9948daf9f7.', STARTKEY => '', ENDKEY => ''}
2014-12-19 15:13:00,279 DEBUG [hbasefsck-pool1-t33] util.HBaseFsck: HRegionInfo read: {ENCODED => 1588230740, NAME => 'hbase:meta,,1', STARTKEY => '', ENDKEY => ''}
2014-12-19 15:13:00,280 DEBUG [hbasefsck-pool1-t6] util.HBaseFsck: HRegionInfo read: {ENCODED => bd129bd09186b37efa83c927b6b8dc84, NAME => 'test_replication,,1418959507510.bd129bd09186b37efa83c927b6b8dc84.', STARTKEY => '', ENDKEY => ''}

.......

Summary:
  member is okay.
    Number of regions: 1
    Deployed on:  hadoop-vm-master2,60020,1418971284917
  liuyang:test1 is okay.
    Number of regions: 1
    Deployed on:  hadoop-vm-master2,60020,1418971284917
  liuyang:test2 is okay.
    Number of regions: 1
    Deployed on:  hadoop-vm-master2,60020,1418971284917
  test_replication is okay.
    Number of regions: 1
    Deployed on:  hadoop-vm-master2,60020,1418971284917
  hbase:meta is okay.
    Number of regions: 1
    Deployed on:  hadoop-vm-master2,60020,1418971284917
  hbase:acl is okay.
    Number of regions: 1
    Deployed on:  hadoop-vm-master2,60020,1418971284917
  liuyang:mig_hbase is okay.
    Number of regions: 1
    Deployed on:  hadoop-vm-master2,60020,1418971284917
  hbase:namespace is okay.
    Number of regions: 1
    Deployed on:  hadoop-vm-master2,60020,1418971284917
0 inconsistencies detected.
Status: OK
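The check, repair, and re-check cycle above can be scripted. This is a minimal sketch that assumes the hbck summary wording shown in this post ("N inconsistencies detected." and "Status: OK"); check_and_repair and count_inconsistencies are hypothetical helper names. Like the post, it uses hbck -repair, but only when the first check reports problems. Note that -repair bundles several fix options; hbck also offers narrower ones such as -fixMeta and -fixAssignments, which may be enough for a freshly copied table, but check hbase hbck -h on your version before relying on that.

```shell
#!/bin/sh
# Sketch: run hbck, repair only when needed, then re-check.
# The parsing relies on the summary wording shown above, which may
# differ in other HBase versions.

# Read hbck output on stdin; print the last reported inconsistency count.
count_inconsistencies() {
  sed -n 's/^\([0-9][0-9]*\) inconsistencies detected\.[[:space:]]*$/\1/p' | tail -n 1
}

check_and_repair() {
  n=$(hbase hbck 2>&1 | count_inconsistencies)
  if [ -z "$n" ]; then
    echo "could not find the hbck summary line" >&2
    return 2
  fi
  if [ "$n" -eq 0 ]; then
    echo "hbck: no inconsistencies"
    return 0
  fi
  echo "hbck reported $n inconsistencies; running hbck -repair"
  hbase hbck -repair
  # Succeed only if a fresh check reports "Status: OK".
  hbase hbck 2>&1 | grep -q '^Status: OK'
}
```

Run check_and_repair on the destination cluster after the copy; a non-zero exit status means the table still needs manual attention.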

Run the HBase shell to check the table's data on the destination cluster:

$hbase shell
2014-12-19 15:28:04,391 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.96.2-hadoop2, r1581096, Mon Mar 24 16:03:18 PDT 2014

hbase(main):001:0> scan 'liuyang:mig_hbase'
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/hbase-0.96.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/hadoop-2.3.0-cdh5.0.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2014-12-19 15:28:14,883 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ROW                                              COLUMN+CELL                                                                                                                                   
 test3                                           column=address:city, timestamp=1418972636248, value=hangzhou                                                                                  
 test3                                           column=address:contry, timestamp=1418972634935, value=china                                                                                   
 test3                                           column=address:province, timestamp=1418972635046, value=liuyang                                                                               
 test3                                           column=info:age, timestamp=1418972634471, value=24                                                                                            
 test3                                           column=info:birthday, timestamp=1418972634626, value=1987-06-17                                                                               
 test3                                           column=info:company, timestamp=1418972634764, value=alibaba                                                                                   
1 row(s) in 0.1020 seconds
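A scan is fine for a one-row test table; for a real migration it is easier to compare row counts on both clusters. This is a minimal sketch that assumes one HBase client configuration directory per cluster, passed to hbase --config. SRC_CONF, DEST_CONF, and their default paths are hypothetical, and the parser assumes the "N row(s) in ..." line that the shell prints, as in the scan output above.

```shell
#!/bin/sh
# Sketch: compare row counts of the migrated table on both clusters.
# ASSUMPTION: SRC_CONF and DEST_CONF are hypothetical client config dirs.

TABLE='liuyang:mig_hbase'
SRC_CONF=${SRC_CONF:-/etc/hbase/conf.src}     # hypothetical path
DEST_CONF=${DEST_CONF:-/etc/hbase/conf.dest}  # hypothetical path

# Read HBase shell output on stdin; print N from the "N row(s) in ..." line.
parse_row_count() {
  sed -n 's/^\([0-9][0-9]*\) row(s) in .*/\1/p' | tail -n 1
}

# $1 = HBase configuration directory of the cluster to query.
# stderr (SLF4J and native-library warnings) is discarded.
count_rows() {
  echo "count '$TABLE'" | hbase --config "$1" shell 2>/dev/null | parse_row_count
}

# Succeeds only when both counts were parsed and are equal.
compare_counts() {
  src=$(count_rows "$SRC_CONF")
  dest=$(count_rows "$DEST_CONF")
  echo "source=$src destination=$dest"
  [ -n "$src" ] && [ "$src" = "$dest" ]
}
```

The shell's count scans from a single client, so for large tables the MapReduce-based org.apache.hadoop.hbase.mapreduce.RowCounter is usually a better fit.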

HBase Performance Test with YCSB

September 3, 2014 bigdata, hbase

Read this PDF: Hbase_performance