How to Use YCSB (Yahoo! Cloud Serving Benchmark) with HBase
Overview
Sometimes you want to push data into HBase to measure the maximum performance of an HBase cluster, or you simply need to fill HBase with dummy data. YCSB is a tool that covers both cases, and this post introduces it.
Installation
Installing Maven
Building YCSB from source requires Maven 3 or later. Install it with the commands below, then check the version with mvn -version. I used Maven 3.6.3.
sudo apt install maven
mvn -version
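To make the "Maven 3 or later" requirement checkable in a script, here is a minimal sketch. The helper names maven_major and maven_is_3_plus are hypothetical, and the sketch assumes the first line of the mvn -version output starts with "Apache Maven", followed by the version number.

```shell
# Print the major version from `mvn -version` output read on stdin.
# Assumes the first line looks like "Apache Maven 3.6.3".
maven_major() {
  sed -n 's/^Apache Maven \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Succeed when the major version read on stdin is 3 or higher.
maven_is_3_plus() {
  major="$(maven_major)"
  [ -n "$major" ] && [ "$major" -ge 3 ]
}

# Usage (requires Maven on the PATH):
#   mvn -version | maven_is_3_plus && echo "Maven is new enough"
```

Reading from stdin keeps the check independent of where Maven is installed.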
git clone
Clone the project from the YCSB GitHub repository.
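The clone command itself is not shown in this post. A minimal sketch, assuming the upstream repository is https://github.com/brianfrankcooper/YCSB; the URL and the clone_ycsb helper name are assumptions and do not come from the post.

```shell
# Assumed upstream repository URL (not stated in the post).
YCSB_REPO_URL="https://github.com/brianfrankcooper/YCSB.git"

# Clone YCSB into the given directory (default: YCSB).
# Skips the clone when a checkout already exists, so re-running is harmless.
clone_ycsb() {
  dest="${1:-YCSB}"
  if [ -d "$dest/.git" ]; then
    echo "already cloned: $dest"
  else
    git clone "$YCSB_REPO_URL" "$dest"
  fi
}

# Usage:
#   clone_ycsb        # creates ./YCSB
```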
Building with Maven
cd YCSB
mvn clean package
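A full mvn clean package builds every binding. The sketch below builds only one binding and the modules it depends on. The -pl site.ycsb:hbase2-binding -am ... -DskipTests flags are taken from the [DEBUG] line that bin/ycsb prints later in this post. The build_binding helper and its DRY_RUN switch are hypothetical.

```shell
# Build one YCSB binding plus the modules it depends on (-am), skipping tests.
# Set DRY_RUN=1 to print the Maven command instead of running it.
build_binding() {
  binding="$1"
  set -- mvn -pl "site.ycsb:${binding}-binding" -am clean package -DskipTests
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage, from the YCSB checkout:
#   build_binding hbase2
```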
Installing Python 2.7
The bin/ycsb launcher is a Python script written for Python 2.x, so create a virtualenv that runs Python 2.7 and activate it.
pip install virtualenv
virtualenv py27 --python=python2.7
source py27/bin/activate
Creating the table in HBase
Following the YCSB HBase2 binding documentation, create a table named usertable in HBase in advance.
hbase:001:0> n_splits = 50
=> 50
hbase:002:0> create 'usertable', 'family', {SPLITS => (1..n_splits).map {|i| "user#{1000+i*(9999-1000)/n_splits}"}}
2022-11-19 12:29:26,454 INFO [RPCClient-NioEventLoopGroup-1-1] Configuration.deprecation (Configuration.java:logDeprecation(1394)) - hbase.client.pause.cqtbe is deprecated. Instead, use hbase.client.pause.server.overloaded
Created table usertable
2022-11-19 12:29:30,765 INFO [RPCClient-NioEventLoopGroup-1-2] client.AsyncHBaseAdmin (RawAsyncHBaseAdmin.java:onFinished(2569)) - Operation: CREATE, Table Name: default:usertable completed
Took 4.7173 seconds
=> Hbase::Table - usertable
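The SPLITS expression in the create command is easier to follow once the keys are printed out. This sketch reproduces the same integer arithmetic in shell; the split_keys helper name is hypothetical. With n_splits = 50 it yields 50 split points, so the table starts with 51 regions.

```shell
# Print the split keys that the HBase shell expression generates.
split_keys() {
  n_splits="$1"
  i=1
  while [ "$i" -le "$n_splits" ]; do
    # Same integer arithmetic as the Ruby expression:
    #   1000 + i*(9999-1000)/n_splits
    echo "user$((1000 + i * (9999 - 1000) / n_splits))"
    i=$((i + 1))
  done
}

# Usage:
#   split_keys 50    # first key is user1179, last key is user9999
```

The keys are spread evenly between user1000 and user9999, which matches the "user" prefix that YCSB puts in front of its generated row keys.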
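YCSB folder
YCSB needs an hbase-site.xml to run the benchmark, so create a new folder for it. Then log in to one of the servers in the HBase cluster, take the file at ${HBASE_HOME}/conf/hbase-site.xml, and paste its contents into a file in this folder. Copying the file with scp works just as well.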
/YCSB$ mkdir youngju-hbase
/YCSB$ cd youngju-hbase/
/YCSB/youngju-hbase$ vim hbase-site.xml
root@ubuntu01:~# cat /usr/local/hbase/conf/hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ubuntu01:9000/hbase</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>ubuntu01,ubuntu02,ubuntu03</value>
  </property>
  <property>
    <name>hbase.wal.provider</name>
    <value>filesystem</value>
  </property>
</configuration>
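Before running the benchmark, it is worth confirming that the copied file carries the ZooKeeper quorum, since the client uses that value to locate the cluster. A minimal sketch; the hbase_site_value helper name is hypothetical, and it assumes each <value> sits on the line right after its <name>, as in the file above.

```shell
# Print the <value> that follows a given <name> in an hbase-site.xml file.
# Assumes <name> and <value> are on consecutive lines.
hbase_site_value() {
  file="$1"
  name="$2"
  grep -F -A1 "<name>${name}</name>" "$file" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Usage, after copying the file (the scp source is the host and path shown above):
#   scp root@ubuntu01:/usr/local/hbase/conf/hbase-site.xml youngju-hbase/
#   hbase_site_value youngju-hbase/hbase-site.xml hbase.zookeeper.quorum
```

This is a line-based check rather than a real XML parser, so it will miss values that are wrapped across lines.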
Running YCSB
From the YCSB home directory, run bin/ycsb load hbase2 -P workloads/workloada -cp youngju-hbase/ -p table=usertable -p columnfamily=family. As shown below, it loads 1,000 records. The Python 2.7 virtualenv must be activated for the Python script to run without problems.
(py27) YCSB$ bin/ycsb load hbase2 -P workloads/workloada -cp youngju-hbase/ -p table=usertable -p columnfamily=family
[WARN] Running against a source checkout. In order to get our runtime dependencies we'll have to invoke Maven. Depending on the state of your system, this may take ~30-45 seconds
[DEBUG] Running 'mvn -pl site.ycsb:hbase2-binding -am package -DskipTests dependency:build-classpath -DincludeScope=compile -Dmdep.outputFilterFile=true'
java -cp youngju-hbase/:/home/youngju/work/YCSB/hbase2/conf:/home/youngju/work/YCSB/hbase2/target/hbase2-binding-0.18.0-SNAPSHOT.jar:/home/youngju/.m2/repository/org/apache/htrace/htrace-core4/4.1.0-incubating/htrace-core4-4.1.0-incubating.jar:/home/youngju/.m2/repository/org/slf4j/slf4j-api/1.7.25/slf4j-api-1.7.25.jar:/home/youngju/.m2/repository/commons-logging/commons-logging/1.2/commons-logging-1.2.jar:/home/youngju/.m2/repository/org/apache/yetus/audience-annotations/0.5.0/audience-annotations-0.5.0.jar:/home/youngju/.m2/repository/com/github/stephenc/findbugs/findbugs-annotations/1.3.9-1/findbugs-annotations-1.3.9-1.jar:/home/youngju/.m2/repository/org/hdrhistogram/HdrHistogram/2.1.4/HdrHistogram-2.1.4.jar:/home/youngju/.m2/repository/log4j/log4j/1.2.17/log4j-1.2.17.jar:/home/youngju/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.9.4/jackson-mapper-asl-1.9.4.jar:/home/youngju/.m2/repository/org/apache/hbase/hbase-shaded-client/2.2.3/hbase-shaded-client-2.2.3.jar:/home/youngju/.m2/repository/org/slf4j/slf4j-log4j12/1.7.25/slf4j-log4j12-1.7.25.jar:/home/youngju/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.9.4/jackson-core-asl-1.9.4.jar:/home/youngju/work/YCSB/core/target/core-0.18.0-SNAPSHOT.jar site.ycsb.Client -db site.ycsb.db.hbase2.HBaseClient2 -P workloads/workloada -p table=usertable -p columnfamily=family -load
Command line: -db site.ycsb.db.hbase2.HBaseClient2 -P workloads/workloada -p table=usertable -p columnfamily=family -load
YCSB Client 0.18.0-SNAPSHOT
Loading workload...
log4j:WARN No appenders could be found for logger (org.apache.htrace.core.Tracer).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Starting test.
DBWrapper: report latency for each error is false and specific error codes to track for latency are: []
[OVERALL], RunTime(ms), 11783
[OVERALL], Throughput(ops/sec), 84.86803021301876
[TOTAL_GCS_PS_Scavenge], Count, 2
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 10
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.08486803021301877
[TOTAL_GCS_PS_MarkSweep], Count, 1
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 12
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.10184163625562251
[TOTAL_GCs], Count, 3
[TOTAL_GC_TIME], Time(ms), 22
[TOTAL_GC_TIME_%], Time(%), 0.18670966646864126
[CLEANUP], Operations, 2
[CLEANUP], AverageLatency(us), 4844.5
[CLEANUP], MinLatency(us), 13
[CLEANUP], MaxLatency(us), 9679
[CLEANUP], 95thPercentileLatency(us), 9679
[CLEANUP], 99thPercentileLatency(us), 9679
[INSERT], Operations, 1000
[INSERT], AverageLatency(us), 9895.29
[INSERT], MinLatency(us), 4144
[INSERT], MaxLatency(us), 1103871
[INSERT], 95thPercentileLatency(us), 14295
[INSERT], 99thPercentileLatency(us), 23871
[INSERT], Return=OK, 1000
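The report is a list of "[SECTION], Metric, Value" lines, so single numbers are easy to pull out for comparison between runs. The sketch below does that and also recomputes throughput as operations divided by runtime: 1000 operations in 11783 ms gives about 84.87 ops/sec, which agrees with the [OVERALL] line above. The helper names ycsb_metric and ops_per_sec and the file name load.log are hypothetical.

```shell
# Print the value of one metric from YCSB output read on stdin.
# Lines look like: [INSERT], AverageLatency(us), 9895.29
ycsb_metric() {
  section="$1"
  metric="$2"
  awk -F', ' -v s="[$section]" -v m="$metric" '$1 == s && $2 == m { print $3 }'
}

# Recompute throughput (ops/sec) from an operation count and a runtime in ms.
ops_per_sec() {
  awk -v ops="$1" -v ms="$2" 'BEGIN { printf "%.2f\n", ops / (ms / 1000) }'
}

# Usage, assuming the load output was saved to load.log:
#   ycsb_metric INSERT '99thPercentileLatency(us)' < load.log
#   ops_per_sec 1000 11783
```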
Next, change only load to run and execute bin/ycsb run hbase2 -P workloads/workloada -cp youngju-hbase/ -p table=usertable -p columnfamily=family. As shown below, it performs the workload's READ and UPDATE operations.
(py27) YCSB$ bin/ycsb run hbase2 -P workloads/workloada -cp youngju-hbase/ -p table=usertable -p columnfamily=family
[WARN] Running against a source checkout. In order to get our runtime dependencies we'll have to invoke Maven. Depending on the state of your system, this may take ~30-45 seconds
[DEBUG] Running 'mvn -pl site.ycsb:hbase2-binding -am package -DskipTests dependency:build-classpath -DincludeScope=compile -Dmdep.outputFilterFile=true'
java -cp youngju-hbase/:/home/youngju/work/YCSB/hbase2/conf:/home/youngju/work/YCSB/hbase2/target/hbase2-binding-0.18.0-SNAPSHOT.jar:/home/youngju/.m2/repository/org/apache/htrace/htrace-core4/4.1.0-incubating/htrace-core4-4.1.0-incubating.jar:/home/youngju/.m2/repository/org/slf4j/slf4j-api/1.7.25/slf4j-api-1.7.25.jar:/home/youngju/.m2/repository/commons-logging/commons-logging/1.2/commons-logging-1.2.jar:/home/youngju/.m2/repository/org/apache/yetus/audience-annotations/0.5.0/audience-annotations-0.5.0.jar:/home/youngju/.m2/repository/com/github/stephenc/findbugs/findbugs-annotations/1.3.9-1/findbugs-annotations-1.3.9-1.jar:/home/youngju/.m2/repository/org/hdrhistogram/HdrHistogram/2.1.4/HdrHistogram-2.1.4.jar:/home/youngju/.m2/repository/log4j/log4j/1.2.17/log4j-1.2.17.jar:/home/youngju/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.9.4/jackson-mapper-asl-1.9.4.jar:/home/youngju/.m2/repository/org/apache/hbase/hbase-shaded-client/2.2.3/hbase-shaded-client-2.2.3.jar:/home/youngju/.m2/repository/org/slf4j/slf4j-log4j12/1.7.25/slf4j-log4j12-1.7.25.jar:/home/youngju/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.9.4/jackson-core-asl-1.9.4.jar:/home/youngju/work/YCSB/core/target/core-0.18.0-SNAPSHOT.jar site.ycsb.Client -db site.ycsb.db.hbase2.HBaseClient2 -P workloads/workloada -p table=usertable -p columnfamily=family -t
Command line: -db site.ycsb.db.hbase2.HBaseClient2 -P workloads/workloada -p table=usertable -p columnfamily=family -t
YCSB Client 0.18.0-SNAPSHOT
Loading workload...
log4j:WARN No appenders could be found for logger (org.apache.htrace.core.Tracer).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Starting test.
DBWrapper: report latency for each error is false and specific error codes to track for latency are: []
[OVERALL], RunTime(ms), 10020
[OVERALL], Throughput(ops/sec), 99.8003992015968
[TOTAL_GCS_PS_Scavenge], Count, 2
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 14
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.13972055888223553
[TOTAL_GCS_PS_MarkSweep], Count, 1
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 17
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.16966067864271456
[TOTAL_GCs], Count, 3
[TOTAL_GC_TIME], Time(ms), 31
[TOTAL_GC_TIME_%], Time(%), 0.3093812375249501
[READ], Operations, 485
[READ], AverageLatency(us), 8099.581443298969
[READ], MinLatency(us), 2850
[READ], MaxLatency(us), 197247
[READ], 95thPercentileLatency(us), 25519
[READ], 99thPercentileLatency(us), 49439
[READ], Return=OK, 485
[CLEANUP], Operations, 2
[CLEANUP], AverageLatency(us), 2581.5
[CLEANUP], MinLatency(us), 9
[CLEANUP], MaxLatency(us), 5155
[CLEANUP], 95thPercentileLatency(us), 5155
[CLEANUP], 99thPercentileLatency(us), 5155
[UPDATE], Operations, 515
[UPDATE], AverageLatency(us), 10536.794174757282
[UPDATE], MinLatency(us), 4048
[UPDATE], MaxLatency(us), 226815
[UPDATE], 95thPercentileLatency(us), 35359
[UPDATE], 99thPercentileLatency(us), 66687
[UPDATE], Return=OK, 515
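workloada is a 50% read, 50% update mix, and the counts above (485 READ, 515 UPDATE) land close to that. The sketch below computes the observed mix from the output. The op_mix helper name is hypothetical, and it only counts the READ, UPDATE, INSERT and SCAN sections.

```shell
# Print each operation's share of the total, from YCSB output read on stdin.
# Only READ, UPDATE, INSERT and SCAN "Operations" lines are counted;
# CLEANUP and other sections are skipped.
op_mix() {
  awk -F', ' '
    $2 == "Operations" && $1 ~ /^\[(READ|UPDATE|INSERT|SCAN)\]$/ {
      count[$1] = $3; total += $3; order[++n] = $1
    }
    END {
      for (i = 1; i <= n; i++)
        printf "%s %.1f%%\n", order[i], 100 * count[order[i]] / total
    }'
}

# Usage, assuming the run output was saved to run.log (hypothetical file name):
#   op_mix < run.log
```

For the run above this prints 48.5% for READ and 51.5% for UPDATE.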
Summary
YCSB is a useful tool for benchmarking HBase 1.x, 2.x, and 3.x. I have not tried them, but it can also measure the performance of many other databases such as Redis, DynamoDB, Elasticsearch, Cassandra, and MongoDB. In production nobody knows what data will arrive or in what pattern, so you should not rely on YCSB results alone. HBase in particular is prone to hotspots: when requests pile up on a specific region server, overall QPS drops. For details on HBase architecture and row-key design, see 조대협's post on the architecture of HBase and Google's Bigtable.