Why is HBase Java client slow compared to REST / Thrift

Question

I am running some performance tests on the HBase Java client, Thrift, and REST interfaces. I have a table called "Airline" with 500 thousand rows. I extract all 500K rows from the table with four different Java programs (using the Java client, Thrift, Thrift2, and REST).

Below are the performance numbers for different sample sizes. In this case, the batch size is 100000.

[Table which shows the performance numbers. All times are in ms][1]

I can see that performance improves as the sample size increases for REST, Thrift, and Thrift2.

But with the Java API, performance stays the same regardless of the sample size. Why doesn't the sample size affect the Java client?

Here is a snippet of my Java program:

Table table = conn.getTable(TableName.valueOf("Airline"));
Scan scan = new Scan();
ResultScanner scanner = table.getScanner(scan);

// Pull up to fetchSize rows per call until the scanner is exhausted.
for (Result[] results = scanner.next(fetchSize); results.length != 0; results = scanner.next(fetchSize)) {
    // process the rows
}

rest hbase thrift

Vinod Kumar 14 . '17 13:44

WattsInABox · Answer 1 · 2017-04-14T15:28:27+0000

How a ResultScanner fetches rows is determined by the Scan it was created from.

Settings on the Scan worth looking at:

scan.setCaching
scan.setCacheBlocks

https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html
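To illustrate why the Scan settings can matter more than the fetchSize passed to scanner.next(fetchSize), here is a toy model in plain Java. It is not HBase code: ModelScanner, countRoundTrips, and the numbers are hypothetical, and the model assumes that next(n) is served from a client-side buffer which is refilled `caching` rows per round trip (the role scan.setCaching plays in the real client). A real HBase client is also bounded by a maximum result size and its defaults vary by version, so treat this as a sketch of the idea only.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a buffered scanner. Hypothetical names; not HBase code. */
public class ScanCachingModel {

    /** Stand-in for a scanner whose client-side buffer is refilled `caching` rows per round trip. */
    static final class ModelScanner {
        private final int totalRows;
        private final int caching;
        private int fetchedFromServer = 0; // rows pulled from the modeled server so far
        private int buffered = 0;          // rows currently held in the client-side buffer
        private int nextRowId = 0;         // id of the next row handed to the caller
        int roundTrips = 0;                // modeled round trips to the server

        ModelScanner(int totalRows, int caching) {
            if (caching <= 0) {
                throw new IllegalArgumentException("caching must be positive");
            }
            this.totalRows = totalRows;
            this.caching = caching;
        }

        /** Returns the next row id, or null once all rows have been returned. */
        Integer next() {
            if (buffered == 0) {
                if (fetchedFromServer == totalRows) {
                    return null;
                }
                // Refill the buffer: one round trip brings back up to `caching` rows.
                int chunk = Math.min(caching, totalRows - fetchedFromServer);
                fetchedFromServer += chunk;
                buffered = chunk;
                roundTrips++;
            }
            buffered--;
            return nextRowId++;
        }

        /** Returns up to n rows by calling next() repeatedly, so n never changes the refill size. */
        List<Integer> next(int n) {
            List<Integer> rows = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                Integer row = next();
                if (row == null) {
                    break;
                }
                rows.add(row);
            }
            return rows;
        }
    }

    /** Drains the model scanner in batches of fetchSize and returns the round trips used. */
    static int countRoundTrips(int totalRows, int caching, int fetchSize) {
        ModelScanner scanner = new ModelScanner(totalRows, caching);
        for (List<Integer> batch = scanner.next(fetchSize); !batch.isEmpty(); batch = scanner.next(fetchSize)) {
            // rows would be processed here
        }
        return scanner.roundTrips;
    }

    public static void main(String[] args) {
        int rows = 500_000;
        // Varying fetchSize with a fixed caching value leaves the round-trip count unchanged.
        for (int fetchSize : new int[] {10, 1_000, 100_000}) {
            System.out.println("caching=100 fetchSize=" + fetchSize
                    + " roundTrips=" + countRoundTrips(rows, 100, fetchSize));
        }
        // Raising the caching value is what reduces the round trips in this model.
        System.out.println("caching=10000 fetchSize=10 roundTrips="
                + countRoundTrips(rows, 10_000, 10));
    }
}
```

In this model, changing fetchSize only changes how the already-buffered rows are handed to the caller, while changing the caching value changes the number of round trips. In the real client the corresponding knobs would be scan.setCaching(int) and scan.setCacheBlocks(boolean) on the Scan before calling table.getScanner(scan); which values suit a 500K-row scan is something to measure rather than assume.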

...

For an example, see Pig's HBaseStorage#initScan.
