Why is the HBase Java client slow compared to REST / Thrift?

I am running performance tests against the HBase Java client, Thrift, and REST interfaces. I have a table called "Airline" with 500 thousand rows. I extract all 500K rows from the table with 4 different Java programs (using the Java client, Thrift, Thrift2, and REST).

Below are the performance numbers for different sample sizes. In this case, the batch size is 100000.


[Image: table of performance numbers. All times are in ms.]


With REST, Thrift, and Thrift2, I can see that performance improves as the sample size increases.

With the Java API, however, performance stays the same regardless of the sample size. Why doesn't the sample size affect the Java client?

Here is a snippet of my Java program:

Table table = conn.getTable(TableName.valueOf("Airline"));
Scan scan = new Scan();
ResultScanner scanner = table.getScanner(scan);

// Request up to fetchSize rows per call until the scanner returns an empty array.
for (Result[] results = scanner.next(fetchSize); results.length != 0; results = scanner.next(fetchSize)) {
    // process strings
}




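Whatever row count you pass to the ResultScanner, the rows are fetched from the server according to the settings on the Scan.

Try tuning the following: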

scan.setCaching
scan.setCacheBlocks

https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html
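
To make the effect of the Scan settings more concrete, here is a toy model in plain Java. It is not the HBase implementation and talks to no server: SimulatedScanner, countRoundTrips and the numbers used are all made up for illustration. It assumes, as I understand the client, that next(n) just collects rows one at a time from a local cache and that the cache is refilled with "caching" rows per round trip. Under that assumption the number of round trips depends on the caching value, not on the fetch size passed to next(n).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of client-side scan caching; hypothetical, not HBase code. */
class ScanCachingSketch {

    /** Simulated scanner that refills a local cache with `caching` rows per round trip. */
    static class SimulatedScanner {
        private final int totalRows;
        private final int caching;
        private final List<Integer> cache = new ArrayList<>();
        private int cachePos = 0;
        private int nextRowOnServer = 0;
        int roundTrips = 0;

        SimulatedScanner(int totalRows, int caching) {
            this.totalRows = totalRows;
            this.caching = caching;
        }

        /** Returns the next row id, or null when the table is exhausted. */
        Integer next() {
            if (cachePos == cache.size()) {
                if (nextRowOnServer >= totalRows) {
                    return null;
                }
                // One simulated round trip: load up to `caching` rows into the cache.
                cache.clear();
                cachePos = 0;
                int end = Math.min(totalRows, nextRowOnServer + caching);
                while (nextRowOnServer < end) {
                    cache.add(nextRowOnServer++);
                }
                roundTrips++;
            }
            return cache.get(cachePos++);
        }

        /** Returns up to n rows, drawn one by one from the local cache. */
        List<Integer> next(int n) {
            List<Integer> out = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                Integer row = next();
                if (row == null) {
                    break;
                }
                out.add(row);
            }
            return out;
        }
    }

    /** Scans all rows with the given fetch size and reports the simulated round trips. */
    static int countRoundTrips(int totalRows, int caching, int fetchSize) {
        SimulatedScanner scanner = new SimulatedScanner(totalRows, caching);
        int seen = 0;
        for (List<Integer> batch = scanner.next(fetchSize); !batch.isEmpty(); batch = scanner.next(fetchSize)) {
            seen += batch.size();
        }
        if (seen != totalRows) {
            throw new IllegalStateException("expected " + totalRows + " rows, saw " + seen);
        }
        return scanner.roundTrips;
    }

    public static void main(String[] args) {
        // Same caching value, different fetch sizes: the round-trip count does not change.
        for (int fetchSize : new int[] {10, 1000, 100000}) {
            System.out.println("fetchSize=" + fetchSize
                    + " roundTrips=" + countRoundTrips(500_000, 100, fetchSize));
        }
    }
}
```

In this model only the caching value changes the number of round trips. In a real client the corresponding knob would be scan.setCaching(int), with scan.setCacheBlocks(boolean) controlling whether the scan populates the server-side block cache; suitable values depend on row size and available memory, so they are best found by measuring.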

...

Pig HBaseStorage#initScan


Source: https://habr.com/ru/post/1674833/

