I am doing some performance tests on the HBase Java client / Thrift / REST interfaces. I have a table called "Airline" with 500,000 rows, and I am extracting all 500K rows from it through 4 different Java programs (using the Java client, Thrift, Thrift2, and REST).
Below are the performance numbers with different sample sizes; in this case, the batch size is 100,000.
[Table of performance numbers (all times in ms)][1]
I can see that there is a performance improvement as the sample size increases in the case of REST, Thrift, and Thrift2.
But with the Java API, I see consistent performance regardless of the sample size. Why doesn't the sample size affect the Java client?
Here is a snippet of my Java program:
Table table = conn.getTable(TableName.valueOf("Airline"));
Scan scan = new Scan();
ResultScanner scanner = table.getScanner(scan);
// Pull rows in batches of fetchSize until the scanner is exhausted
for (Result[] result = scanner.next(fetchSize); result.length != 0; result = scanner.next(fetchSize)) {
    // process the rows in this batch
}
scanner.close();
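For reference, here is a minimal sketch of how scanner caching could be set explicitly on the Java client. This is only a sketch on my part: setCaching and setCacheBlocks are standard methods of org.apache.hadoop.hbase.client.Scan, the value 10000 is purely illustrative, and conn / fetchSize are assumed to be the same variables as in the snippet above.

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

// Same full-table scan, but with an explicit caching hint.
// setCaching controls how many rows each RPC to the region server returns;
// 10000 is an illustrative value, not a recommendation.
Scan scan = new Scan();
scan.setCaching(10000);      // rows fetched per RPC round trip
scan.setCacheBlocks(false);  // commonly disabled for full-table scans to avoid churning the block cache

// conn and fetchSize are assumed from the snippet above
try (Table table = conn.getTable(TableName.valueOf("Airline"));
     ResultScanner scanner = table.getScanner(scan)) {
    for (Result[] result = scanner.next(fetchSize); result.length != 0; result = scanner.next(fetchSize)) {
        // process the rows in this batch
    }
}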