A filter to support scanning multiple row key ranges. It can construct the row key ranges from a user-specified list and fast-forward between them during the scan.
HBase is quite efficient at scanning a single row key range. If the user needs to specify several row key ranges in one scan, the typical solutions are:
- through a FilterList, which is a list of row key filters,
- using an SQL layer over HBase to join two tables, such as Hive or Phoenix.
However, both solutions are inefficient.
Neither of them can use the range information to fast-forward during the scan, which makes scanning quite time-consuming. If the number of ranges is very large (for example, millions), a join is the proper solution, even though it is slow. However, there are cases where the user wants to specify only a small number of ranges to scan (for example, fewer than 1000 ranges). In that case, neither solution provides satisfactory performance.
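To make the inefficiency concrete, the FilterList approach boils down to testing every row the scanner reads against each range in turn; nothing lets the scanner skip the gaps between ranges. A minimal sketch of that per-row check (plain Java, no HBase dependency; the class and method names here are illustrative, not HBase API):

```java
import java.util.Arrays;
import java.util.List;

public class PerRowFilterDemo {
    // Naive FilterList-style inclusion test: every scanned row key is
    // compared against every range; the scanner still reads all rows,
    // including those in the gaps between ranges.
    static boolean included(List<String[]> ranges, String rowKey) {
        for (String[] r : ranges) { // r[0] = start, r[1] = stop (half-open range)
            if (rowKey.compareTo(r[0]) >= 0 && rowKey.compareTo(r[1]) < 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String[]> ranges = Arrays.asList(
                new String[] { "001", "002" }, new String[] { "003", "004" });
        System.out.println(included(ranges, "0015")); // true: inside [001, 002)
        System.out.println(included(ranges, "0025")); // false: in the gap between ranges
    }
}
```

Every row key still costs a comparison against each range, and the region server still reads every row in the gaps, which is exactly the wasted work fast-forwarding avoids.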
MultiRowRangeFilter is meant to support exactly this use case (scanning multiple row key ranges): it constructs the row key ranges from the user-specified list and fast-forwards between them during the scan. Thus, the scan will be quite efficient.
```java
package chengchen;

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.MultiRowRangeFilter;
import org.apache.hadoop.hbase.filter.MultiRowRangeFilter.RowKeyRange;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiRowRangeFilterTest {
    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            throw new Exception("Table name not specified.");
        }
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, args[0]);

        long start = System.currentTimeMillis(); // the original used a custom TimeCounter helper here

        Scan scan = new Scan();
        List<RowKeyRange> ranges = new ArrayList<RowKeyRange>();
        ranges.add(new RowKeyRange(Bytes.toBytes("001"), Bytes.toBytes("002")));
        ranges.add(new RowKeyRange(Bytes.toBytes("003"), Bytes.toBytes("004")));
        ranges.add(new RowKeyRange(Bytes.toBytes("005"), Bytes.toBytes("006")));
        Filter filter = new MultiRowRangeFilter(ranges);
        scan.setFilter(filter);

        int count = 0;
        ResultScanner scanner = table.getScanner(scan);
        for (Result r = scanner.next(); r != null; r = scanner.next()) {
            count++;
        }
        System.out.println("++ Scanning finished with count : " + count
                + " in " + (System.currentTimeMillis() - start) + " ms ++");
        scanner.close();
        table.close();
    }
}
```
Please refer to the test case above for how to use it in Java. (Note that in released HBase versions the range class is `MultiRowRangeFilter.RowRange` rather than the `RowKeyRange` name used in the original patch.)
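To see intuitively what "fast-forwarding" buys, here is a minimal sketch (plain Java, no HBase dependency; class and method names are illustrative, not HBase API) of how a sorted list of ranges yields a seek hint: instead of filtering row by row, the scanner can jump directly to the start of the next range.

```java
import java.util.Arrays;
import java.util.List;

public class RangeFastForwardDemo {
    // A half-open row key range [start, stop), mirroring the semantics of
    // the ranges passed to MultiRowRangeFilter.
    static final class Range {
        final String start, stop;
        Range(String start, String stop) { this.start = start; this.stop = stop; }
    }

    // Given sorted, non-overlapping ranges, return null if the row key falls
    // inside some range (include the row), or the start key of the next range
    // (the seek hint the scanner can fast-forward to, skipping the gap).
    static String nextSeekHint(List<Range> ranges, String rowKey) {
        for (Range r : ranges) {
            if (rowKey.compareTo(r.start) < 0) return r.start; // before this range: seek forward
            if (rowKey.compareTo(r.stop) < 0) return null;     // inside this range: include
        }
        return null; // past all ranges: the real filter would signal the scan is done
    }

    public static void main(String[] args) {
        List<Range> ranges = Arrays.asList(
                new Range("001", "002"), new Range("003", "004"), new Range("005", "006"));
        System.out.println(nextSeekHint(ranges, "0005")); // prints 001: seek to the first range
        System.out.println(nextSeekHint(ranges, "0015")); // prints null: inside [001, 002)
        System.out.println(nextSeekHint(ranges, "0025")); // prints 003: skip the gap entirely
    }
}
```

Because the ranges are sorted up front, each seek hint lets the region server skip whole swaths of rows in the gaps rather than reading and rejecting them one at a time, which is where the speedup over a FilterList comes from.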
Note: for requirements like this, Solr or Elasticsearch would be the better way, in my opinion. You can check my Solr answer to review the high-level architecture. I am suggesting this because an HBase scan over huge data will be very slow.