HBase MapReduce for Multiple Scan Objects

I'm evaluating HBase for some of the data analysis work we do.

HBase will contain event data. The row key will be eventId + time. We want to analyze several types of events (4-5) within a date range. The total number of event types is about 1000.

The problem with running a MapReduce job over the HBase table is that initTableMapperJob (see below) takes just one Scan object. For performance reasons, we only want to scan the data for the 4-5 relevant event types in the date range, not for all 1000 event types. With the method below, I don't think we have that choice, because it accepts only one Scan object.

  public static void initTableMapperJob(String table, Scan scan,
    Class<? extends TableMapper> mapper,
    Class<? extends WritableComparable> outputKeyClass,
    Class<? extends Writable> outputValueClass,
    org.apache.hadoop.mapreduce.Job job) throws IOException
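For context, a minimal sketch of that single-Scan setup (the table name "events", the EventMapper class, and the key values are placeholders, not from the original post). Because the row key is eventId + time, one Scan covers only one contiguous slice of the key space, i.e. one event type within one date range:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
  import org.apache.hadoop.hbase.mapreduce.TableMapper;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.mapreduce.Job;

  public class SingleScanJob {

    // Placeholder identity mapper: emits each row key and Result unchanged.
    public static class EventMapper extends TableMapper<ImmutableBytesWritable, Result> {
      @Override
      protected void map(ImmutableBytesWritable key, Result value, Context context)
          throws java.io.IOException, InterruptedException {
        context.write(key, value);
      }
    }

    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      Job job = new Job(conf, "single-event-scan");

      // One contiguous range: eventId prefix + date bounds.
      Scan scan = new Scan();
      scan.setStartRow(Bytes.toBytes("event42" + "20100101"));
      scan.setStopRow(Bytes.toBytes("event42" + "20100201")); // stop row is exclusive

      TableMapReduceUtil.initTableMapperJob("events", scan,
          EventMapper.class, ImmutableBytesWritable.class, Result.class, job);
      job.waitForCompletion(true);
    }
  }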

Can the MapReduce job be launched with a list of Scan objects? Any workaround?

Thanks!

+3
3 answers

TableMapReduceUtil.initTableMapperJob sets up your job to use TableInputFormat which, as you noticed, takes a single Scan.

It sounds like you want to scan multiple segments of the table. To do that, you will need to create your own InputFormat, something like a MultiSegmentTableInputFormat. Extend TableInputFormatBase and override the getSplits method so that it calls super.getSplits once for each start/stop row segment of the table (the easiest way is to call TableInputFormatBase.scan.setStartRow() for each segment). Aggregate the InputSplit instances returned into a single list.

Note that you will then need to configure the job to use this custom MultiSegmentTableInputFormat.
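For illustration, here is a rough sketch of what such a MultiSegmentTableInputFormat could look like. The answer above only outlines the approach, so the class body below is an assumption: the event ids and dates are hard-coded stand-ins for values that real code would read from the job Configuration, and it extends TableInputFormat (a subclass of TableInputFormatBase) so that the table and base Scan are still wired up from the job configuration by initTableMapperJob:

  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;

  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.mapreduce.InputSplit;
  import org.apache.hadoop.mapreduce.JobContext;

  public class MultiSegmentTableInputFormat extends TableInputFormat {

    // One (startRow, stopRow) segment per event type; the date range is
    // appended to the eventId prefix because the row key is eventId + time.
    // These values are illustrative only.
    private static final String[] EVENT_IDS = { "event01", "event07", "event42" };
    private static final String START_DATE = "20100101";
    private static final String STOP_DATE  = "20100201";

    @Override
    public List<InputSplit> getSplits(JobContext context) throws IOException {
      List<InputSplit> splits = new ArrayList<InputSplit>();
      Scan scan = getScan();
      for (String eventId : EVENT_IDS) {
        // Restrict the scan to this event type's slice of the key space,
        // then let the base class compute the region-aligned splits for it.
        scan.setStartRow(Bytes.toBytes(eventId + START_DATE));
        scan.setStopRow(Bytes.toBytes(eventId + STOP_DATE));
        setScan(scan);
        splits.addAll(super.getSplits(context));
      }
      return splits;
    }
  }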

+9

Take a look at:

  org/apache/hadoop/hbase/filter/FilterList.java

A FilterList represents an ordered list of filters that is evaluated with either the AND or the OR operator. You could OR together one filter per event type and set the list on a single Scan.
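A minimal sketch of that idea, again assuming row keys of the form eventId + time (the helper class name and event ids are illustrative):

  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.filter.FilterList;
  import org.apache.hadoop.hbase.filter.PrefixFilter;
  import org.apache.hadoop.hbase.util.Bytes;

  public class EventTypeScan {
    public static Scan forEventTypes(String... eventIds) {
      // MUST_PASS_ONE gives OR semantics: a row passes if any
      // event-id prefix matches its key.
      FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ONE);
      for (String eventId : eventIds) {
        filters.addFilter(new PrefixFilter(Bytes.toBytes(eventId)));
      }
      Scan scan = new Scan();
      scan.setFilter(filters);
      return scan;
    }
  }

One caveat: the FilterList is evaluated server side, but on its own it does not narrow the scanned key range, so for the performance goal in the question it is still worth combining it with start/stop rows or with the multi-segment InputFormat above.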

0

I tried Dave L's approach and it works beautifully.

To configure the map task, you can use the function

  TableMapReduceUtil.initTableMapperJob(byte[] table, Scan scan,
  Class<? extends TableMapper> mapper,
  Class<? extends WritableComparable> outputKeyClass,
  Class<? extends Writable> outputValueClass, Job job,
  boolean addDependencyJars, Class<? extends InputFormat> inputFormatClass)

where inputFormatClass refers to the MultiSegmentTableInputFormat mentioned in Dave L's answer.
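Putting the pieces together, a hedged wiring sketch (the table name and classes are the same placeholders used in the earlier sketches, and this fragment would sit inside a main method like the first one):

  Configuration conf = HBaseConfiguration.create();
  Job job = new Job(conf, "multi-event-scan");
  TableMapReduceUtil.initTableMapperJob(
      Bytes.toBytes("events"),         // table
      new Scan(),                      // base scan; per-segment bounds are set by the InputFormat
      EventMapper.class,               // TableMapper subclass from the first sketch
      ImmutableBytesWritable.class,    // outputKeyClass
      Result.class,                    // outputValueClass
      job,
      true,                            // addDependencyJars
      MultiSegmentTableInputFormat.class);
  job.waitForCompletion(true);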

0
