I'm evaluating HBase for some of the data analysis work we do.
HBase will contain event data. The row key will be eventId + time. We want to analyze a few event types (4-5) within a date range, out of roughly 1000 event types in total.
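To make the key layout concrete, here is a minimal sketch of how we would build one Scan per event type over the date range. The exact key format (eventId bytes followed by an 8-byte big-endian timestamp) and the class/method names are our own assumptions for illustration:

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class EventScans {
        // Builds a Scan covering a single event type over [fromMillis, toMillis).
        // Assumes the row key is eventId bytes followed by an 8-byte big-endian
        // timestamp; adjust to the real key layout.
        public static Scan scanForEvent(String eventId, long fromMillis, long toMillis) {
            byte[] start = Bytes.add(Bytes.toBytes(eventId), Bytes.toBytes(fromMillis));
            byte[] stop  = Bytes.add(Bytes.toBytes(eventId), Bytes.toBytes(toMillis));
            Scan scan = new Scan();
            scan.setStartRow(start);
            scan.setStopRow(stop);
            scan.setCaching(500);        // larger scanner caching for MR throughput
            scan.setCacheBlocks(false);  // avoid polluting the block cache from MR scans
            return scan;
        }
    }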
The problem with running a MapReduce job over the HBase table is that initTableMapperJob (see below) takes only one Scan object. For performance reasons, we want to scan only the 4-5 event types within the date range, not all 1000 event types. With the method below we don't seem to have that choice, because it accepts a single Scan object:
    public static void initTableMapperJob(String table, Scan scan,
            Class<? extends TableMapper> mapper, Class<?> outputKeyClass,
            Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job)
            throws IOException
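For reference, this is roughly how we wire the job today with a single Scan; the table name ("events"), the mapper, and the date range values are placeholders for illustration:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class EventAnalysisJob {

        // Placeholder mapper: emits (rowKey, 1) for every row the scan returns.
        static class EventMapper extends TableMapper<Text, LongWritable> {
            @Override
            protected void map(ImmutableBytesWritable key, Result value, Context context)
                    throws IOException, InterruptedException {
                context.write(new Text(key.get()), new LongWritable(1L));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            Job job = Job.getInstance(conf, "event-analysis");
            job.setJarByClass(EventAnalysisJob.class);

            // Example date range (epoch millis), for illustration only.
            long fromMillis = 1388534400000L;   // 2014-01-01
            long toMillis   = 1391212800000L;   // 2014-02-01

            // Only ONE Scan can be passed, i.e. one contiguous key range
            // (here a single event type over the date range).
            Scan scan = EventScans.scanForEvent("login", fromMillis, toMillis);

            TableMapReduceUtil.initTableMapperJob(
                    "events",            // assumed table name
                    scan,                // the single Scan this overload accepts
                    EventMapper.class,
                    Text.class,
                    LongWritable.class,
                    job);

            FileOutputFormat.setOutputPath(job, new Path("/tmp/event-analysis-out"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }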
Can a MapReduce job be launched with a list of Scan objects? Is there any workaround?
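To be explicit about what we are after, here is the shape of the call we would like to make. As far as I can tell, newer TableMapReduceUtil versions expose an initTableMapperJob overload that takes a List<Scan> (backed by MultiTableInputFormat, with the table name set as a Scan attribute), but I have not verified that it exists in our release, so treat this as a sketch:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;

    public class MultiScanSetup {

        // One Scan per event type over [fromMillis, toMillis), all handed to a
        // single job. Table name, key layout and the EventMapper class from the
        // sketch above are assumptions.
        public static void configure(Job job, String table, List<String> eventIds,
                                     long fromMillis, long toMillis) throws IOException {
            List<Scan> scans = new ArrayList<Scan>();
            for (String eventId : eventIds) {
                Scan scan = new Scan();
                scan.setStartRow(Bytes.add(Bytes.toBytes(eventId), Bytes.toBytes(fromMillis)));
                scan.setStopRow(Bytes.add(Bytes.toBytes(eventId), Bytes.toBytes(toMillis)));
                // The list-based overload reads the target table from this attribute.
                scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes(table));
                scans.add(scan);
            }
            TableMapReduceUtil.initTableMapperJob(
                    scans,                           // a list of Scans instead of a single one
                    EventAnalysisJob.EventMapper.class,
                    Text.class,
                    LongWritable.class,
                    job);
        }
    }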
Thanks.