My input is in hdf. I'm just trying to make wordcount, but there is a slight difference. The data is in json format. Therefore, each row of data:
{"author":"foo", "text": "hello"} {"author":"foo123", "text": "hello world"} {"author":"foo234", "text": "hello this world"}
I only want to make a word combination in the "text" part.
How to do it?
I tried the following option:
public static class TokenCounterMapper extends Mapper<Object, Text, Text, IntWritable> { private static final Log log = LogFactory.getLog(TokenCounterMapper.class); private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(Object key, Text value, Context context) throws IOException, InterruptedException { try { JSONObject jsn = new JSONObject(value.toString());
But I get this error:
Error: java.lang.ClassNotFoundException: org.json.JSONException at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820) at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:865) at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:199) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:719) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093) at org.apache.hadoop.mapred.Child.main(Child.java:249)
Fraz source share