Here is my code. It connects to the Hadoop machine, runs a set of checks on the input, and writes the result to another directory.
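import java.util.Properties;

import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.pipe.Each;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.property.AppProps;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

// BigFilter, MergeGroupAggregator, getGroupByFields and fieldCols are my own
// code and are not shown here.
public class Main {
    public static void main(String... strings) {
        // Run the job as the "root" HDFS user.
        System.setProperty("HADOOP_USER_NAME", "root");

        String in1 = "hdfs://myserver/user/root/adnan/inputfile.txt";
        String out = "hdfs://myserver/user/root/cascading/temp2";

        Properties properties = new Properties();
        AppProps.setApplicationJarClass(properties, Main.class);
        HadoopFlowConnector flowConnector = new HadoopFlowConnector(properties);

        // Comma-delimited text with a header row, for both source and sink.
        Tap inTap = new Hfs(new TextDelimited(true, ","), in1);
        Tap outTap = new Hfs(new TextDelimited(true, ","), out);

        Pipe inPipe = new Pipe("in1");
        Each removeErrors = new Each(inPipe, Fields.ALL, new BigFilter());
        GroupBy group = new GroupBy(removeErrors, getGroupByFields(fieldCols));
        Every mergeGroup = new Every(group, Fields.ALL, new MergeGroupAggregator(fieldCols), Fields.RESULTS);

        FlowDef flowDef = FlowDef.flowDef()
            .addSource(inPipe, inTap)
            .addTailSink(mergeGroup, outTap);

        flowConnector.connect(flowDef).complete();
    }
}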
The job is submitted to the Hadoop machine, and I can see it in the job tracker, but it fails with the exception below.
cascading.tap.hadoop.io.MultiInputSplit not found
    at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassNotFoundException: class cascading.tap.hadoop.io.MultiInputSplit not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
    at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java)
    ... 7 more
java.lang.ClassNotFoundException: class cascading.tap.hadoop.io.MultiInputSplit not found at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
Please note:
1. I am running this from my Windows machine, and Hadoop is configured on another box.
2. I am using the Cloudera distribution of Hadoop, CDH 4.
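One diagnostic that may help narrow this down, offered as a sketch rather than a confirmed cause: because the job is launched from a separate Windows machine, it could be worth checking whether the class passed to `AppProps.setApplicationJarClass` is actually loaded from a jar. If it is loaded from a plain classes directory, there may be no jar containing the Cascading classes to ship to the cluster, which would be consistent with a task-side `ClassNotFoundException`. The class name `JarCheck` and its helper methods are hypothetical and use only the JDK; in the real project the check would be run against `Main.class`.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical helper (not part of Cascading or Hadoop): reports where a
// class was loaded from, using only standard JDK calls.
public class JarCheck {

    /** Returns the load location of a class, or null when the JVM does not expose one. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null) {
            return null; // typical for JDK bootstrap classes such as String
        }
        URL location = source.getLocation();
        return location == null ? null : location.toString();
    }

    /** Simple heuristic: a location ending in ".jar" is treated as a jar file. */
    static boolean looksLikeJar(String location) {
        return location != null && location.toLowerCase().endsWith(".jar");
    }

    public static void main(String[] args) {
        // In the real project this would be Main.class (assumption).
        String location = locationOf(JarCheck.class);
        System.out.println("Loaded from: " + location);
        System.out.println("Packaged as jar: " + looksLikeJar(location));
    }
}
```

If this reports a directory rather than a `.jar` file, building a single jar that bundles the application classes together with the Cascading libraries and launching from that jar would be one thing to try.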