ClassNotFoundException for cascading.tap.hadoop.io.MultiInputSplit when running a Hadoop job with Cascading

Here is my code, which connects to the Hadoop machine, runs a set of checks, and writes the result to another directory.

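    import java.util.Properties;

    import cascading.flow.FlowDef;
    import cascading.flow.hadoop.HadoopFlowConnector;
    import cascading.pipe.Each;
    import cascading.pipe.Every;
    import cascading.pipe.GroupBy;
    import cascading.pipe.Pipe;
    import cascading.property.AppProps;
    import cascading.scheme.hadoop.TextDelimited;
    import cascading.tap.Tap;
    import cascading.tap.hadoop.Hfs;
    import cascading.tuple.Fields;

    public class Main {
        public static void main(String... strings) {
            System.setProperty("HADOOP_USER_NAME", "root");

            String in1 = "hdfs://myserver/user/root/adnan/inputfile.txt";
            String out = "hdfs://myserver/user/root/cascading/temp2";

            Properties properties = new Properties();
            AppProps.setApplicationJarClass(properties, Main.class);
            HadoopFlowConnector flowConnector = new HadoopFlowConnector(properties);

            // Comma-delimited text files with a header row
            Tap inTap = new Hfs(new TextDelimited(true, ","), in1);
            Tap outTap = new Hfs(new TextDelimited(true, ","), out);

            // BigFilter, MergeGroupAggregator, getGroupByFields and fieldCols
            // are defined elsewhere in my project (not shown here)
            Pipe inPipe = new Pipe("in1");
            Each removeErrors = new Each(inPipe, Fields.ALL, new BigFilter());
            GroupBy group = new GroupBy(removeErrors, getGroupByFields(fieldCols));
            Every mergeGroup = new Every(group, Fields.ALL, new MergeGroupAggregator(fieldCols), Fields.RESULTS);

            FlowDef flowDef = FlowDef.flowDef()
                .addSource(inPipe, inTap)
                .addTailSink(mergeGroup, outTap);

            flowConnector.connect(flowDef).complete();
        }
    }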

My job is submitted to the Hadoop machine, and I can see it in the job tracker, but the job fails and I get the exception below.

    cascading.tap.hadoop.io.MultiInputSplit not found
        at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
    Caused by: java.lang.ClassNotFoundException: class cascading.tap.hadoop.io.MultiInputSplit not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
        at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java)
        ... 7 more

java.lang.ClassNotFoundException: class cascading.tap.hadoop.io.MultiInputSplit not found at org.apache.hadoop.conf.Configuration.getClassByName (Configuration.java:1493)

Please note:
1. I am running this from my Windows machine, and Hadoop is set up on a different box.
2. I am using the Cloudera distribution of Hadoop, CDH 4.

2 answers

I found the problem. CDH 4.2 has a compatibility issue with Cascading 2.1, so I switched to CDH 4.1 and it worked for me.


Your Properties object is essentially empty, so the cluster configuration for this job may be missing. You need to give the HadoopFlowConnector the configuration it should use. The settings from your Hadoop configuration files, which are picked up when you call new Configuration, belong in your Properties object: things like fs.default.name=file:////, etc. I suppose this matters even more when you run the Cascading job remotely.
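As a rough illustration of that suggestion, the sketch below fills a Properties object with two cluster settings before it would be handed to the connector. It uses only the JDK so it runs on its own. The class name ClusterProps, the build helper, and the host and port values are hypothetical; fs.default.name and mapred.job.tracker are the classic Hadoop (MR1) keys, and the real values should come from the cluster's core-site.xml and mapred-site.xml.

```java
import java.util.Properties;

public class ClusterProps {

    // Builds the Properties that would be passed to the flow connector.
    // Both arguments are assumptions supplied by the caller, not values
    // taken from the original question.
    static Properties build(String nameNodeUri, String jobTracker) {
        Properties properties = new Properties();
        properties.setProperty("fs.default.name", nameNodeUri);
        properties.setProperty("mapred.job.tracker", jobTracker);
        return properties;
    }

    public static void main(String[] args) {
        // Hypothetical host and ports; replace with the cluster's real settings.
        Properties properties = build("hdfs://myserver:8020", "myserver:8021");
        properties.list(System.out);

        // With Cascading on the classpath, the question's code would continue:
        //   AppProps.setApplicationJarClass(properties, Main.class);
        //   HadoopFlowConnector flowConnector = new HadoopFlowConnector(properties);
    }
}
```

Whether this alone resolves the ClassNotFoundException is not certain; it only makes sure the job is configured against the intended cluster.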


Source: https://habr.com/ru/post/1436214/

