Running a simple Cascading application in local mode

I am new to Cascading / Hadoop and am trying to run a simple example in local mode (i.e. in memory). The example simply copies a file:

import java.util.Properties;

import cascading.flow.Flow;
import cascading.flow.FlowConnector;
import cascading.flow.FlowDef;
import cascading.flow.local.LocalFlowConnector;
import cascading.pipe.Pipe;
import cascading.property.AppProps;
import cascading.scheme.hadoop.TextLine;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;

public class CascadingTest {
    public static void main(String[] args) {
        Properties properties = new Properties();
        AppProps.setApplicationJarClass( properties, CascadingTest.class );
        FlowConnector flowConnector = new LocalFlowConnector();

        // create the source tap
        Tap inTap = new Hfs( new TextLine(), "D:\\git_workspace\\Impatient\\part1\\data\\rain.txt" );

        // create the sink tap
        Tap outTap = new Hfs( new TextLine(), "D:\\git_workspace\\Impatient\\part1\\data\\out.txt" );

        // specify a pipe to connect the taps
        Pipe copyPipe = new Pipe( "copy" );

        // connect the taps, pipes, etc., into a flow
        FlowDef flowDef = FlowDef.flowDef()
            .addSource( copyPipe, inTap )
            .addTailSink( copyPipe, outTap );

        // run the flow
        Flow flow = flowConnector.connect( flowDef );
        flow.complete();
    }
}

Here is the error I get:

09-25-12 11:30:38,114 INFO - AppProps - using app.id: 9C82C76AC667FDAA2F6969A0DF3949C6
Exception in thread "main" cascading.flow.planner.PlannerException: could not build flow from assembly: [java.util.Properties cannot be cast to org.apache.hadoop.mapred.JobConf]
    at cascading.flow.planner.FlowPlanner.handleExceptionDuringPlanning(FlowPlanner.java:515)
    at cascading.flow.local.planner.LocalPlanner.buildFlow(LocalPlanner.java:84)
    at cascading.flow.FlowConnector.connect(FlowConnector.java:454)
    at com.xyCascadingTest.main(CascadingTest.java:37)
Caused by: java.lang.ClassCastException: java.util.Properties cannot be cast to org.apache.hadoop.mapred.JobConf
    at cascading.tap.hadoop.Hfs.sourceConfInit(Hfs.java:78)
    at cascading.flow.local.LocalFlowStep.initTaps(LocalFlowStep.java:77)
    at cascading.flow.local.LocalFlowStep.getInitializedConfig(LocalFlowStep.java:56)
    at cascading.flow.local.LocalFlowStep.createFlowStepJob(LocalFlowStep.java:135)
    at cascading.flow.local.LocalFlowStep.createFlowStepJob(LocalFlowStep.java:38)
    at cascading.flow.planner.BaseFlowStep.getFlowStepJob(BaseFlowStep.java:588)
    at cascading.flow.BaseFlow.initializeNewJobsMap(BaseFlow.java:1162)
    at cascading.flow.BaseFlow.initialize(BaseFlow.java:184)
    at cascading.flow.local.planner.LocalPlanner.buildFlow(LocalPlanner.java:78)
    ... 2 more
3 answers

Just to add a bit more detail: you cannot mix local classes and Hadoop classes in Cascading, because they assume different and incompatible environments. In your case you are trying to create a local flow with Hadoop taps, and a Hadoop tap expects a Hadoop JobConf rather than the Properties object that is used to configure local taps.
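
The mismatch described above can be illustrated with a toy model that uses only the JDK. Everything in it (`ToyJobConf`, `ToyTap`, `ToyHfs`, `ToyFileTap`, `MixedTapDemo`) is a hypothetical stand-in written for this sketch, not a Cascading or Hadoop class, and it only shows one way such an error can survive compilation: when a tap is handled through a raw type, the compiler cannot check which configuration type it expects, so the failure only shows up as a ClassCastException at run time.

```java
import java.util.Properties;

// Hypothetical stand-in for org.apache.hadoop.mapred.JobConf (not the real class).
class ToyJobConf { }

// Hypothetical tap abstraction: each tap declares the config type it expects.
interface ToyTap<Config> {
    void sourceConfInit(Config conf);
}

// Stand-in for a Hadoop-mode tap: it expects a ToyJobConf.
class ToyHfs implements ToyTap<ToyJobConf> {
    @Override
    public void sourceConfInit(ToyJobConf conf) { /* would configure Hadoop input here */ }
}

// Stand-in for a local-mode tap: it expects java.util.Properties.
class ToyFileTap implements ToyTap<Properties> {
    @Override
    public void sourceConfInit(Properties conf) { /* would configure local input here */ }
}

final class MixedTapDemo {
    // Mimics a local planner, which always hands a Properties object to every tap.
    // The raw ToyTap parameter hides the config type from the compiler.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String initWithLocalConfig(ToyTap tap) {
        try {
            tap.sourceConfInit(new Properties());
            return "ok";
        } catch (ClassCastException e) {
            // The compiler-generated bridge method casts the argument to the
            // tap's declared config type, and that cast fails here.
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(initWithLocalConfig(new ToyFileTap())); // prints "ok"
        System.out.println(initWithLocalConfig(new ToyHfs()));     // prints "ClassCastException"
    }
}
```

Note that the question's code declares its taps with the raw type `Tap`, which is why nothing flags the mix of local and Hadoop classes before the flow is planned.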

Your code will work if you use cascading.tap.local.FileTap instead of cascading.tap.hadoop.Hfs.
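
A minimal sketch of the question's program with that change applied, assuming Cascading 2.x is on the classpath. Besides swapping the tap, it also imports cascading.scheme.local.TextLine in place of cascading.scheme.hadoop.TextLine, since a local tap needs a local scheme. The class name `CascadingLocalCopy` and the use of command-line arguments for the two paths are choices made for this sketch, not part of the original code.

```java
import cascading.flow.Flow;
import cascading.flow.FlowConnector;
import cascading.flow.FlowDef;
import cascading.flow.local.LocalFlowConnector;
import cascading.pipe.Pipe;
import cascading.scheme.local.TextLine;
import cascading.tap.Tap;
import cascading.tap.local.FileTap;

// Hypothetical class name; a local-mode variant of the question's CascadingTest.
public class CascadingLocalCopy {
    public static void main(String[] args) {
        // Assumption for this sketch: input and output paths come from the command line.
        String inPath = args[0];
        String outPath = args[1];

        // Local-mode connector, so every tap and scheme below is a local class too.
        FlowConnector flowConnector = new LocalFlowConnector();

        // create the source tap on the local file system
        Tap inTap = new FileTap( new TextLine(), inPath );

        // create the sink tap on the local file system
        Tap outTap = new FileTap( new TextLine(), outPath );

        // specify a pipe to connect the taps
        Pipe copyPipe = new Pipe( "copy" );

        // connect the taps, pipes, etc., into a flow
        FlowDef flowDef = FlowDef.flowDef()
            .addSource( copyPipe, inTap )
            .addTailSink( copyPipe, outTap );

        // run the flow
        Flow flow = flowConnector.connect( flowDef );
        flow.complete();
    }
}
```

The opposite fix should work as well: keep the Hfs taps and the Hadoop TextLine, and replace LocalFlowConnector with cascading.flow.hadoop.HadoopFlowConnector. Either way, the connector, taps, and schemes all have to come from the same platform.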


Welcome to Cascading.

I just answered this on the Cascading user list, but in short the problem is a mix of local classes and Hadoop classes. This code has a LocalFlowConnector, but then uses Hfs taps.

When I revert to the classes used in the "Impatient" tutorial, it runs correctly: https://gist.github.com/3784194


Yes, you need to use the LFS (Local File System) tap instead of HFS (Hadoop File System).

You can also test your code with JUnit test cases (using the cascade-unittest jar) in local mode or from Eclipse.

http://www.cascading.org/2012/08/07/cascading-for-the-impatient-part-6/


Source: https://habr.com/ru/post/1436213/

