Getting Spark, Java, and MongoDB to Collaborate

This is like my earlier question here, but this time it is Java, not Python, that is giving me problems.

I followed the instructions here (as far as I can tell), but since I use hadoop-2.6.1 I think I should use the old Hadoop API, not the new API used in the example.

I am working on Ubuntu with the following component versions:

  • Spark spark-1.5.1-bin-hadoop2.6
  • Hadoop hadoop-2.6.1
  • Mongo 3.0.8
  • Mongo-Hadoop connector (pulled in through Maven)
  • Java 1.8.0_66
  • Maven 3.0.5

My Java program is basic:

import org.apache.spark.api.java.*;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import com.mongodb.hadoop.MongoInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.bson.BSONObject;

public class SimpleApp {
    public static void main(String[] args) {
        Configuration mongodbConfig = new Configuration();
        mongodbConfig.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
        mongodbConfig.set("mongo.input.uri", "mongodb://localhost:27017/db.collection");

        SparkConf conf = new SparkConf().setAppName("Simple Application");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaPairRDD<Object, BSONObject> documents = sc.newAPIHadoopRDD(
                mongodbConfig,            // Configuration
                MongoInputFormat.class,   // InputFormat: read from a live cluster
                Object.class,             // Key class
                BSONObject.class          // Value class
        );
    }
}
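
A minimal way to actually touch the data once the RDD loads would be something like the lines below, continuing inside main() right after documents is created. This is only a sketch: the "name" field is a placeholder for whatever the documents really contain.

// Sketch only: pull one (assumed) field out of every document and count the results.
JavaRDD<Object> names = documents.values().map(
        new Function<BSONObject, Object>() {
            @Override
            public Object call(BSONObject doc) throws Exception {
                return doc.get("name"); // "name" is a placeholder field name
            }
        });
System.out.println("documents read: " + names.count());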

It builds fine using Maven (mvn package) with the following pom file:

<project>
    <groupId>edu.berkeley</groupId>
    <artifactId>simple-project</artifactId>
    <modelVersion>4.0.0</modelVersion>
    <name>Simple Project</name>
    <packaging>jar</packaging>
    <version>1.0</version>
    <dependencies>
        <dependency> <!-- Spark dependency -->
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.5.1</version>
        </dependency>
        <dependency>
            <groupId>org.mongodb</groupId>
            <artifactId>mongo-java-driver</artifactId>
            <version>3.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.mongodb.mongo-hadoop</groupId>
            <artifactId>mongo-hadoop-core</artifactId>
            <version>1.4.2</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

Then I submit the jar:

 /usr/local/share/spark-1.5.1-bin-hadoop2.6/bin/spark-submit --class "SimpleApp" --master local[4] target/simple-project-1.0.jar 

and get the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: com/mongodb/hadoop/MongoInputFormat
    at SimpleApp.main(SimpleApp.java:18)
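
As far as I understand, mvn package with this pom does not bundle the dependencies into the jar, so the connector classes are probably simply missing at runtime. One common way to hand them to spark-submit is the --jars option (depending on the Spark version, --driver-class-path may also be needed for the driver); the jar paths below are placeholders for wherever Maven put them:

/usr/local/share/spark-1.5.1-bin-hadoop2.6/bin/spark-submit \
  --class "SimpleApp" \
  --master local[4] \
  --jars /path/to/mongo-hadoop-core-1.4.2.jar,/path/to/mongo-java-driver-3.2.0.jar \
  target/simple-project-1.0.jar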

NOTE

I edited this question on December 18 because it had become confusing and verbose. Earlier comments may no longer be relevant; the context of the question, however, is the same.

1 answer

I ran into the same problems, but after many tests and changes I got mine working with the code below. I am running a Maven project with NetBeans on Ubuntu, using Java 7. Hope this helps.

Enable the maven-shade-plugin if there are conflicts between classes.
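
For reference, a minimal shade-plugin setup looks roughly like this; the version number is only an example, use whatever is current for you:

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.4.3</version>
            <executions>
                <execution>
                    <!-- Build one "uber" jar at package time so the
                         dependencies travel with the application jar. -->
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>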

PS: I do not know the cause of your specific error, but I have run into plenty like it, and this code works fine for me. These are my dependencies:

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>1.5.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>1.5.1</version>
    </dependency>
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.14</version>
    </dependency>
    <dependency>
        <groupId>org.mongodb.mongo-hadoop</groupId>
        <artifactId>mongo-hadoop-core</artifactId>
        <version>1.4.1</version>
    </dependency>
</dependencies>

Java code:

Configuration conf = new Configuration();
conf.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
conf.set("mongo.input.uri", "mongodb://localhost:27017/databasename.collectionname");

SparkConf sconf = new SparkConf().setMaster("local").setAppName("Spark UM Jar");
JavaSparkContext sc = new JavaSparkContext(sconf);

JavaRDD<User> UserMaster = sc.newAPIHadoopRDD(conf, MongoInputFormat.class, Object.class, BSONObject.class)
        .map(new Function<Tuple2<Object, BSONObject>, User>() {
            @Override
            public User call(Tuple2<Object, BSONObject> v1) throws Exception {
                // build and return a User (your own POJO) from the BSONObject in v1._2()
                return null;
            }
        });
