Spark Logging class not found when using Spark SQL

I am trying to do some simple Spark SQL programming in Java. The program fetches data from a Cassandra table, converts the RDD to a Dataset, and displays the data. When I run the spark-submit command, I get the error: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging.

My program:

    // uses static imports of CassandraJavaUtil.javaFunctions and CassandraJavaUtil.mapRowTo
    SparkConf sparkConf = new SparkConf().setAppName("DataFrameTest")
            .set("spark.cassandra.connection.host", "abc")
            .set("spark.cassandra.auth.username", "def")
            .set("spark.cassandra.auth.password", "ghi");
    SparkContext sparkContext = new SparkContext(sparkConf);

    // Read the Cassandra table into an RDD of Log beans
    JavaRDD<Log> logsRDD = javaFunctions(sparkContext)
            .cassandraTable("test", "log", mapRowTo(Log.class));

    // Convert the RDD to a Dataset and display it
    SparkSession sparkSession = SparkSession.builder().appName("Java Spark SQL").getOrCreate();
    Dataset<Row> logsDF = sparkSession.createDataFrame(logsRDD, Log.class);
    logsDF.show();

My POM dependencies:

    <dependencies>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.0.2</version>
        <scope>provided</scope>
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.0.2</version>
        <scope>provided</scope>
      </dependency>
      <dependency>
        <groupId>com.datastax.spark</groupId>
        <artifactId>spark-cassandra-connector_2.11</artifactId>
        <version>1.6.3</version>
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.0.2</version>
      </dependency>
    </dependencies>

My spark-submit command:

    /home/ubuntu/spark-2.0.2-bin-hadoop2.7/bin/spark-submit --class "com.jtv.spark.dataframes.App" --master local[4] spark.dataframes-0.1-jar-with-dependencies.jar

How do I solve this error? Downgrading to Spark 1.5.2 does not work, since 1.5.2 does not have org.apache.spark.sql.Dataset or org.apache.spark.sql.SparkSession.

4 answers

The Spark Logging class is available in Spark version 1.5.2 and lower, but not in higher versions. So your dependencies in pom.xml should look like this:

    <dependencies>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.5.2</version>
        <scope>provided</scope>
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.10</artifactId>
        <version>1.5.2</version>
        <scope>provided</scope>
      </dependency>
      <dependency>
        <groupId>com.datastax.spark</groupId>
        <artifactId>spark-cassandra-connector_2.10</artifactId>
        <version>1.5.2</version>
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.5.2</version>
      </dependency>
    </dependencies>
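If you do downgrade, note that your code would also need the pre-2.0 API, since SparkSession and Dataset do not exist in 1.5.x. A rough sketch of how the question's snippet could look there (SQLContext and DataFrame take their place; Log is your own mapped bean class):

    import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
    import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapRowTo;

    import org.apache.spark.SparkConf;
    import org.apache.spark.SparkContext;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;

    public class DataFrameTest15 {
        public static void main(String[] args) {
            SparkConf sparkConf = new SparkConf().setAppName("DataFrameTest")
                    .set("spark.cassandra.connection.host", "abc")
                    .set("spark.cassandra.auth.username", "def")
                    .set("spark.cassandra.auth.password", "ghi");
            SparkContext sparkContext = new SparkContext(sparkConf);

            // Same Cassandra read as in the question
            JavaRDD<Log> logsRDD = javaFunctions(sparkContext)
                    .cassandraTable("test", "log", mapRowTo(Log.class));

            // Spark 1.5.x has no SparkSession/Dataset; SQLContext/DataFrame play that role
            SQLContext sqlContext = new SQLContext(sparkContext);
            DataFrame logsDF = sqlContext.createDataFrame(logsRDD, Log.class);
            logsDF.show();
        }
    }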

Please let me know if this works or not.


This may be a problem in your IDE. Since some of these packages are generated by Scala inside a Java project, the IDE sometimes cannot make sense of what is going on. I use IntelliJ and it keeps showing me this message, but when I run "mvn test" or "mvn package" everything is fine. Check whether this is really a packaging error or just the IDE getting confused.
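One quick, purely illustrative way to tell the two cases apart (not something Spark itself provides) is to try loading the class that spark-submit reports as missing from a tiny standalone main; if it loads when run outside the IDE, the build and classpath are fine and the warning is just the IDE:

    // Hypothetical check: try to load the class reported in the ClassNotFoundException.
    // If this succeeds when run outside the IDE (e.g. via mvn test), the packaging is
    // fine and the error marker is only the IDE being confused.
    public class LoggingClassCheck {
        public static void main(String[] args) throws ClassNotFoundException {
            Class<?> logging = Class.forName("org.apache.spark.internal.Logging");
            System.out.println("Found: " + logging.getName());
        }
    }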


The dependency below worked fine for my case.

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>2.2.0</version>
      <scope>provided</scope>
    </dependency>

Pretty late to the party here, but I added

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.1.1</version>
      <scope>provided</scope>
    </dependency>

to solve this problem. It seems to work in my case.


Source: https://habr.com/ru/post/1260871/

