Spark Kinesis fails on Cloudera with java.lang.AbstractMethodError

Below is my POM file. I'm writing a Spark Streaming application that reads from AWS Kinesis.

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>amazon-kinesis-client</artifactId>
    <version>1.6.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kinesis-asl_2.10</artifactId>
    <version>1.6.0</version>
</dependency>

I encountered the exception below while starting the Spark program on Cloudera CDH 5.10:

17/04/27 05:34:04 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 58.0 (TID 179, hadoop1.local, executor 5): java.lang.AbstractMethodError
    at org.apache.spark.Logging$class.log(Logging.scala:50)
    at org.apache.spark.streaming.kinesis.KinesisCheckpointer.log(KinesisCheckpointer.scala:39)
    at org.apache.spark.Logging$class.logDebug(Logging.scala:62)
    at org.apache.spark.streaming.kinesis.KinesisCheckpointer.logDebug(KinesisCheckpointer.scala:39)
    at org.apache.spark.streaming.kinesis.KinesisCheckpointer.startCheckpointerThread(KinesisCheckpointer.scala:119)
    at org.apache.spark.streaming.kinesis.KinesisCheckpointer.<init>(KinesisCheckpointer.scala:50)
    at org.apache.spark.streaming.kinesis.KinesisReceiver.onStart(KinesisReceiver.scala:149)
    at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
    at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
    at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:575)
    at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:565)
    at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2000)
    at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2000)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

This works fine on EMR 4.4, but fails on CDH. Any suggestions?

1 answer

The main problem is the use of org.apache.spark.Logging, an internal Spark class whose binary signature differs between the Apache Spark 1.6.0 jars bundled with your application and the patched Spark shipped with CDH. The Javadoc warns:

NOTE. DO NOT use this class outside of Spark. It is intended as an internal utility. This is likely to be changed or removed in future releases.

http://spark.apache.org/docs/1.6.2/api/java/org/apache/spark/Logging.html

This was fixed in Spark 2.0.0; see https://issues.apache.org/jira/browse/SPARK-9307.
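On Spark 1.6, a common workaround is to stop bundling Apache Spark's own core/streaming jars in your application jar, so the cluster's CDH-built Spark classes (including its Logging trait) are the only ones on the classpath. The sketch below shows the idea, assuming a Maven uber-jar build submitted with spark-submit; version numbers mirror the POM above and are illustrative:

```xml
<!-- Sketch: mark the Spark artifacts that CDH already provides as
     <scope>provided</scope>, so they are available at compile time but
     are NOT packaged into the application jar. -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
<!-- spark-streaming-kinesis-asl is not shipped with CDH, so it stays in
     the default compile scope and must be packaged (e.g. shaded) into
     the application jar along with the Kinesis client. -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kinesis-asl_2.10</artifactId>
    <version>1.6.0</version>
</dependency>
```

Note that spark-streaming-kinesis-asl_2.10 itself was compiled against Apache Spark's Logging, so if CDH's Spark diverges there this alone may not be enough; building the kinesis-asl module against the CDH Spark version (using Cloudera's Maven repository coordinates) is the more reliable route.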


Source: https://habr.com/ru/post/1267211/

