Zeppelin + Spark: reading parquet from S3 throws NoSuchMethodError: com.fasterxml.jackson

Using the Zeppelin 0.7.2 binaries from the main download and Spark 2.1.0 with Hadoop 2.6, the following paragraph:

val df = spark.read.parquet(DATA_URL).filter(FILTER_STRING).na.fill("")

Produces the following:

java.lang.NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer$.handledType()Ljava/lang/Class;
  at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<init>(ScalaNumberDeserializersModule.scala:49)
  at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<clinit>(ScalaNumberDeserializersModule.scala)
  at com.fasterxml.jackson.module.scala.deser.ScalaNumberDeserializersModule$class.$init$(ScalaNumberDeserializersModule.scala:61)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule.<init>(DefaultScalaModule.scala:20)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<init>(DefaultScalaModule.scala:37)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<clinit>(DefaultScalaModule.scala)
  at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
  at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
  at org.apache.spark.SparkContext.withScope(SparkContext.scala:701)
  at org.apache.spark.SparkContext.parallelize(SparkContext.scala:715)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.mergeSchemasInParallel(ParquetFileFormat.scala:594)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.inferSchema(ParquetFileFormat.scala:235)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
  at scala.Option.orElse(Option.scala:289)
  at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$getOrInferFileFormatSchema(DataSource.scala:183)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:387)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:441)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:425)
  ... 47 elided

This error does not occur in a normal spark-shell, only in Zeppelin. I have tried the following fixes, none of which worked:

  • Downloading Jackson 2.6.2 jars into the Zeppelin lib folder and restarting
  • Adding the Jackson 2.9 dependencies from the Maven repository to the interpreter settings
  • Removing the Jackson jars from the Zeppelin lib folder

Googling has not turned up anything on this situation. Please feel free to ask for more information or to make suggestions. Thanks!
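
For reference, one way to narrow down a conflict like this is to print which jar each Jackson class is actually loaded from, both in spark-shell and in a Zeppelin %spark paragraph. A minimal diagnostic sketch (the jarOf helper is purely illustrative):

// Resolve a class without initializing it (the failing code is in a static
// initializer) and report the jar it was loaded from. Assumes the class comes
// from a jar; getCodeSource can be null for JDK bootstrap classes.
def jarOf(className: String): String =
  Class.forName(className, false, getClass.getClassLoader)
    .getProtectionDomain.getCodeSource.getLocation.toString

Seq(
  "com.fasterxml.jackson.databind.ObjectMapper",
  "com.fasterxml.jackson.module.scala.DefaultScalaModule"
).foreach(name => println(s"$name -> ${jarOf(name)}"))

If the Zeppelin output points at a different Jackson jar than spark-shell does, the classpath conflict is confirmed.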

The problem turned out to be the com.amazonaws:aws-java-sdk and org.apache.hadoop:hadoop-aws dependencies added to the Spark interpreter for S3 access: they pull in com.fasterxml.jackson.core:* artifacts that conflict with the Jackson version bundled with Spark.

Excluding com.fasterxml.jackson.core:* from both dependencies in ${ZEPPELIN_HOME}/conf/interpreter.json for the Spark interpreter fixes it:

"dependencies": [ { "groupArtifactVersion": "com.amazonaws:aws-java-sdk:1.7.4", "local": false, "exclusions": ["com.fasterxml.jackson.core:*"] }, { "groupArtifactVersion": "org.apache.hadoop:hadoop-aws:2.7.1", "local": false, "exclusions": ["com.fasterxml.jackson.core:*"] } ]

Another suggestion is to load the matching Jackson version through Zeppelin's dependency loader:

%dep
z.load("com.fasterxml.jackson.core:jackson-core:2.6.2")

Source: https://habr.com/ru/post/1611717/

