How to write a null value from a Spark SQL DataFrame expression to a database table? (IllegalArgumentException: Can't get JDBC type for null)

I get the error java.lang.IllegalArgumentException: Can't get JDBC type for null when trying to run the following example:

    ...
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("Demo")
      .getOrCreate()
    import spark.implicits._

    // load first table
    val df_one = spark.read
      .format("jdbc")
      .option("url", myDbUrl)
      .option("dbtable", myTableOne)
      .option("user", myUser)
      .option("password", myPassw)
      .load()
    df_one.createGlobalTempView("table_one")

    // load second table
    val df_two = spark.read
      .format("jdbc")
      .option("url", myUrl)
      .option("dbtable", myTableTwo)
      .option("user", myUser)
      .option("password", myPassw)
      .load()
    df_two.createGlobalTempView("table_two")

    // perform join of the two tables
    val df_result = spark.sql(
      "select o.field_one, t.field_two, null as field_three " +
      " from global_temp.table_one o, global_temp.table_two t where o.key_one = t.key_two"
    )

    // Error here:
    df_result.write
      .format("jdbc")
      .option("dbtable", myResultTable)
      .option("url", myDbUrl)
      .option("user", myUser)
      .option("password", myPassw)
      .mode(SaveMode.Append)
      .save()
    ...

I get an error message:

    Exception in thread "main" java.lang.IllegalArgumentException: Can't get JDBC type for null
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType$2.apply(JdbcUtils.scala:148)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType$2.apply(JdbcUtils.scala:148)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType(JdbcUtils.scala:147)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$18.apply(JdbcUtils.scala:663)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$18.apply(JdbcUtils.scala:662)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.saveTable(JdbcUtils.scala:662)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:77)
        at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)

A workaround that drastically slows down the workflow:

    ...
    // create a case class for the Dataset
    case class ResultCaseClass(field_one: Option[Int], field_two: Option[Int], field_three: Option[Int])

    // perform join of the two tables
    val ds_result = spark.sql(
      "select o.field_one, t.field_two, null as field_three " +
      " from global_temp.table_one o, global_temp.table_two t where o.key_one = t.key_two"
    )
      .withColumn("field_one", $"field_one".cast(IntegerType))
      .withColumn("field_two", $"field_two".cast(IntegerType))
      .withColumn("field_three", $"field_three".cast(IntegerType))
      .as[ResultCaseClass]

    // Success:
    ds_result.write......
    ...
1 answer

I ran into the same problem and tracked the error message down in the Spark source code. If you write a null value to the database without specifying its data type, you get "Can't get JDBC type for null". The solution is to cast the null literal to the data type that matches the corresponding database column.

Example:

 lit(null).cast(StringType) or lit(null).cast("string") 
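Applied to the query from the question, a minimal sketch might look like the following. It assumes that field_three is an integer column in the target table; the connection and table names (myDbUrl, myResultTable, etc.) are the placeholders from the question.

    import org.apache.spark.sql.SaveMode
    import org.apache.spark.sql.functions.lit
    import org.apache.spark.sql.types.IntegerType

    // Join as before, but give the null column an explicit type so the JDBC
    // writer can map it to a SQL type (assumes field_three is an INTEGER column).
    val df_result = spark.sql(
      "select o.field_one, t.field_two, null as field_three " +
      " from global_temp.table_one o, global_temp.table_two t where o.key_one = t.key_two"
    )
      .withColumn("field_three", lit(null).cast(IntegerType))

    // The write now succeeds because every column has a concrete type.
    df_result.write
      .format("jdbc")
      .option("url", myDbUrl)
      .option("dbtable", myResultTable)
      .option("user", myUser)
      .option("password", myPassw)
      .mode(SaveMode.Append)
      .save()

Equivalently, the cast can be done directly in the SQL, e.g. cast(null as int) as field_three, which avoids the extra withColumn call.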

Source: https://habr.com/ru/post/1265727/

