Spark Scala: cannot convert string from string to int since it can truncate

I got this exception while playing with sparks.

An exception in the stream "main" org.apache.spark.sql.AnalysisException: It is not possible to translate pricefrom a string to int because it can truncate the Target type path: - field (class: "scala.Int", name: "price") - root class: "org.spark.code.executable.Main.Record" You can either add an explicit cast to the input data, or select a higher type of field precision in the target;

How can this exception be resolved? Here is the code

object Main {

 case class Record(transactionDate: Timestamp, product: String, price: Int, paymentType: String, name: String, city: String, state: String, country: String,
                accountCreated: Timestamp, lastLogin: Timestamp, latitude: String, longitude: String)
 def main(args: Array[String]) {

   System.setProperty("hadoop.home.dir", "C:\\winutils\\");

   val schema = Encoders.product[Record].schema

   val df = SparkConfig.sparkSession.read
  .option("header", "true")
  .csv("SalesJan2009.csv");

   import SparkConfig.sparkSession.implicits._
   val ds = df.as[Record]

  //ds.groupByKey(body => body.state).count().show()

  import org.apache.spark.sql.expressions.scalalang.typed.{
  count => typedCount,
  sum => typedSum
}

  ds.groupByKey(body => body.state)
  .agg(typedSum[Record](_.price).name("sum(price)"))
  .withColumnRenamed("value", "group")
  .alias("Summary by state")
  .show()
}
+10
source share
1 answer

CSV,

val spark = SparkSession.builder()
  .master("local")
  .appName("test")
  .getOrCreate()

import org.apache.spark.sql.Encoders
val schema = Encoders.product[Record].schema

val ds = spark.read
  .option("header", "true")
  .schema(schema)  // passing schema 
  .option("timestampFormat", "MM/dd/yyyy HH:mm") // passing timestamp format
  .csv(path)// csv path
  .as[Record] // convert to DS

: yyyy-MM-dd'T'HH:mm:ss.SSSXXX yyyy-MM-dd'T'HH:mm:ss.SSSXXX .

+28

Source: https://habr.com/ru/post/1682660/


All Articles