I am reading a DataFrame schema definition from a text file. The file looks like this:
id,1,bigint
price,2,bigint
sqft,3,bigint
zip_id,4,int
name,5,string
and I map the parsed data types to Spark SQL data types. The code for creating the DataFrame is:
import scala.io.Source
import scala.collection.mutable.ListBuffer
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Read the column name and type name from each metadata line
var schemaSt = new ListBuffer[(String, String)]()
for (line <- Source.fromFile("meta.txt").getLines()) {
  val word = line.split(",")
  schemaSt += ((word(0), word(2)))
}

// Map the textual type names to Spark SQL data types
val types = Map("int" -> IntegerType, "bigint" -> LongType)
  .withDefault(_ => StringType)
val schemaChanged = schemaSt.map(x => (x._1, types(x._2)))

// Build the schema and an RDD of Rows from the tab-separated data file
val lines = spark.sparkContext.textFile("data source path")
val fields = schemaChanged.map(x => StructField(x._1, x._2, nullable = true)).toList
val schema = StructType(fields)
val rowRDD = lines
  .map(_.split("\t"))
  .map(attributes => Row.fromSeq(attributes))

val new_df = spark.createDataFrame(rowRDD, schema)
new_df.show(5)
new_df.printSchema()
but the above only works for StringType. For IntegerType and LongType, it throws exceptions:
java.lang.RuntimeException: java.lang.String is not a valid external type for schema of int
and
java.lang.RuntimeException: java.lang.String is not a valid external type for schema of bigint
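From the error it looks like the Rows still contain plain java.lang.String values even where the schema declares IntegerType or LongType. My guess is that I need to convert each value to its declared type before building the Row, roughly along these lines (castValue and typedRowRDD are names I made up for illustration, not Spark APIs):

// Sketch: cast each split string to the type declared in the schema
// before building the Row, so the values match the external types Spark expects.
def castValue(value: String, dataType: DataType): Any = dataType match {
  case IntegerType => value.toInt
  case LongType    => value.toLong
  case _           => value
}

val typedRowRDD = lines
  .map(_.split("\t"))
  .map(attributes => Row.fromSeq(attributes.zip(fields).map {
    case (value, field) => castValue(value, field.dataType)
  }))

val typed_df = spark.createDataFrame(typedRowRDD, schema)

Is this the right approach, or is there a cleaner built-in way to do the conversion?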
Thanks in advance!