I created an RDD whose elements are of type Array[String]; its contents look like this:
Array[Array[String]] = Array(Array(4580056797, 0, 2015-07-29 10:38:42, 0, 1, 1), Array(4580056797, 0, 2015-07-29 10:38:42, 0, 1, 1), Array(4580056797, 0, 2015-07-29 10:38:42, 0, 1, 1), Array(4580057445, 0, 2015-07-29 10:40:37, 0, 1, 1), Array(4580057445, 0, 2015-07-29 10:40:37, 0, 1, 1))
I want to create a DataFrame with a schema:
val schemaString = "callId oCallId callTime duration calltype swId"
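The `schema` value used below was presumably built from this string along these lines (a sketch; treating every column as a nullable String is an assumption, since the question does not show the schema definition):

```scala
import org.apache.spark.sql.types.{StructType, StructField, StringType}

val schemaString = "callId oCallId callTime duration calltype swId"

// Split the space-separated field names and wrap each one in a StructField.
// All fields are typed as String here -- an assumption, not taken from the question.
val schema = StructType(
  schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, nullable = true))
)
```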
Then I ran:
scala> val rowRDD = rdd.map(p => Array(p(0), p(1), p(2), p(3), p(4), p(5).trim))
rowRDD: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[14] at map at <console>:39

scala> val calDF = sqlContext.createDataFrame(rowRDD, schema)
This gives the following error:
<console>:45: error: overloaded method value createDataFrame with alternatives:
  (rdd: org.apache.spark.api.java.JavaRDD[_], beanClass: Class[_])org.apache.spark.sql.DataFrame
  (rdd: org.apache.spark.rdd.RDD[_], beanClass: Class[_])org.apache.spark.sql.DataFrame
  (rowRDD: org.apache.spark.api.java.JavaRDD[org.apache.spark.sql.Row], schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame
  (rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row], schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame
cannot be applied to (org.apache.spark.rdd.RDD[Array[String]], org.apache.spark.sql.types.StructType)
       val calDF = sqlContext.createDataFrame(rowRDD, schema)
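As the error's list of overloads shows, `createDataFrame(rowRDD, schema)` expects an `RDD[Row]`, but the `map` above produces an `RDD[Array[String]]`. A minimal sketch of one way to fix this (assuming `rdd` and `schema` are as defined earlier) is to wrap each array in a `Row`:

```scala
import org.apache.spark.sql.Row

// Map each Array[String] to a Row so the RDD matches the
// (RDD[Row], StructType) overload of createDataFrame.
val rowRDD = rdd.map(p => Row(p(0), p(1), p(2), p(3), p(4), p(5).trim))

val calDF = sqlContext.createDataFrame(rowRDD, schema)
```

`Row.fromSeq(p)` would work as well if all six columns should be taken as-is without trimming the last one.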