Each entry in the RDD contains json. I use SQLContext to create a DataFrame from Json as follows:
val signalsJsonRdd = sqlContext.jsonRDD(signalsJson)
Below is a diagram. datapayload is an array of elements. I want to blow up an array of elements to get a data frame, where each row is an element from datapayload. I tried to do something based on this answer, but it seems to me that I need to model the entire structure of the element in the case of Row (arr: Array [...]) . I probably missed something.
val payloadDfs = signalsJsonRdd.explode($"data.datapayload"){
case org.apache.spark.sql.Row(arr: Array[String]) => arr.map(Tuple1(_))
}
The above code throws a scala.MatchError because the type of the actual Row is very different from Row (arr: Array [String]). There is probably an easy way to do what I want, but I cannot find it. Please, help.
Below is a diagram
signalsJsonRdd.printSchema()
root
|-- _corrupt_record: string (nullable = true)
|-- data: struct (nullable = true)
| |-- dataid: string (nullable = true)
| |-- datapayload: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- Reading: struct (nullable = true)
| | | | |-- A2DPActive: boolean (nullable = true)
| | | | |-- Accuracy: double (nullable = true)
| | | | |-- Active: boolean (nullable = true)
| | | | |-- Address: string (nullable = true)
| | | | |-- Charging: boolean (nullable = true)
| | | | |-- Connected: boolean (nullable = true)
| | | | |-- DeviceName: string (nullable = true)
| | | | |-- Guid: string (nullable = true)
| | | | |-- HandsFree: boolean (nullable = true)
| | | | |-- Header: double (nullable = true)
| | | | |-- Heading: double (nullable = true)
| | | | |-- Latitude: double (nullable = true)
| | | | |-- Longitude: double (nullable = true)
| | | | |-- PositionSource: long (nullable = true)
| | | | |-- Present: boolean (nullable = true)
| | | | |-- Radius: double (nullable = true)
| | | | |-- SSID: string (nullable = true)
| | | | |-- SSIDLength: long (nullable = true)
| | | | |-- SpeedInKmh: double (nullable = true)
| | | | |-- State: string (nullable = true)
| | | | |-- Time: string (nullable = true)
| | | | |-- Type: string (nullable = true)
| | | |-- Time: string (nullable = true)
| | | |-- Type: string (nullable = true)