How to unwind an array in a DataFrame (from JSON)?

Each entry in the RDD contains JSON. I use SQLContext to create a DataFrame from the JSON as follows:

val signalsJsonRdd = sqlContext.jsonRDD(signalsJson)

Below is the schema. datapayload is an array of elements. I want to explode that array to get a DataFrame in which each row is one element of datapayload. I tried something based on this answer, but it seems I would need to model the entire structure of the element in the case Row(arr: Array[...]) pattern. I am probably missing something.

val payloadDfs = signalsJsonRdd.explode($"data.datapayload"){ 
    case org.apache.spark.sql.Row(arr: Array[String]) =>  arr.map(Tuple1(_)) 
}

The code above throws a scala.MatchError because the actual type of the Row is very different from Row(arr: Array[String]). There is probably an easy way to do what I want, but I cannot find it. Please help.
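A minimal sketch of why the match fails, assuming Spark 2.x or later on the classpath: inside a Row, an array column arrives as a Scala collection Seq (with each struct element as a nested Row), never as an Array[String]. The object name, the describe helper and the sample Row are hypothetical; the Row only mimics the single-field input the explode callback receives for $"data.datapayload".

```scala
import org.apache.spark.sql.Row

object WhyMatchError {
  // Hypothetical stand-in for the callback's input: one field holding the
  // datapayload array, represented as a Seq of struct Rows.
  val input: Row = Row(Seq(Row(Row(1.0, 2.0), "t1", "gps")))

  def describe(r: Row): String = r match {
    // The pattern from the question: it never matches, because the field is a Seq, not an Array.
    case Row(arr: Array[String]) => s"array of ${arr.length} strings"
    // Matches: array columns are exposed as scala.collection.Seq whose elements are Rows.
    case Row(arr: scala.collection.Seq[_]) => s"seq of ${arr.length} element(s)"
    case other => s"unexpected: $other"
  }

  def main(args: Array[String]): Unit = println(describe(input))
}
```

Without the last two cases, the first pattern alone would fail with the same scala.MatchError.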

Here is the schema:

signalsJsonRdd.printSchema()

root
 |-- _corrupt_record: string (nullable = true)
 |-- data: struct (nullable = true)
 |    |-- dataid: string (nullable = true)
 |    |-- datapayload: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- Reading: struct (nullable = true)
 |    |    |    |    |-- A2DPActive: boolean (nullable = true)
 |    |    |    |    |-- Accuracy: double (nullable = true)
 |    |    |    |    |-- Active: boolean (nullable = true)
 |    |    |    |    |-- Address: string (nullable = true)
 |    |    |    |    |-- Charging: boolean (nullable = true)
 |    |    |    |    |-- Connected: boolean (nullable = true)
 |    |    |    |    |-- DeviceName: string (nullable = true)
 |    |    |    |    |-- Guid: string (nullable = true)
 |    |    |    |    |-- HandsFree: boolean (nullable = true)
 |    |    |    |    |-- Header: double (nullable = true)
 |    |    |    |    |-- Heading: double (nullable = true)
 |    |    |    |    |-- Latitude: double (nullable = true)
 |    |    |    |    |-- Longitude: double (nullable = true)
 |    |    |    |    |-- PositionSource: long (nullable = true)
 |    |    |    |    |-- Present: boolean (nullable = true)
 |    |    |    |    |-- Radius: double (nullable = true)
 |    |    |    |    |-- SSID: string (nullable = true)
 |    |    |    |    |-- SSIDLength: long (nullable = true)
 |    |    |    |    |-- SpeedInKmh: double (nullable = true)
 |    |    |    |    |-- State: string (nullable = true)
 |    |    |    |    |-- Time: string (nullable = true)
 |    |    |    |    |-- Type: string (nullable = true)
 |    |    |    |-- Time: string (nullable = true)
 |    |    |    |-- Type: string (nullable = true)

tl;dr Use the explode function (or flatMap).

explode creates a new row for each element in the given array or map column.
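To make the flatMap analogy concrete, here is a plain-Scala sketch (no Spark needed). The case classes are hypothetical and model only a few fields of the schema; flatMap over the records with an inner map yields one output per payload element, carrying the parent dataid along, much as explode does per row.

```scala
// Hypothetical, trimmed-down model of one JSON record (not the full schema).
final case class Reading(latitude: Double, longitude: Double)
final case class PayloadElement(reading: Reading, time: String, kind: String)
final case class Data(dataid: String, datapayload: Seq[PayloadElement])

object FlatMapAnalogy {
  // One output per payload element, with the parent id repeated on each.
  // A record with an empty payload contributes nothing, as with explode.
  def explodePayload(records: Seq[Data]): Seq[(String, PayloadElement)] =
    records.flatMap(d => d.datapayload.map(e => (d.dataid, e)))

  def main(args: Array[String]): Unit = {
    val records = Seq(
      Data("a", Seq(
        PayloadElement(Reading(1.0, 2.0), "t1", "gps"),
        PayloadElement(Reading(3.0, 4.0), "t2", "gps"))),
      Data("b", Seq.empty))
    explodePayload(records).foreach(println)
  }
}
```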

Something like the following should work:

signalsJsonRdd.withColumn("element", explode($"data.datapayload"))

See the functions object for details.
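A fuller sketch, assuming Spark 2.2 or later (SparkSession and spark.read.json over a Dataset[String] stand in for the older sqlContext.jsonRDD). The sample records, the object name and the payloadRows method are hypothetical, and only a few fields of the schema are modeled. It keeps dataid next to each exploded element and then reads nested fields through dotted column paths.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object ExplodePayload {
  // Hypothetical sample records, trimmed to a few fields of the schema above.
  private val sampleJson = Seq(
    """{"data":{"dataid":"a","datapayload":[{"Reading":{"Latitude":1.0,"Longitude":2.0},"Time":"t1","Type":"gps"},{"Reading":{"Latitude":3.0,"Longitude":4.0},"Time":"t2","Type":"gps"}]}}""",
    """{"data":{"dataid":"b","datapayload":[{"Reading":{"Latitude":5.0,"Longitude":6.0},"Time":"t3","Type":"gps"}]}}"""
  )

  def payloadRows(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // Infer the nested schema from the JSON strings.
    val signals = spark.read.json(sampleJson.toDS())
    signals
      // explode emits one row per element of data.datapayload; dataid is repeated on each.
      .select(col("data.dataid").as("dataid"), explode(col("data.datapayload")).as("element"))
      // Dotted paths reach into the exploded struct and its nested Reading struct.
      .select(
        col("dataid"),
        col("element.Time"),
        col("element.Type"),
        col("element.Reading.Latitude"),
        col("element.Reading.Longitude"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("explode-datapayload").getOrCreate()
    try payloadRows(spark).show(false)
    finally spark.stop()
  }
}
```

Rows whose datapayload is empty or null produce no output with explode; explode_outer (Spark 2.2+) keeps them with a null element instead.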


Source: https://habr.com/ru/post/1016718/

