I have the following DataFrame in Spark 2.2.0 and Scala 2.11.8:
+----------+-------------------------------+
|item      |other_items                    |
+----------+-------------------------------+
|111       |[[444,1.0],[333,0.5],[666,0.4]]|
|222       |[[444,1.0],[333,0.5]]          |
|333       |[]                             |
|444       |[[111,2.0],[555,0.5],[777,0.2]]|
+----------+-------------------------------+
I want to get the following DataFrame:
+----------+-------------+
|item      |other_items  |
+----------+-------------+
|111       |444          |
|222       |444          |
|444       |111          |
+----------+-------------+
So basically, I need to extract the first item of other_items for each row. In addition, I need to ignore the rows that have an empty array [] in other_items.
How can I do this?
I tried this approach, but it does not give the expected result:
val result = df.withColumn("other_items", $"other_items"(0))
df.printSchema() outputs the following:
|-- item: string (nullable = true)
|-- other_items: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- _1: string (nullable = true)
| | |-- _2: double (nullable = true)
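
One way this might be approached, sketched under the assumption that the schema is exactly as printed above (in particular, that the first struct field is named `_1`): drop the rows whose array is empty using `size`, then take element 0 of the array and select its `_1` field. The `SparkSession` setup and the sample data below are illustrative stand-ins for the real `df`, not part of the original code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.size

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("first-other-item")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data mirroring the schema above:
// item: string, other_items: array<struct<_1: string, _2: double>>
val df = Seq(
  ("111", Seq(("444", 1.0), ("333", 0.5), ("666", 0.4))),
  ("222", Seq(("444", 1.0), ("333", 0.5))),
  ("333", Seq.empty[(String, Double)]),
  ("444", Seq(("111", 2.0), ("555", 0.5), ("777", 0.2)))
).toDF("item", "other_items")

val result = df
  // Keep only rows whose array has at least one element.
  .where(size($"other_items") > 0)
  // Take the first struct of the array, then its first field.
  .withColumn("other_items", $"other_items".getItem(0).getField("_1"))

result.show()
```

The attempt with `$"other_items"(0)` alone stops one step short: it returns the whole first struct (both `_1` and `_2`) and does not remove the rows with an empty array, which is why the sketch adds the `getField("_1")` step and the `size` filter.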